Adaptive lifting scheme with sparse criteria for image coding
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 10 (2012)
Abstract
Lifting schemes (LS) have been found to be efficient tools for image coding purposes. Since LS-based decompositions depend on the choice of the prediction/update operators, many research efforts have been devoted to the design of adaptive structures. The most commonly used approaches optimize the prediction filters by minimizing the variance of the detail coefficients. In this article, we investigate techniques for optimizing sparsity criteria by focusing on the use of an ℓ_{1} criterion instead of an ℓ_{2} one. Since the output of a prediction filter may be used as an input for the other prediction filters, we then propose to optimize such a filter by minimizing a weighted ℓ_{1} criterion related to the global rate-distortion performance. More specifically, it will be shown that the optimization of the diagonal prediction filter depends on the optimization of the other prediction filters, and vice versa. Based on this observation, we propose to jointly optimize the prediction filters by using an algorithm that alternates between the optimization of the filters and the computation of the weights. Experimental results show the benefits which can be drawn from the proposed optimization of the lifting operators.
1 Introduction
The discrete wavelet transform has been recognized as an efficient tool in many image processing fields, including denoising [1] and compression [2]. This success of wavelets is due to their intrinsic features: multiresolution representation, good energy compaction, and decorrelation properties [3, 4]. In this respect, the second generation of wavelets provides very efficient transforms, based on the concept of the lifting scheme (LS) developed by Sweldens [5]. Such structures have been shown to offer interesting properties. In particular, LS guarantee the lossy-to-lossless reconstruction required in specific applications such as remote sensing imaging, for which any distortion in the decoded image may lead to an erroneous interpretation of the image [6]. Besides, they are suitable tools for scalable reconstruction, which is a key issue for telebrowsing applications [7, 8].
Generally, LS are developed for the 1D case and then extended in a separable way to the 2D case by cascading vertical and horizontal 1D filtering operators. It is worth noting that a separable LS may not always be very efficient at coping with the two-dimensional characteristics of edges which are neither horizontal nor vertical [9]. In this respect, several research studies have been devoted to the design of non-separable lifting schemes (NSLS) in order to better capture the actual two-dimensional contents of the image. Indeed, instead of using samples from the same rows (resp. columns) while processing the image along the lines (resp. columns), 2D NSLS provide smarter choices in the selection of the samples by using horizontal, vertical, and oblique directions at the prediction step [9]. For example, quincunx lifting schemes were found to be suitable for coding satellite images acquired on a quincunx sampling grid [10, 11]. In [12], a 2D wavelet decomposition comprising an adaptive update lifting step and three consecutive fixed prediction lifting steps was proposed. Another structure, composed of three prediction lifting steps followed by an update lifting step, has also been considered in the nonadaptive case [13, 14].
In parallel with these studies, other efforts have been devoted to the design of adaptive lifting schemes. Indeed, in a coding framework, the compactness of an LS-based multiresolution representation depends on the choice of its prediction and update operators. To the best of our knowledge, most existing studies have mainly focused on the optimization of the prediction stage. In general, the goal of these studies is to introduce spatial adaptivity by varying the direction of the prediction step [15–17], the length of the prediction filters [18, 19], and the coefficient values of the corresponding filters [9, 11, 15, 20, 21]. For instance, Gerek and Çetin [16] proposed a 2D edge-adaptive lifting scheme by considering three prediction direction angles (0°, 45°, and 135°) and selecting the orientation which leads to the smallest gradient. Recently, Ding et al. [17] have built an adaptive directional lifting structure with perfect reconstruction: the prediction is performed in local windows along the direction of high pixel correlation. A good directional resolution is achieved by employing fractional-pixel precision. A similar approach was also adopted in [22]. In [18], three separable prediction filters with different numbers of vanishing moments are employed, and the best prediction is chosen according to the local features. In [19], a set of linear predictors of different lengths is defined based on a nonlinear function related to an edge detector. Another strategy to achieve adaptivity aims at designing the lifting filters by optimizing a given criterion. In this context, the prediction filters are often optimized by minimizing the detail signal variance through mean square criteria [15, 20]. In [9], the prediction filter coefficients are optimized with a least mean squares (LMS) type algorithm based on the prediction error.
In addition to these adaptation techniques, the minimization of the detail signal entropy has also been investigated in [11, 21]. In [11], the approach is limited to a quincunx structure and the optimization is performed in an empirical manner using the Nelder-Mead simplex algorithm, since the entropy is an implicit function of the prediction filter. However, such heuristic algorithms present the drawback that they may converge to a local minimum of the entropy. In [21], a generalized prediction step, viewed as a mapping function, is optimized by minimizing the detail signal energy given the pixel value probability conditioned on its neighboring pixel values. The authors show that the resulting mapping function also minimizes the output entropy. By assuming that the signal probability density function (pdf) is known, the benefit of this method was first demonstrated for lossless image coding in [21]. Then, an extension of this study to sparse image representation and lossy coding contexts was presented in [23]. Consequently, an estimate of the pdf must be available at both the coder and the decoder side. Note that the main drawback of this method, as well as of those based on directional wavelet transforms [15, 17, 22, 24, 25], is that they require the lossless transmission of side information to the decoder, which may affect the overall compression performance, especially at low bitrates. Furthermore, such adaptive methods increase the computational load required for the selection of the best prediction direction.
It is worth pointing out that, in practical implementations of compression systems, the sparsity of a signal, where a portion of the signal samples are set to zero, has a great impact on the ultimate rate-distortion performance. For example, embedded wavelet-based image coders can spend the major part of their bit budget on encoding the significance map needed to locate nonzero coefficients within the wavelet domain. To this end, sparsity-promoting techniques have already been investigated in the literature. Indeed, geometric wavelet transforms such as curvelets [26] and contourlets [27] have been proposed to provide sparse representations of images. One difficulty with such transforms is their redundancy: they usually produce more coefficients than the number of pixels in the original image. This can be a major obstacle to achieving efficient coding schemes. To control this redundancy, a mixed contourlet and wavelet transform was proposed in [28], where a contourlet transform was used at fine scales and the wavelet transform was employed at coarse scales. Later, bandelet transforms, which aim at developing sparse geometric representations of images, were introduced and studied in the context of image coding and image denoising [29]. Unlike contourlets and curvelets, which are fixed transforms, bandelet transforms require an edge detection stage, followed by an adaptive decomposition. Furthermore, the directional selectivity of the 2D complex dual-tree discrete wavelet transforms [30] has been exploited in the context of image [31] and video coding [32]. Since such a transform is redundant, Fowler et al. applied a noise-shaping process [33] to increase the sparsity of the wavelet coefficients.
With the ultimate goal of promoting sparsity in a transform domain, we investigate in this article techniques for optimizing sparsity criteria, which can be used for the design of all the filters defined in a non-separable lifting structure. We should note that the sparsest wavelet coefficients could be obtained by minimizing an ℓ_{0} criterion. However, such a problem is inherently nonconvex and NP-hard [34]. Thus, unlike previous studies where the prediction has been separately optimized by minimizing an ℓ_{2} criterion (i.e., the detail signal variance), we focus on the minimization of an ℓ_{1} criterion. Since the output of a prediction filter may be used as an input for other prediction filters, we then propose to optimize such a filter by minimizing a weighted ℓ_{1} criterion related to the global prediction error. We also propose to jointly optimize the prediction filters by using an algorithm that alternates between filter optimization and weight computation. While the minimization of an ℓ_{1} criterion is often considered in the signal processing literature, such as in the compressed sensing field [35], it is worth pointing out that, to the best of our knowledge, the use of such a criterion for lifting operator design has not been previously investigated.
The rest of this article is organized as follows. In Section 2, we recall our recent work on the design of all the operators involved in a 2D non-separable lifting structure [36, 37]. In Section 3, the motivation for using an ℓ_{1} criterion in the design of optimal lifting structures is first discussed; the iterative algorithm for minimizing this criterion is then described. In Section 4, we present a weighted ℓ_{1} criterion which aims at minimizing the global prediction error. In Section 5, we propose to jointly optimize the prediction filters by using an algorithm that alternates between optimizing all the filters and redefining the weights. Finally, experimental results are given in Section 6, and some conclusions are drawn in Section 7.
2 2D lifting structure and optimization methods
2.1 Principle of the considered 2D NSLS structure
In this article, we consider a 2D NSLS composed of three prediction lifting steps followed by an update lifting step. The interest of this structure is twofold. First, it allows us to reduce the number of lifting steps and rounding operations: a theoretical analysis conducted in [13] shows that NSLS improve the coding performance thanks to the reduction of rounding effects. Second, any separable prediction-update LS structure has an equivalent in this form [13, 14]. The corresponding analysis structure is depicted in Figure 1.
Let x denote the digital image to be coded. At each resolution level j and each pixel location (m, n), its approximation coefficient is denoted by x_{j}(m, n) and the associated four polyphase components by x_{0,j}(m, n) = x_{j}(2m, 2n), x_{1,j}(m, n) = x_{j}(2m, 2n + 1), x_{2,j}(m, n) = x_{j}(2m + 1, 2n), and x_{3,j}(m, n) = x_{j}(2m + 1, 2n + 1). Furthermore, we denote by ${\mathbf{P}}_{j}^{\left(HH\right)}$, ${\mathbf{P}}_{j}^{\left(LH\right)}$, ${\mathbf{P}}_{j}^{\left(HL\right)}$, and U_{j} the three prediction filters and the update filter employed to generate the detail coefficients ${x}_{j+1}^{\left(HH\right)}$ oriented diagonally, ${x}_{j+1}^{\left(LH\right)}$ oriented vertically, ${x}_{j+1}^{\left(HL\right)}$ oriented horizontally, and the approximation coefficients x_{j+1}. In accordance with Figure 1, let us introduce the following notation:
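As an illustration, the polyphase decomposition above amounts to a simple subsampling of the image; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def polyphase_split(x):
    """Split an image x_j into its four polyphase components:
    x0(m, n) = x(2m, 2n),      x1(m, n) = x(2m, 2n + 1),
    x2(m, n) = x(2m + 1, 2n),  x3(m, n) = x(2m + 1, 2n + 1)."""
    return x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]
```

The inverse operation simply interleaves the four components back onto the original sampling grid.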

- For the first prediction step, the prediction multiple input, single output (MISO) filter ${\mathbf{P}}_{j}^{\left(HH\right)}$ can be seen as a sum of three single input, single output (SISO) filters ${\mathbf{P}}_{0,j}^{\left(HH\right)}$, ${\mathbf{P}}_{1,j}^{\left(HH\right)}$, and ${\mathbf{P}}_{2,j}^{\left(HH\right)}$ whose respective inputs are the components x_{0,j}, x_{1,j}, and x_{2,j}.
- For the second (resp. third) prediction step, the prediction MISO filter ${\mathbf{P}}_{j}^{\left(LH\right)}$ (resp. ${\mathbf{P}}_{j}^{\left(HL\right)}$) can be seen as a sum of two SISO filters ${\mathbf{P}}_{0,j}^{\left(LH\right)}$ and ${\mathbf{P}}_{1,j}^{\left(LH\right)}$ (resp. ${\mathbf{P}}_{0,j}^{\left(HL\right)}$ and ${\mathbf{P}}_{1,j}^{\left(HL\right)}$) whose respective inputs are the components x_{2,j} and ${x}_{j+1}^{\left(HH\right)}$ (resp. x_{1,j} and ${x}_{j+1}^{\left(HH\right)}$).
- For the update step, the update MISO filter U_{j} can be seen as a sum of three SISO filters ${\mathbf{U}}_{j}^{\left(HL\right)}$, ${\mathbf{U}}_{j}^{\left(LH\right)}$, and ${\mathbf{U}}_{j}^{\left(HH\right)}$ whose respective inputs are the detail coefficients ${x}_{j+1}^{\left(HL\right)}$, ${x}_{j+1}^{\left(LH\right)}$, and ${x}_{j+1}^{\left(HH\right)}$.
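To make the MISO/SISO decomposition concrete, the following sketch forms the diagonal detail signal of the first prediction step as a sum of three SISO filter outputs. The fixed example weights and the periodic boundary handling via np.roll are our simplifying assumptions; in the article the filter coefficients are optimized, not fixed:

```python
import numpy as np

def predict_hh(x0, x1, x2, filters):
    """MISO prediction = sum of three SISO filters, one per input component.
    Each SISO filter is a dict mapping a shift (k, l) to a weight p_i(k, l);
    boundaries are handled periodically for simplicity."""
    pred = np.zeros(x0.shape)
    for x_i, taps in zip((x0, x1, x2), filters):
        for (k, l), w in taps.items():
            # contribution w * x_i(m - k, n - l)
            pred += w * np.roll(np.roll(x_i, k, axis=0), l, axis=1)
    return pred

def detail_hh(x3, x0, x1, x2, filters):
    # rounding keeps the decomposition integer-to-integer (lossy-to-lossless)
    return x3 - np.floor(predict_hh(x0, x1, x2, filters))
```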
Now, it is easy to derive the expressions of the resulting coefficients in the 2D z-transform domain.^{a} Indeed, the z-transforms of the output coefficients can be expressed as follows:
where, for every polyphase index i ∈ {0,1, 2} and orientation o ∈ {HH, HL, LH},
The set ${\mathcal{P}}_{i,j}^{\left(o\right)}$ (resp. ${\mathcal{U}}_{j}^{\left(o\right)}$) and the coefficients ${p}_{i,j}^{\left(o\right)}\left(k,l\right)$ (resp. ${u}_{j}^{\left(o\right)}\left(k,l\right)$) denote the support and the weights of the three prediction filters (resp. of the update filter). Note that in Equations (1)-(4), we have introduced the rounding operation ⌊·⌋ in order to allow lossy-to-lossless encoding of the coefficients [7]. Having defined the considered NSLS structure, we now focus on the optimization of its lifting operators.
2.2 Optimization methods
Since the detail coefficients are defined as prediction errors, the prediction operators are often optimized by minimizing the variance of these coefficients (i.e., their ℓ_{2}-norm) at each resolution level. If the rounding operators are omitted, it is readily shown that the minimum variance predictors must satisfy the well-known Yule-Walker equations. For example, for the prediction vector ${\mathbf{p}}_{j}^{\left(HH\right)}$, the normal equations read
where

- ${\mathbf{p}}_{j}^{\left(HH\right)}={\left({\mathbf{p}}_{0,j}^{\left(HH\right)},{\mathbf{p}}_{1,j}^{\left(HH\right)},{\mathbf{p}}_{2,j}^{\left(HH\right)}\right)}^{\mathsf{\text{T}}}$ is the prediction vector, where, for every i ∈ {0, 1, 2}, ${p}_{i,j}^{\left(HH\right)}={\left({p}_{i,j}^{\left(HH\right)}\left(k,l\right)\right)}_{\left(k,l\right)\in {\mathcal{P}}_{i,j}^{\left(HH\right)}}$;
- ${\stackrel{\u0303}{\mathbf{x}}}_{j}^{\left(HH\right)}\left(m,n\right)={\left({\mathbf{x}}_{0,j}^{\left(HH\right)}\left(m,n\right),{\mathbf{x}}_{1,j}^{\left(HH\right)}\left(m,n\right),{\mathbf{x}}_{2,j}^{\left(HH\right)}\left(m,n\right)\right)}^{\mathsf{\text{T}}}$ is the reference vector, with ${\mathbf{x}}_{i,j}^{\left(HH\right)}\left(m,n\right)={\left({x}_{i,j}\left(m-k,n-l\right)\right)}_{\left(k,l\right)\in {\mathcal{P}}_{i,j}^{\left(HH\right)}}$.
The other optimal prediction filters ${\mathbf{p}}_{j}^{\left(HL\right)}$ and ${\mathbf{p}}_{j}^{\left(LH\right)}$ are obtained in a similar way.
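In sample form, solving these normal equations is an ordinary least-squares fit; a minimal sketch (the function name and data layout are our assumptions):

```python
import numpy as np

def l2_predictor(target, references):
    """Minimum-variance prediction weights: the sample version of the
    normal (Yule-Walker) equations, solved here as a least-squares fit.

    target     -- length-K vector of samples of the component to predict
    references -- (K, L) matrix whose rows are the reference vectors
    """
    # np.linalg.lstsq minimizes ||references @ p - target||_2, which is
    # exactly the sample normal-equation solution.
    p, *_ = np.linalg.lstsq(references, target, rcond=None)
    return p
```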
Concerning the update filter, the conventional approach consists of optimizing its coefficients by minimizing the reconstruction error when the detail signal is canceled [20, 38]. Recently, we have proposed a new optimization technique which aims at reducing the aliasing effects [36, 37]. To this end, the update operator is optimized by minimizing the quadratic error between the approximation signal and the decimated version of the output of an ideal lowpass filter:
where ${y}_{j+1}\left(m,n\right)={\tilde{y}}_{j}\left(2m,2n\right)=\left(h*{x}_{j}\right)\left(2m,2n\right)$. Recall that the impulse response of the 2D ideal low-pass filter is defined in the spatial domain by:
Thus, the optimal update coefficients u_{ j }minimizing the criterion $\stackrel{\u0303}{\mathcal{J}}$ are solutions of the following linear system of equations:
where

- ${\mathbf{u}}_{j}={\left({u}_{j}^{\left(o\right)}\left(k,l\right)\right)}_{\left(k,l\right)\in {\mathcal{U}}_{j}^{\left(o\right)},o\in \left\{HL,LH,HH\right\}}^{\mathsf{\text{T}}}$ is the update weight vector;
- ${\mathbf{x}}_{j+1}\left(m,n\right)={\left({x}_{j+1}^{\left(o\right)}\left(m-k,n-l\right)\right)}_{\left(k,l\right)\in {\mathcal{U}}_{j}^{\left(o\right)},o\in \left\{HL,LH,HH\right\}}^{\mathsf{\text{T}}}$ is the reference vector containing the detail signals previously computed at the j-th resolution level.
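Without the rounding operation, this update design is again a linear least-squares problem: the update weights are chosen so that the updated approximation matches the decimated ideal low-pass reference. A sketch under that simplification (names are ours):

```python
import numpy as np

def update_weights(x0, details, y_ref):
    """Aliasing-reducing update design (rounding omitted): choose u so that
    x0 + details @ u is closest in l2 to y_ref, the decimated output of
    the ideal low-pass filter.

    x0      -- length-K vector of even-even polyphase samples
    details -- (K, L) matrix whose rows are the detail reference vectors
    y_ref   -- length-K target vector (h * x_j)(2m, 2n)
    """
    u, *_ = np.linalg.lstsq(details, y_ref - x0, rcond=None)
    return u
```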
Now, we will introduce a novel twist in the optimization of the different filters: the use of an ℓ_{1}based criterion in place of the usual ℓ_{2}based measure.
3 From ℓ_{2} to ℓ_{1} minimization
3.1 Motivation
Wavelet coefficient statistics are often exploited in order to increase image compression efficiency [39]. More precisely, detail wavelet coefficients are often viewed as realizations of a zero-mean continuous random variable whose probability density function f is given by a generalized Gaussian distribution (GGD) [40, 41]:
where $\Gamma \left(z\right)={\int}_{0}^{+\infty}{t}^{z-1}{e}^{-t}dt$ is the Gamma function, α > 0 is the scale parameter, and β > 0 is the shape parameter. We should note that in the particular case when β = 2 (resp. β = 1), the GGD reduces to the Gaussian distribution (resp. the Laplacian one). The parameters α and β can easily be estimated by using the maximum likelihood technique [42].
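For illustration, here are the GGD density and a simple moment-matching estimate of β. The moment-matching step is our stand-in for the maximum-likelihood fit of [42]: the increasing ratio E|X| / sqrt(E X²) = Γ(2/β) / sqrt(Γ(1/β) Γ(3/β)) is inverted by bisection.

```python
import math
import numpy as np

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha > 0 and shape beta > 0."""
    c = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return c * np.exp(-(np.abs(x) / alpha) ** beta)

def estimate_beta(samples, lo=0.2, hi=5.0, iters=60):
    """Moment-matching shape estimate: invert (by bisection) the increasing
    ratio E|X| / sqrt(E X^2) = Gamma(2/b) / sqrt(Gamma(1/b) Gamma(3/b))."""
    r = np.mean(np.abs(samples)) / math.sqrt(np.mean(samples ** 2))
    def ratio(b):
        return math.gamma(2.0 / b) / math.sqrt(math.gamma(1.0 / b) * math.gamma(3.0 / b))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On Gaussian (resp. Laplacian) samples, the estimate is close to β = 2 (resp. β = 1), consistent with the two particular cases mentioned above.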
Let us now adopt this probabilistic GGD model for the detail coefficients generated by a lifting structure. More precisely, at each resolution level j and orientation o (o ∈ {HL, LH, HH}), the wavelet coefficients ${x}_{j+1}^{\left(o\right)}\left(m,n\right)$ are viewed as realizations of a random variable ${X}_{j+1}^{\left(o\right)}$ whose probability distribution is a GGD with parameters ${\alpha}_{j+1}^{\left(o\right)}$ and ${\beta}_{j+1}^{\left(o\right)}$. This class of distributions leads us to the following sample estimate of the differential entropy h of the variable ${X}_{j+1}^{\left(o\right)}$ [11, 43]:
where (M_{ j },N_{ j }) corresponds to the dimensions of the subband ${x}_{j+1}^{\left(o\right)}$.
Let ${\left({\overline{x}}_{j+1}^{(o)}(m,n)\right)}_{1\le m\le {M}_{j},1\le n\le {N}_{j}}$ be the outputs of a uniform quantizer with quantization step q driven by the real-valued coefficients ${\left({x}_{j+1}^{(o)}(m,n)\right)}_{1\le m\le {M}_{j},1\le n\le {N}_{j}}$. The coefficients ${\overline{x}}_{j+1}^{\left(o\right)}\left(m,n\right)$ can be viewed as realizations of a random variable ${\overline{X}}_{j+1}^{\left(o\right)}$ taking its values in {..., -2q, -q, 0, q, 2q, ...}. At high resolution, it was proved in [43] that the following relation holds between the discrete entropy of ${\overline{X}}_{j+1}^{\left(o\right)}$ and the differential entropy h of ${X}_{j+1}^{\left(o\right)}$:
Thus, from Equation (9), we see [44] that the entropy H(${\overline{X}}_{j+1}^{\left(o\right)}$) of ${\overline{X}}_{j+1}^{\left(o\right)}$ is (up to a dividing factor and an additive constant) approximately equal to:
This shows that there exists a close link between the minimization of the entropy of the detail wavelet coefficients and the minimization of their ${\ell}_{{\beta}_{j+1}^{\left(o\right)}}$-norm. This suggests in particular that most of the existing studies minimizing the ℓ_{2}-norm of the detail signals implicitly aim at minimizing their entropy under a Gaussian model.
Based on these results, we have analyzed the detail wavelet coefficients generated by the decomposition based on the lifting structure NSLS(2,2)-OPT-L2 described in Section 6. Figure 2 shows the distribution of each detail subband for the "einst" image when the prediction filters are optimized by minimizing the ℓ_{2}-norm of the detail coefficients. The maximum likelihood technique is used to estimate the β parameter.
It is important to note that the shape parameters of the resulting detail subbands are closer to β = 1 than to β = 2. Further experiments performed on a large dataset of images^{b} have shown that the average β values are closer to 1 (typical values range from 0.5 to 1.5). These observations suggest that minimizing the ℓ_{1}-norm may be more appropriate than ℓ_{2} minimization. In addition, the former approach has the advantage of producing sparse representations.
3.2 ℓ_{1} minimization technique
Instead of minimizing the ℓ_{2}norm of the detail coefficients ${x}_{j+1}^{\left(o\right)}$ as done in [37], we propose in this section to optimize each of the prediction filters by minimizing the following ℓ_{1} criterion:
where x_{i,j}(m, n) is the (i + 1)-th polyphase component to be predicted, ${\stackrel{\u0303}{\mathbf{x}}}_{j}^{\left(o\right)}\left(m,n\right)$ is the reference vector containing the samples used in the prediction step, and ${\mathbf{p}}_{j}^{\left(o\right)}$ is the prediction operator vector to be optimized (L will subsequently designate its length). Although the criterion in (11) is convex, a major difficulty in solving this problem stems from the fact that the function to be minimized is not differentiable. Recently, several optimization algorithms have been proposed to solve nonsmooth minimization problems like (11). Such problems have traditionally been addressed with linear programming [45]. Alternatively, a flexible class of proximal optimization algorithms has been developed and successfully employed in a number of applications; a survey of these proximal methods can be found in [46]. These methods are also closely related to augmented Lagrangian methods [47]. In our context, we have employed the Douglas-Rachford algorithm, which is an efficient optimization tool for this problem [48].
3.2.1 The Douglas-Rachford algorithm
For minimizing the ℓ_{1} criterion, we will resort to the concept of proximity operators [49], which has been recognized as a fundamental tool in the recent convex optimization literature [50, 51]. The necessary background on convex analysis and proximity operators [52, 53] is given in Appendix A.
Now, we recall that our minimization problem (11) aims at optimizing the prediction filters by minimizing the ℓ_{1}-norm of the difference between the current pixel x_{i,j} and its predicted value. We note here that ${x}_{i,j}={\left({x}_{i,j}(m,n)\right)}_{1\le m\le {M}_{j},1\le n\le {N}_{j}}$ can be viewed as an element of the Euclidean space ${\mathbb{R}}^{{K}_{j}}$, where K_{j} = M_{j} × N_{j}. Thus, the minimization problem (11) can be rewritten as:
where V is the vector space defined as
Based on the definition of the indicator function ı_{ V }(see Appendix A), Problem (12) is equivalent to the following minimization problem:
Therefore, Problem (13) can be viewed as a minimization of a sum of two functions f_{1} and f_{2} defined by:
In this case, the Douglas-Rachford algorithm can be applied to provide an appealing numerical solution to Problem (13) (see Appendix B).
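As an illustration, here is a compact NumPy sketch of a Douglas-Rachford iteration for the generic design problem min_p ||y − Ap||_1, written, as in (12)-(13), on the detail variable constrained to the affine set V = {y − Ap}: the proximity operator of the ℓ_{1} term is a component-wise soft thresholding, and the proximity operator of the indicator function of V is a least-squares projection. Function names and parameter values are ours:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (component-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dr_l1_design(A, y, gamma=1.0, n_iter=300):
    """Douglas-Rachford iteration for min_p ||y - A p||_1, run on the
    detail variable d constrained to the affine set V = {y - A p}."""
    t = np.zeros_like(y)
    p = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # projection of t onto V (prox of the indicator function of V)
        p, *_ = np.linalg.lstsq(A, y - t, rcond=None)
        d = y - A @ p
        # Douglas-Rachford update combining the two proximity operators
        t = t + soft_threshold(2.0 * d - t, gamma) - d
    return p
```

In the scalar case with a single constant regressor, the ℓ_{1} fit converges to the sample median rather than the mean, which illustrates the robustness of the criterion to large residuals.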
Although it is an iterative algorithm, we have observed experimentally that the convergence of the Douglas-Rachford algorithm is generally reached after a small number of iterations (often between 30 and 60 iterations). As an example, we plot in Figure 3a (resp. 3b) the evolution of the criterion ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{0}^{\left(HH\right)}\right)$ (resp. ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{0}^{\left(LH\right)}\right)$) with respect to the iteration number.
Once the different terms involved in the iterative algorithm (33) are defined, the algorithm can be applied and further extended to optimize all the prediction filters.
4 Global prediction error minimization technique
4.1 Motivation
Up to now, each prediction filter ${\mathbf{p}}_{j}^{\left(o\right)}\left(o\in \left\{HL,LH,HH\right\}\right)$ has been separately optimized by minimizing the ℓ_{1}norm of the corresponding detail signal ${x}_{j+1}^{\left(o\right)}$ which seems appropriate to determine ${\mathbf{p}}_{j}^{\left(LH\right)}$ and ${\mathbf{p}}_{j}^{\left(HL\right)}$. However, it can be noticed from Figure 1 that the diagonal detail signal ${x}_{j+1}^{\left(HH\right)}$ is also used through the second and the third prediction steps to compute the vertical and the horizontal detail signals respectively. Therefore, the solution ${\mathbf{p}}_{j}^{\left(HH\right)}$ resulting from the previous optimization method may be suboptimal. As a result, we propose to optimize the prediction filter ${\mathbf{p}}_{j}^{\left(HH\right)}$ by minimizing the global prediction error, as described in detail in the next section.
4.2 Optimization of the prediction filter ${\mathbf{p}}_{j}^{\left(HH\right)}$
More precisely, instead of minimizing the ℓ_{1}-norm of ${x}_{j+1}^{\left(HH\right)}$ alone, the filter ${\mathbf{p}}_{j}^{\left(HH\right)}$ will be optimized by minimizing the sum of the ℓ_{1}-norms of the three detail subbands ${x}_{j+1}^{\left(o\right)}$. To this end, we will consider the minimization of the following weighted ℓ_{1} criterion:
where ${\kappa}_{j+1}^{\left(o\right)}$, o ∈ {HL, LH, HH}, are strictly positive weighting terms.
Before focusing on the method employed to minimize the proposed criterion, we should first express ${\mathcal{J}}_{w{\ell}_{1}}$ as a function of the filter ${\mathbf{p}}_{j}^{\left(HH\right)}$ to be optimized.
Let ${\left({x}_{i,j}^{\left(1\right)}\left(m,n\right)\right)}_{i\in \left\{0,1,2,3\right\}}$ be the four outputs obtained from ${\left({x}_{i,j}\left(m,n\right)\right)}_{i\in \left\{0,1,2,3\right\}}$ following the first prediction step (see Figure 1). Although ${x}_{i,j}^{\left(1\right)}\left(m,n\right)={x}_{i,j}\left(m,n\right)$ for all i ∈ {0, 1, 2}, the use of the superscript will make the presentation below easier. Thus ${x}_{j+1}^{\left(o\right)}$ can be expressed as:
where ${h}_{i,j}^{\left(o,1\right)}$ is a filter which depends on the prediction coefficients of ${\mathbf{p}}_{j}^{\left(LH\right)}$ and ${\mathbf{p}}_{j}^{\left(HL\right)}$.
Knowing that
where ${\tilde{x}}_{j}^{(HH)}(m,n)={\left({x}_{i,j}(m-r,n-s)\right)}_{(r,s)\in {\mathcal{P}}_{j}^{(HH)},\,i\in \{0,1,2\}}$ (${\mathcal{P}}_{j}^{(HH)}$ is the support of the predictor ${\mathbf{p}}_{j}^{\left(HH\right)}$), we thus obtain, after some simple calculations,
where
Consequently, the proposed weighted ℓ_{1} criterion (Equation (16)) can be expressed as:
It is worth noting that, in practice, the determination of ${y}_{j}^{\left(o,1\right)}\left(m,n\right)$ and ${\mathbf{x}}_{j}^{\left(o,1\right)}\left(m,n\right)$ does not require finding the explicit expressions of ${h}_{i,j}^{\left(o,1\right)}$; these signals can be determined numerically as follows:

- The first term (resp. the second one) in the expression of ${y}_{j}^{\left(o,1\right)}\left(m,n\right)$ in Equation (20) can be found by computing ${x}_{j+1}^{\left(o\right)}\left(m,n\right)$ from the components ${\left({x}_{i,j}^{\left(1\right)}\left(m,n\right)\right)}_{i\in \left\{0,1,2,3\right\}}$ while setting ${x}_{3,j}^{\left(1\right)}\left(m,n\right)=0$ (resp. while setting ${x}_{i,j}^{\left(1\right)}\left(m,n\right)=0$ for i ∈ {0, 1, 2} and ${x}_{3,j}^{\left(1\right)}\left(m,n\right)={x}_{3,j}\left(m,n\right)$).
- The vector ${\mathbf{x}}_{j}^{\left(o,1\right)}\left(m,n\right)$ in Equation (21) can be found as follows. For each i ∈ {0, 1, 2}, the computation of its component ${\sum}_{k,l}{h}_{3,j}^{\left(o,1\right)}\left(k,l\right){x}_{i,j}\left(m-k,n-l\right)$ requires computing ${x}_{j+1}^{\left(o\right)}\left(m,n\right)$ after setting ${x}_{3,j}^{\left(1\right)}\left(m,n\right)={x}_{i,j}\left(m,n\right)$ and ${x}_{{i}^{\prime},j}^{\left(1\right)}\left(m,n\right)=0$ for i′ ∈ {0, 1, 2}. The result of this operation has to be considered for different shift values (r, s) (as can be seen in Equation (21)).
Once the different terms involved in the proposed weighted criterion in Equation (22) are defined (the constant values ${\kappa}_{j+1}^{\left(o\right)}$ are assumed to be known), we can now focus on its minimization. Indeed, unlike the previous criterion (Equation (11)), which consists of a single ℓ_{1} term, the proposed criterion is a sum of three ℓ_{1} terms. To minimize such a criterion (22), one can still use the Douglas-Rachford algorithm through a formulation in a product space [46, 54].
4.2.1 Douglas-Rachford algorithm in a product space
Consider the ℓ_{1} minimization problem:
where ${\kappa}_{j+1}^{\left(o\right)}$, o ∈ {HL,LH,HH}, are positive weights.
Since the Douglas-Rachford algorithm described above is designed for the sum of two functions, we can reformulate (23) in this form in the 3-fold product space ${\mathbb{H}}_{j}$.
If we define the vector subspace U as
the minimization problem (Equation 23) is equivalent to
where
We are thus brought back to a problem involving two functions in a larger space, namely the product space ${\mathbb{H}}_{j}$, so the Douglas-Rachford algorithm can be applied to solve our minimization problem (see Appendix C). Finally, once the prediction filter ${\mathbf{p}}_{j}^{\left(HH\right)}$ has been optimized and fixed, the other prediction filters ${\mathbf{p}}_{j}^{\left(HL\right)}$ and ${\mathbf{p}}_{j}^{\left(LH\right)}$ can be separately optimized by minimizing ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(HL\right)}\right)$ and ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(LH\right)}\right)$ as explained in Section 3. This is justified by the fact that the inputs of the filter ${\mathbf{p}}_{j}^{\left(HL\right)}$ (resp. ${\mathbf{p}}_{j}^{\left(LH\right)}$) are independent of the output of the filter ${\mathbf{p}}_{j}^{\left(LH\right)}$ (resp. ${\mathbf{p}}_{j}^{\left(HL\right)}$).
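A sketch of this product-space formulation for the generic weighted problem min_p Σ_o κ_o ||y_o − A_o p||_1, with one ℓ_{1} block per orientation (function names and the least-squares realization of the projection onto U are our implementation choices):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dr_weighted_l1(As, ys, kappas, gamma=1.0, n_iter=300):
    """Product-space Douglas-Rachford for min_p sum_o kappa_o ||y_o - A_o p||_1.
    All blocks are tied to the same vector p through the subspace U; the
    projection onto U reduces to one stacked least-squares fit, while the
    weighted l1 term acts block-wise through soft thresholding."""
    A_stack = np.vstack(As)
    ts = [np.zeros_like(y) for y in ys]
    p = np.zeros(A_stack.shape[1])
    for _ in range(n_iter):
        # prox of the indicator of U: single least-squares fit over all blocks
        rhs = np.concatenate([y - t for y, t in zip(ys, ts)])
        p, *_ = np.linalg.lstsq(A_stack, rhs, rcond=None)
        ds = [y - A @ p for A, y in zip(As, ys)]
        # prox of the separable weighted l1 term, block by block
        ts = [t + soft_threshold(2.0 * d - t, gamma * k) - d
              for t, d, k in zip(ts, ds, kappas)]
    return p
```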
5 Joint optimization method
5.1 Motivation
From Equations (20) and (21), it can be observed that ${y}_{j}^{\left(o,1\right)}$ and ${\mathbf{x}}_{j}^{\left(o,1\right)}$, which are used to optimize ${\mathbf{p}}_{j}^{\left(HH\right)}$, depend on the coefficients of the prediction filters ${\mathbf{p}}_{j}^{\left(HL\right)}$ and ${\mathbf{p}}_{j}^{\left(LH\right)}$. On the other hand, since ${\mathbf{p}}_{j}^{\left(HL\right)}$ and ${\mathbf{p}}_{j}^{\left(LH\right)}$ use ${x}_{j+1}^{\left(HH\right)}$ as a reference signal in the second and third prediction steps, their optimal values depend on the optimal prediction filter ${\mathbf{p}}_{j}^{\left(HH\right)}$. Thus, the optimization of the filters (${\mathbf{p}}_{j}^{\left(HL\right)}$, ${\mathbf{p}}_{j}^{\left(LH\right)}$) depends on the optimization of the filter ${\mathbf{p}}_{j}^{\left(HH\right)}$, and vice versa.
A joint optimization method can therefore be proposed which iteratively optimizes the prediction filters ${\mathbf{p}}_{j}^{\left(HH\right)}$, ${\mathbf{p}}_{j}^{\left(HL\right)}$, and ${\mathbf{p}}_{j}^{\left(LH\right)}$.
5.2 Proposed algorithms
While the optimization of the prediction filters ${\mathbf{p}}_{j}^{\left(HL\right)}$ and ${\mathbf{p}}_{j}^{\left(LH\right)}$ is simple, the optimization of the prediction filter ${\mathbf{p}}_{j}^{\left(HH\right)}$ is less obvious. Indeed, if we examine the criterion ${\mathcal{J}}_{w{\ell}_{1}}$, the immediate question that arises is: which values of the weighting parameters will produce the sparsest decomposition?
A simple solution consists of setting all the weights ${\kappa}_{j+1}^{\left(o\right)}$ to one. We are then considering the particular case of the unweighted ℓ_{1} criterion, which simply represents the sum of the ℓ_{1}-norms of the three detail subbands ${x}_{j+1}^{\left(o\right)}$. In this case, the joint optimization problem is solved by applying the following simple iterative algorithm at each resolution level j.
5.2.1 First proposed algorithm
➀ Initialize the iteration number it to 0.

Optimize separately the three prediction filters as explained in Section 3. The resulting filters will be denoted respectively by ${\mathbf{p}}_{j}^{\left(HH,0\right)}$, ${\mathbf{p}}_{j}^{\left(LH,0\right)}$, and ${\mathbf{p}}_{j}^{\left(HL,0\right)}$.

Compute the resulting global unweighted prediction error (i.e., the sum of the ℓ_{1}-norms of the three resulting detail subbands).
➁ for it = 1,2,3,...

Set ${\mathbf{p}}_{j}^{\left(LH\right)}={\mathbf{p}}_{j}^{\left(LH,it-1\right)},{\mathbf{p}}_{j}^{\left(HL\right)}={\mathbf{p}}_{j}^{\left(HL,it-1\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(HH\right)}$ by minimizing ${\mathcal{J}}_{w{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(HH\right)}\right)$ (while setting ${\kappa}_{j+1}^{\left(o\right)}=1$). Let ${\mathbf{p}}_{j}^{\left(HH,it\right)}$ be the new optimal filter at iteration it.

Set ${\mathbf{p}}_{j}^{\left(HH\right)}={\mathbf{p}}_{j}^{\left(HH,it\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(LH\right)}$ by minimizing ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(LH\right)}\right)$. Let ${\mathbf{p}}_{j}^{\left(LH,it\right)}$ be the new optimal filter.

Set ${\mathbf{p}}_{j}^{\left(HH\right)}={\mathbf{p}}_{j}^{\left(HH,it\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(HL\right)}$ by minimizing ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(HL\right)}\right)$. Let ${\mathbf{p}}_{j}^{\left(HL,it\right)}$ be the new optimal filter.
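With all ${\kappa}_{j+1}^{\left(o\right)}$ set to one, each inner step of the algorithm above is a least-absolute-deviations regression: the filter being updated minimizes the ℓ_{1}-norm of its prediction residual while the other filters are held fixed. The following is a minimal sketch of such an inner solve, using an IRLS scheme rather than the Douglas-Rachford algorithm developed in this article; the function name and data layout are illustrative.

```python
import numpy as np

def lad_fit(X, y, n_iter=50, eps=1e-8):
    """Least-absolute-deviations fit: argmin_p ||y - X p||_1, solved by
    iteratively reweighted least squares (IRLS) from an l2 warm start."""
    p = np.linalg.lstsq(X, y, rcond=None)[0]       # l2 warm start
    for _ in range(n_iter):
        r = np.abs(y - X @ p) + eps                # current absolute residuals
        w = 1.0 / r                                # IRLS weights ~ 1/|residual|
        p = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return p
```

In the alternation, such a solver would be called once per prediction filter and per sweep, with the regression matrix rebuilt from the current outputs of the other filters.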
Once the prediction filters are optimized, the update filter is finally optimized as explained in Section 2. However, in practice, once all the filters are optimized and the decomposition is performed, the different generated wavelet subbands ${x}_{j+1}^{\left(o\right)}$ are weighted before the entropy encoding (using the JPEG2000 encoder) in order to obtain a distortion in the spatial domain which is very close to the distortion in the wavelet domain.
More precisely, as we can see in Figure 4, each wavelet subband is multiplied by $\sqrt{{w}_{j+1}^{\left(o\right)}}$, where ${w}_{j+1}^{\left(o\right)}$ represents the weight corresponding to ${x}_{j+1}^{\left(o\right)}$. Generally, these weights are computed based on the wavelet filters used for the reconstruction process, as indicated in [55, 56]. A simple weight computation procedure based on the following assumption can be used. As shown in [55], if the error signal in a subband (i.e., the quantization noise) is white and uncorrelated with the other subband errors, the reconstruction distortion in the spatial domain is a weighted sum of the distortions in the wavelet subbands. Therefore, for each subband ${x}_{j+1}^{\left(o\right)}$, a white Gaussian noise of variance ${\left({\sigma}_{j+1}^{\left(o\right)}\right)}^{2}$ is first added while keeping the remaining subbands noiseless. Then, the resulting distortion in the spatial domain ${\widehat{D}}_{s}$ is evaluated by taking the inverse transform. Finally, the corresponding subband weight can be estimated as follows:
$${w}_{j+1}^{\left(o\right)}=\frac{{\widehat{D}}_{s}}{{\left({\sigma}_{j+1}^{\left(o\right)}\right)}^{2}}.$$
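This noise-injection procedure can be sketched as follows. A one-level Haar lifting synthesis is used here as a stand-in for the actual optimized NSLS inverse transform, and the function names are illustrative, not the paper's implementation:

```python
import numpy as np

def inv_haar_lifting(a, d):
    """Inverse of a one-level Haar lifting step (any inverse transform
    could be plugged in here in place of the optimized NSLS synthesis)."""
    even = a - d / 2.0
    odd = d + even
    x = np.empty(2 * a.size)
    x[0::2], x[1::2] = even, odd
    return x

def subband_weight(a, d, which, sigma=1.0, trials=200, rng=None):
    """Estimate w = D_s / sigma^2: inject white noise of variance sigma^2
    into one subband, keep the other noiseless, and measure the resulting
    mean squared error in the spatial domain."""
    rng = rng or np.random.default_rng(0)
    x_ref = inv_haar_lifting(a, d)
    ds = 0.0
    for _ in range(trials):
        n = sigma * rng.standard_normal(a.size)
        a_n, d_n = (a + n, d) if which == "approx" else (a, d + n)
        ds += np.mean((inv_haar_lifting(a_n, d_n) - x_ref) ** 2)
    return (ds / trials) / sigma**2
```

For this toy synthesis, noise in the detail subband is attenuated by the reconstruction (weight close to 0.25), while noise in the approximation subband passes through with weight close to 1.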
This weighting step is very important since standard bit allocation algorithms assume that the quadratic distortion in the wavelet domain is equal to that in the spatial domain, which is not true in the case of biorthogonal wavelets [55]. Therefore, the filters resulting from the first choice of ${\kappa}_{j+1}^{\left(o\right)}$ are suboptimal in the sense that they do not take the weighting procedure into account. For this reason, it has been observed in some experiments (as can be seen in Section 6) that the basic optimization technique does not achieve the best coding performance.
Thus, a more judicious choice of ${\kappa}_{j+1}^{\left(o\right)}$ should take into account the weighting procedure applied to the wavelet coefficients before the entropy encoding process. Furthermore, if in the general formula in Equation (9), we consider the case of ${\beta}_{j+1}^{\left(o\right)}=1$, the differential entropy of ${X}_{j+1}^{\left(o\right)}$ multiplied by $\sqrt{{w}_{j+1}^{\left(o\right)}}$ becomes:
where ${\alpha}_{j+1}^{\left(o\right)}$ can be estimated by using a classical maximum likelihood estimate. Thus, it can be observed from Equation (29) that the first term of the resulting entropy, which corresponds to a weighted ℓ_{1}-norm of ${x}_{j+1}^{\left(o\right)}$, is inversely proportional to ${\alpha}_{j+1}^{\left(o\right)}$. Consequently, in order to obtain a criterion (Equation 16) that results in a good approximation of the entropy (29), a more reasonable choice of ${\kappa}_{j+1}^{\left(o\right)}$ will be as follows:
Since the resulting entropy of each subband uses weights which also depend on the prediction filters (as mentioned above), we propose an iterative algorithm that alternates between optimizing all the filters and redefining the weights. This algorithm, which is performed for each resolution level j, is as follows.
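In practice, the parameter ${\alpha}_{j+1}^{\left(o\right)}$ has a simple closed form under the Laplacian model considered here (${\beta}_{j+1}^{\left(o\right)}=1$): maximizing the likelihood of the pdf $f(x)=\frac{\alpha}{2}{e}^{-\alpha |x|}$ gives $\widehat{\alpha}=N/{\sum}_{i}|{x}_{i}|$. The sketch below illustrates this estimation together with the weight choice ${\kappa}_{j+1}^{\left(o\right)}=1/{\alpha}_{j+1}^{\left(o\right)}$ reported in Section 6; the function names are illustrative:

```python
import numpy as np

def laplacian_alpha_ml(x):
    """ML estimate of alpha for the Laplacian pdf f(x) = (alpha/2) exp(-alpha|x|):
    setting the derivative of the log-likelihood to zero gives
    alpha_hat = N / sum_i |x_i|."""
    return x.size / np.sum(np.abs(x))

def kappa_weights(subbands):
    """One candidate weighting: kappa = 1 / alpha_hat per detail subband,
    so the weighted l1 term tracks the first term of the entropy."""
    return {o: 1.0 / laplacian_alpha_ml(x) for o, x in subbands.items()}
```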
5.2.2 Second proposed algorithm
➀ Initialize the iteration number it to 0.

Optimize separately the three prediction filters as explained in Section 3. The resulting filters will be denoted respectively by ${\mathbf{p}}_{j}^{\left(HH,0\right)}$, ${\mathbf{p}}_{j}^{\left(LH,0\right)}$, and ${\mathbf{p}}_{j}^{\left(HL,0\right)}$.

Optimize the update filter (as explained in Section 2).

Compute the weights ${w}_{j+1}^{\left(o,0\right)}$ of each detail subband as well as the constant values ${\kappa}_{j+1}^{\left(o,0\right)}$.
➁ for it = 1,2,3,...

Set ${\mathbf{p}}_{j}^{\left(LH\right)}={\mathbf{p}}_{j}^{\left(LH,it-1\right)},{\mathbf{p}}_{j}^{\left(HL\right)}={\mathbf{p}}_{j}^{\left(HL,it-1\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(HH\right)}$ by minimizing ${\mathcal{J}}_{w{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(HH\right)}\right)$. Let ${\mathbf{p}}_{j}^{\left(HH,it\right)}$ be the new optimal filter.

Set ${\mathbf{p}}_{j}^{\left(HH\right)}={\mathbf{p}}_{j}^{\left(HH,it\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(LH\right)}$ by minimizing ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(LH\right)}\right)$. Let ${\mathbf{p}}_{j}^{\left(LH,it\right)}$ be the new optimal filter.

Set ${\mathbf{p}}_{j}^{\left(HH\right)}={\mathbf{p}}_{j}^{\left(HH,it\right)}$, and optimize ${\mathbf{p}}_{j}^{\left(HL\right)}$ by minimizing ${\mathcal{J}}_{{\ell}_{1}}\left({\mathbf{p}}_{j}^{\left(HL\right)}\right)$. Let ${\mathbf{p}}_{j}^{\left(HL,it\right)}$ be the new optimal filter.

Optimize the update filter (as explained in Section 2).

Compute the new weights ${w}_{j+1}^{\left(o,it\right)}$ as well as ${\kappa}_{j+1}^{\left(o,it\right)}$.
Let us now make some observations concerning the convergence of the proposed algorithm. Since the goal of the second weighting procedure is to better approximate the entropy, we have computed, at the end of each iteration it, the differential entropy of the three resulting detail subbands. More precisely, the evaluated criterion, obtained from Equation (29) by setting ${\alpha}_{j+1}^{\left(o\right)}=\frac{1}{{\kappa}_{j+1}^{\left(o\right)}}$ and summing over the three detail subbands, is given by:
Figure 5 illustrates the evolution of this criterion w.r.t. the iteration number of the algorithm. It can be noticed that the decrease of the criterion is mainly achieved during the early iterations (after about 7 iterations).
6 Experimental results
Simulations were carried out on two kinds of still images originally quantized over 8 bpp which are either single views or stereoscopic ones. A large dataset composed of 50 still images^{b} and 50 stereo images^{c} has been considered. The gain related to the optimization of the NSLS operators, using different minimization criteria, was evaluated in these contexts. In order to show the benefits of the proposed ℓ_{1} optimization criterion, we provide the results for the following decompositions carried out over three resolution levels:

The first one is the LS corresponding to the 5/3 transform, also known as the (2,2) wavelet transform [7]. In the following, this method will be designated by NSLS(2,2).

The second method consists of optimizing the prediction and update filters as proposed in [20, 38]. More precisely, the prediction filters are optimized by minimizing the ℓ_{2}-norm of the detail coefficients, whereas the update filter is optimized by minimizing the reconstruction error. This optimization method will be designated by NSLS(2,2)-OPT-GM.

The third approach corresponds to our previous method presented recently in [37]. While the prediction filters are optimized in the same way as in the second method, the update filter is optimized by minimizing the difference between the approximation signal and the decimated version of the output of an ideal lowpass filter. We emphasize here that the prediction filters are optimized separately. This method will be denoted by NSLS(2,2)-OPT-L2.

The fourth method modifies the optimization stage of the prediction filters by using the ℓ_{1}-norm instead of the ℓ_{2}-norm. The optimization of the update filter is similar to the technique used in the third method. In what follows, this method will be designated by NSLS(2,2)-OPT-L1.

The fifth method consists of jointly optimizing the prediction filters by using the proposed weighted ℓ_{1} minimization technique, where the weights ${\kappa}_{j+1}^{\left(o\right)}$ are set to $\frac{1}{{\alpha}_{j+1}^{\left(o\right)}}$. The optimization of the update filter is similar to the technique used in the third and fourth methods. This optimization method will be designated by NSLS(2,2)-OPT-WL1. We have also tested this optimization method when the weights ${\kappa}_{j+1}^{\left(o\right)}$ are set to 1. In this case, the method will be denoted by NSLS(2,2)-OPT-WL1 (${\kappa}_{j+1}^{\left(o\right)}=1$).
Figures 6 and 7 show the scalability in quality of the reconstruction procedure by providing the variations of the PSNR versus the bitrate for the images "castle" and "straw", using JPEG2000 as the entropy codec. A more exhaustive evaluation was also performed by applying the different methods to 50 still images^{b}. The average per-image PSNR is illustrated in Figure 8.
These plots show that NSLS(2,2)-OPT-L2 outperforms NSLS(2,2) by 0.1-0.5 dB. It can also be noticed that NSLS(2,2)-OPT-L2 and NSLS(2,2)-OPT-GM perform similarly in terms of quality of reconstruction. An improvement of 0.1-0.3 dB is obtained by using the ℓ_{1} minimization technique instead of the ℓ_{2} one. Finally, the joint optimization technique (NSLS(2,2)-OPT-WL1) outperforms the separate optimization technique (NSLS(2,2)-OPT-L1) and improves the PSNR by 0.1-0.2 dB. The gain becomes more important (up to 0.55 dB) when compared with NSLS(2,2)-OPT-L2. It is important to note here that setting the weights ${\kappa}_{j+1}^{\left(o\right)}$ to 1 (NSLS(2,2)-OPT-WL1 (${\kappa}_{j+1}^{\left(o\right)}=1$)) can yield a degradation of about 0.1-0.25 dB compared with NSLS(2,2)-OPT-WL1 on some images.
Figures 9 and 10 display the reconstructed images of "lena" and "einst". In addition to the PSNR and SSIM metrics, the quality of the reconstructed images is also compared in terms of VSNR (Visual Signal-to-Noise Ratio), which was found to be an efficient metric for quantifying the visual fidelity of natural images [57]: it is based on physical luminances and visual angle (rather than on digital pixel values and pixel-based dimensions) to accommodate different viewing conditions. It can be observed that the weighted ℓ_{1} minimization technique significantly improves the visual quality of reconstruction. The difference in VSNR (resp. PSNR) between NSLS(2,2)-OPT-L2 and NSLS(2,2)-OPT-WL1 ranges from 0.35 dB to 0.6 dB (resp. 0.25 dB to 0.3 dB). Comparing Figure 9c (resp. Figure 10c) with Figure 9d (resp. Figure 10d), the visual improvement achieved by our method can be mainly seen in the hat and face of Lena (resp. in Einstein's face).
The second part of the experiments is concerned with stereo images. Most of the existing studies in this field rely on disparity compensation techniques [58, 59]. This technique first consists of estimating the disparity map. Then, one image is considered as a reference image and the other is predicted in order to generate a prediction error referred to as a residual image. Finally, the disparity field, the reference image, and the residual one are encoded [58, 60]. In this context, Moellenhoff and Maier [61] analyzed the characteristics of the residual image and proved that such images have properties different from natural images. This suggests that transforms that work well for natural images may not be as well-suited for residual images. For this reason, we also proposed to apply these optimization methods for encoding the reference image and the residual one. The resulting rate-distortion curves for the "white house" and "pentagon" stereo images are illustrated in Figures 11 and 12. A more exhaustive evaluation was also performed by applying the different methods to 50 stereo images^{c}. The average per-image PSNR is illustrated in Figure 13. Figure 14 displays the reconstructed target image of the "pentagon" stereo pair. It can be observed that the proposed joint optimization method leads to an improvement of 0.35 dB (resp. 0.016) in VSNR (resp. SSIM) compared with the decomposition in which the prediction filters are optimized separately. For instance, it can be noticed that the edges of the Pentagon building as well as the roads are better reconstructed in Figure 14d.
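The disparity-compensation step described above can be sketched as follows for integer-valued horizontal disparities (border samples are clipped; the function names are illustrative and not taken from [58-61]):

```python
import numpy as np

def disparity_compensate(reference, disparity):
    """Predict the target view: each pixel (r, c) is taken from the
    reference at column c - d(r, c); out-of-range columns are clipped."""
    h, w = reference.shape
    cols = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)
    return reference[np.arange(h)[:, None], cols]

def residual_image(target, reference, disparity):
    """Prediction error, encoded alongside the reference image and the
    disparity field in the scheme described above."""
    return target - disparity_compensate(reference, disparity)
```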
For completeness, the performance of the proposed method (NSLS(2,2)-OPT-WL1) has also been compared with that of the 9/7 transform retained for the lossy mode of the JPEG2000 standard. Table 1 shows the performance of these methods in terms of PSNR, SSIM, and VSNR. Since the human eye cannot always distinguish subjective image quality at middle and high bitrates, the results were restricted to the lower bitrate values.
While the proposed method yields a lower PSNR than the 9/7 transform for some images, it can be noticed from Table 1 that better results are obtained in terms of perceptual quality. For instance, Figures 15 and 16 illustrate some reconstructed images. It can be observed that the proposed method (NSLS(2,2)-OPT-WL1) achieves a gain of about 0.2-0.4 dB (resp. 0.01-0.013) in terms of VSNR (resp. SSIM). Furthermore, Figures 17 and 18 display the reconstructed target image for the stereo image pairs "shrub" and "spot5". While NSLS(2,2)-OPT-WL1 and the 9/7 transform show similar visual quality for the "spot5" pair, the proposed method leads to better reconstruction quality than the 9/7 transform for the "shrub" stereo images.
Before concluding the article, let us now study the complexity of the proposed sparsity criteria for the optimization of the prediction filters. Table 2 gives the iteration number and the execution time for the ℓ_{1} and weighted ℓ_{1} minimization techniques when considering different image sizes. These results have been obtained with a Matlab implementation on an Intel Core 2 (2.93 GHz) architecture. It is clear that the execution time increases with the image size. Furthermore, we note that the ℓ_{1} minimization technique is very fast, whereas the weighted ℓ_{1} technique needs an additional time of about 0.3-2.6 seconds. This increase is due to the fact that the algorithm is reformulated in a threefold product space, as explained in Section 4.2. However, since the Douglas-Rachford algorithm in a product space has blocks which can be implemented in parallel, the complexity can be reduced significantly (up to three times) with an appropriate implementation on a multicore architecture. These results, as well as the good compression performance in terms of reconstruction quality, confirm the effectiveness of the proposed sparsity criteria.
7 Conclusion
In this article, we have studied different optimization techniques for the design of the filters in an NSLS structure. A new criterion has been presented for the optimization of the prediction filters in this context. The idea consists of jointly optimizing these filters by iteratively minimizing a weighted ℓ_{1} criterion. Experimental results carried out on still images and stereo image pairs have illustrated the benefits which can be drawn from the proposed optimization technique. In future work, we plan to extend this optimization method to LS with more than two stages, such as the P-U-P and P-U-P-U structures.
Appendix
A Some background on convex optimization
The main definitions which will be useful to understand our optimization algorithms are briefly summarized below:

ℝ^{K} is the usual K-dimensional Euclidean space with norm $\left\|\cdot\right\|$.

The distance function to a nonempty set C ⊂ ℝ^{K} is defined by
$$\forall \mathbf{x}\in {\mathbb{R}}^{K},\phantom{\rule{1em}{0ex}}{d}_{C}\left(\mathbf{x}\right)=\underset{\mathbf{y}\in C}{\text{inf}}\left\|\mathbf{x}-\mathbf{y}\right\|.$$ 
The projection of x ∈ ℝ^{K} onto a nonempty closed convex set C ⊂ ℝ^{K} is the unique point P_{ C }(x) ∈ C such that d_{ C }(x) = ‖x - P_{ C }(x)‖.

The indicator function of C is given by
$$\forall \mathbf{x}\in {\mathbb{R}}^{K},\phantom{\rule{1em}{0ex}}{\iota}_{C}\left(\mathbf{x}\right)=\left\{\begin{array}{cc}\hfill 0\hfill & \hfill \mathsf{\text{if}}\phantom{\rule{2.77695pt}{0ex}}\mathbf{x}\in C,\hfill \\ \hfill +\infty \hfill & \hfill \mathsf{\text{otherwise}}.\hfill \end{array}\right.$$(32) 
Γ_{0}(ℝ^{K}) is the class of functions from ℝ^{K} to ]-∞, +∞] which are lower semicontinuous, convex, and not identically equal to +∞.

The proximity operator of f ∈ Γ_{0}(ℝ^{K}) is $\mathsf{\text{pro}}{\mathsf{\text{x}}}_{f}:{\mathbb{R}}^{K}\to {\mathbb{R}}^{K}:\mathbf{x}\mapsto \text{arg}\underset{\mathbf{y}\in {\mathbb{R}}^{K}}{\text{min}}f\left(\mathbf{y}\right)+\frac{1}{2}{\left\|\mathbf{x}-\mathbf{y}\right\|}^{2}$. It is important to note that the proximity operator generalizes the notion of a projection operator onto a closed convex set C in the sense that $\mathsf{\text{pro}}{\mathsf{\text{x}}}_{{\iota}_{C}}={P}_{C}$, and it moreover possesses most of its attractive properties [49], which make it particularly well-suited for designing iterative minimization algorithms.
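Two proximity operators used in this article have simple closed forms: the proximity operator of γ‖·‖_{1} is componentwise soft-thresholding, and the proximity operator of ι_{C} is the projection P_{C}. A minimal numerical sketch (a box set is used for C purely as an example):

```python
import numpy as np

def prox_l1(x, gamma):
    """prox of gamma * ||.||_1: componentwise soft-thresholding,
    sign(x) * max(|x| - gamma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_indicator_box(x, lo, hi):
    """prox of the indicator of the box [lo, hi]^K is the projection onto it."""
    return np.clip(x, lo, hi)
```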
B The Douglas-Rachford algorithm
The solution of Problem (13) (which is the sum of the two functions f_{1} and f_{2}) is obtained by the following iterative algorithm: for every $k\in \mathbb{N}$,
$$\left\{\begin{array}{l}{\mathbf{z}}_{j,k}^{\left(o\right)}=\mathsf{\text{pro}}{\mathsf{\text{x}}}_{\gamma {f}_{2}}\left({\mathbf{t}}_{j,k}^{\left(o\right)}\right),\\ {\mathbf{t}}_{j,k+1}^{\left(o\right)}={\mathbf{t}}_{j,k}^{\left(o\right)}+{\lambda}_{k}\left(\mathsf{\text{pro}}{\mathsf{\text{x}}}_{\gamma {f}_{1}}\left(2{\mathbf{z}}_{j,k}^{\left(o\right)}-{\mathbf{t}}_{j,k}^{\left(o\right)}\right)-{\mathbf{z}}_{j,k}^{\left(o\right)}\right),\end{array}\right.$$(33)
where $\gamma >0$ and ${\lambda}_{k}\in \left]0,2\right[$.
An important feature of this algorithm is that it proceeds by splitting, in the sense that the functions f_{1} and f_{2} are dealt with in separate steps: in the first step, only function f_{2} is required to obtain ${\mathbf{z}}_{j,k}^{\left(o\right)}$ and, in the second step, only function f_{1} is involved to obtain ${\mathbf{t}}_{j,k+1}^{\left(o\right)}$. Furthermore, it can be seen that the algorithm requires computing two proximity operators, $\mathsf{\text{pro}}{\mathsf{\text{x}}}_{\gamma {f}_{1}}$ and $\mathsf{\text{pro}}{\mathsf{\text{x}}}_{\gamma {f}_{2}}$, at each iteration. Closed-form expressions of the proximity operators of various functions in Γ_{0}(ℝ) can be found in [46]. In our case, the proximity operator of γf_{1} is given by the soft-thresholding rule:
where ${\pi}_{j,k}^{\left(o\right)}\left(m,n\right)=\mathsf{\text{sof}}{\mathsf{\text{t}}}_{\left[-\gamma ,\gamma \right]}\left({t}_{j,k}^{\left(o\right)}\left(m,n\right)-{x}_{i,j}\left(m,n\right)\right)+{x}_{i,j}\left(m,n\right)$ and, for every $u\in \mathbb{R}$, $\mathsf{\text{sof}}{\mathsf{\text{t}}}_{\left[-\gamma ,\gamma \right]}\left(u\right)=\text{sign}\left(u\right)\phantom{\rule{0.2em}{0ex}}\text{max}\left\{|u|-\gamma ,0\right\}$.
Concerning γf_{2}, it is easy to check that its proximity operator is expressed as:
where ${\mathbf{p}}_{j,k}^{\left(o\right)}={\left({\sum}_{m,n}{\stackrel{\u0303}{\mathbf{x}}}_{j}^{\left(o\right)}\left(m,n\right){\left({\stackrel{\u0303}{\mathbf{x}}}_{j}^{\left(o\right)}\left(m,n\right)\right)}^{\mathsf{\text{T}}}\right)}^{-1}{\sum}_{m,n}{\stackrel{\u0303}{\mathbf{x}}}_{j}^{\left(o\right)}\left(m,n\right){t}_{j,k}^{\left(o\right)}\left(m,n\right)$.
Finally, it is important to note that it has been shown (see [62] and references therein) that every sequence ${\left({\mathbf{z}}_{j,k}^{\left(o\right)}\right)}_{k\in \mathbb{N}}$ generated by the Douglas-Rachford algorithm (33) converges to a solution of Problem (13), provided that the parameters γ and λ are fixed as indicated.
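The splitting structure can be exercised on a small surrogate problem: minimizing ‖x‖_{1} over the hyperplane {x : Σ_{k} x_{k} = 1}, with f_{1} the ℓ_{1}-norm and f_{2} the indicator of the hyperplane. This toy problem is not the criterion of Problem (13); it only demonstrates the iteration, with each step touching one function at a time through its proximity operator:

```python
import numpy as np

def soft(u, gamma):
    """Componentwise soft-thresholding: prox of gamma * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def proj_sum_one(x):
    """Projection onto the hyperplane {x : sum(x) = 1} (prox of its indicator)."""
    return x + (1.0 - x.sum()) / x.size

def douglas_rachford(t, gamma=1.0, lam=1.0, n_iter=200):
    """Douglas-Rachford splitting for f1 + f2 with f1 = ||.||_1 and
    f2 = indicator of {sum(x) = 1}; gamma > 0 and lam in ]0, 2[."""
    for _ in range(n_iter):
        z = proj_sum_one(t)                          # first step: only f2
        t = t + lam * (soft(2 * z - t, gamma) - z)   # second step: only f1
    return proj_sum_one(t)
```

The returned point lies on the constraint hyperplane and attains the minimal ℓ_{1}-norm value of 1 for this problem.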
C The Douglas-Rachford algorithm in a product space
The solution of the problem (26) (which is the sum of the two functions f_{3} and f_{4}) is obtained by the following iterative algorithm:
Note that the above algorithm requires computing the proximity operators of two new functions, γf_{3} and γf_{4}. Concerning the proximity operator of γf_{3}, we have
where
Concerning γf_{4}, its proximity operator is given by:
where
Endnotes
^{a}The ztransform of a signal x will be denoted in capital letters by X. ^{b}http://sipi.usc.edu/database. ^{c}http://vasc.ri.cmu.edu/idb/html/stereo/index.html, http://vasc.ri.cmu.edu/idb/html/jisct/index.html and http://cat.middlebury.edu/stereo/data.html.
References
 1.
Donoho DL, Johnstone IM: Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81(3):425-455. 10.1093/biomet/81.3.425
 2.
Antonini M, Barlaud M, Mathieu P, Daubechies I: Image coding using wavelet transform. IEEE Trans Image Process 1992, 1(2):205-220. 10.1109/83.136597
 3.
Woods JW: Subband Image Coding. Kluwer Academic Publishers, Norwell, MA, USA; 1990.
 4.
Mallat S: A Wavelet Tour of Signal Processing. Academic Press, San Diego; 1998.
 5.
Sweldens W: The lifting scheme: a custom-design construction of biorthogonal wavelets. Volume 3. Appl Comput Harmonic Anal; 1996:186-200.
 6.
Arai K: Preliminary study on information lossy and lossless coding data compression for the archiving of ADEOS data. IEEE Trans Geosci Remote Sens 1990, 28: 732-734. 10.1109/TGRS.1990.573001
 7.
Calderbank AR, Daubechies I, Sweldens W, Yeo BL: Wavelet transforms that map integers to integers. Appl Comput Harmonic Anal 1998, 5(3):332-369. 10.1006/acha.1997.0238
 8.
Taubman D, Marcellin M: JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Norwell, MA, USA; 2001.
 9.
Gerek ON, Çetin AE: Adaptive polyphase subband decomposition structures for image compression. IEEE Trans Image Process 2000, 9(10):1649-1660. 10.1109/83.869176
 10.
Gouze A, Antonini M, Barlaud M, Macq B: Optimized lifting scheme for two-dimensional quincunx sampling images. In IEEE International Conference on Image Processing. Volume 2. Thessaloniki, Greece; 2001:253-258.
 11.
Benazza-Benyahia A, Pesquet JC, Hattay J, Masmoudi H: Block-based adaptive vector lifting schemes for multichannel image coding. EURASIP Int J Image Video Process 2007, 10.
 12.
Heijmans H, Piella G, Pesquet-Popescu B: Building adaptive 2D wavelet decompositions by update lifting. In IEEE International Conference on Image Processing. Volume 1. Rochester, New York, USA; 2002:397-400.
 13.
Chokchaitam S: A non-separable two-dimensional LWT for an image compression and its theoretical analysis. Thammasat Internat J Sci Technol 2004, 9: 35-43.
 14.
Sun YK: A two-dimensional lifting scheme of integer wavelet transform for lossless image compression. In International Conference on Image Processing. Volume 1. Singapore; 2004:497-500.
 15.
Chappelier V, Guillemot C: Oriented wavelet transform for image compression and denoising. IEEE Trans Image Process 2006, 15(10):2892-2903.
 16.
Gerek ON, Çetin AE: A 2D orientation-adaptive prediction filter in lifting structures for image coding. IEEE Trans Image Process 2006, 15: 106-111.
 17.
Ding W, Wu F, Wu X, Li S, Li H: Adaptive directional lifting-based wavelet transform for image coding. IEEE Trans Image Process 2007, 10(2):416-427.
 18.
Boulgouris NV, Strintzis MG: Reversible multiresolution image coding based on adaptive lifting. In IEEE International Conference on Image Processing. Volume 3. Kobe, Japan; 1999:546-550.
 19.
Claypoole RL, Davis G, Sweldens W, Baraniuk RG: Nonlinear wavelet transforms for image coding. The 31st Asilomar Conference on Signals, Systems and Computers 1997, 1: 662-667.
 20.
Gouze A, Antonini M, Barlaud M, Macq B: Design of signal-adapted multidimensional lifting schemes for lossy coding. IEEE Trans Image Process 2004, 13(12):1589-1603. 10.1109/TIP.2004.837556
 21.
Solé J, Salembier P: Generalized lifting prediction optimization applied to lossless image compression. IEEE Signal Process Lett 2007, 14(10):695-698.
 22.
Chang CL, Girod B: Direction-adaptive discrete wavelet transform for image compression. IEEE Trans Image Process 2007, 16(5):1289-1302.
 23.
Rolon JC, Salembier P: Generalized lifting for sparse image representation and coding. In Picture Coding Symposium. Lisbon, Portugal; 2007.
 24.
Liu Y, Ngan KN: Weighted adaptive lifting-based wavelet transform for image coding. IEEE Trans Image Process 2008, 17(4):500-511.
 25.
Mallat S: Geometrical grouplets. Appl Comput Harmonic Anal 2009, 26(2):161-180. 10.1016/j.acha.2008.03.004