# Layer-based sparse representation of multiview images

- Andriy Gelman^{1} (Email author)
- Jesse Berent^{2}
- Pier Luigi Dragotti^{1}

**2012**:61

https://doi.org/10.1186/1687-6180-2012-61

© Gelman et al; licensee Springer. 2012

**Received:** 14 July 2011

**Accepted:** 9 March 2012

**Published:** 9 March 2012

## Abstract

This article presents a novel method to obtain a sparse representation of multiview images. The method is based on the fact that multiview data is composed of epipolar-plane image lines which are highly redundant. We extend this principle to obtain the layer-based representation, which partitions a multiview image dataset into redundant regions (which we call layers) each related to a constant depth in the observed scene. The layers are extracted using a general segmentation framework which takes into account the camera setup and occlusion constraints. To obtain a sparse representation, the extracted layers are further decomposed using a multidimensional discrete wavelet transform (DWT), first across the view domain followed by a two-dimensional (2D) DWT applied to the image dimensions. We modify the viewpoint DWT to take into account occlusions and scene depth variations. Simulation results based on nonlinear approximation show that the sparsity of our representation is superior to the multi-dimensional DWT without disparity compensation. In addition we demonstrate that the constant depth model of the representation can be used to synthesise novel viewpoints for immersive viewing applications and also de-noise multiview images.

## 1 Introduction

The notion of sparsity, namely the idea that the essential information contained in a signal can be represented with a small number of significant components, is widespread in signal processing and data analysis in general. Sparse signal representations are at the heart of many successful signal processing applications, such as signal compression and de-noising. In the case of images, successful new representations have been developed on the assumption that the data is well modelled by smooth regions separated by edges or regular contours. Besides wavelets, which have been successful for image compression [1], other examples of dictionaries that provide sparse image representations are curvelets [2], contourlets [3], ridgelets [4], directionlets [5], bandlets [6, 7] and complex wavelets [8, 9]. We refer the reader to a recent overview article [10] for a more comprehensive review on the theory of sparse signal representation.

In parallel and somewhat independently to these developments, there has been a growing interest in the capture and processing of multiview images. The popularity of this approach has been driven by the advent of novel exciting applications such as immersive communication [11] or free-viewpoint and three-dimensional (3D) TV [12]. At the heart of these applications is the idea that a novel arbitrary photorealistic view of a real scene can be obtained by proper interpolation of existing views. The problem of synthesising a novel image from a set of multiview images is known as image-based rendering (IBR) [13].

The article is organised as follows. In Section 2 we review the structure of multiview data, discuss the layer-based representation and present a high-level overview of our proposed method. In Section 3 we present the layer extraction algorithm. The multi-dimensional DWT is discussed in Section 4. Finally, we evaluate the proposed sparse representation in Section 5 and conclude in Section 6.

## 2 Multiview data structure

We start by introducing the plenoptic function and the structure of multiview data. In addition we present a layer-based representation that exploits the multiview structure to partition the data into volumes each related to a constant depth in the scene.

### 2.1 Plenoptic function

The light rays in a scene can be described using the *plenoptic function* [14]. Introduced by Adelson and Bergen, this function parameterises each light ray with a 3D point in space (*V*_{x}, *V*_{y}, *V*_{z}) and its direction of arrival (*θ*, *ϕ*). Two further variables *λ* and *t* are used to specify the wavelength and time, respectively. In total the plenoptic function is therefore seven dimensional:

$$I={P}_{7}\left({V}_{x},{V}_{y},{V}_{z},\theta ,\varphi ,\lambda ,t\right),$$

where *I* corresponds to the light ray intensity.

In practice, however, it is not feasible to store, transmit or capture the 7D function. A number of simplifications are therefore applied to reduce its dimensionality. Firstly, it is common to drop the *λ* parameter and instead deal with either the monochromatic intensity or the red, green, blue (RGB) channels separately. Secondly, the light rays can be recorded at a specific moment in time, thus dropping the *t* parameter. This simplification can for example be applied when viewing a stationary scene. The resulting object is a 5D function.

The *light field* [15] defines each light ray by its intersection with a camera plane and a focal plane:

$$I={P}_{4}\left({V}_{x},{V}_{y},x,y\right),$$

where (*V*_{x}, *V*_{y}) and (*x, y*) correspond to the coordinates of the camera and the focal plane, respectively. Observe that the dataset can be analysed as a 2D array of images, where each image is formed by the light rays which pass through a specific point on the camera plane. In Figure 3 we illustrate an example of a light field with 16 camera locations. The camera positions are evenly spaced on a 2D grid (*V*_{x}, *V*_{y}).

In comparison to the light field, the EPI is easier to visualise and in the following sections we use it to present a number of concepts. All of the properties are however easily generalised to the light field. Next, we review the EPI and light field structure and present the layer-based representation.
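As an illustration of these data structures, the following minimal sketch stores a toy light field as a 4D array and extracts an image, an EPI volume and a single EPI slice from it. The axis order (*V*_{y}, *V*_{x}, *y*, *x*), the array sizes and all variable names are assumptions made for this example only.

```python
import numpy as np

# Hypothetical toy light field: 4 x 4 camera grid, 6 x 8 pixel images.
# The axis order (V_y, V_x, y, x) is an assumption of this sketch.
n_vy, n_vx, height, width = 4, 4, 6, 8
rng = np.random.default_rng(0)
light_field = rng.random((n_vy, n_vx, height, width))

# One image: all rays through a single point (V_x, V_y) on the camera plane.
image = light_field[1, 2]            # shape (height, width)

# EPI volume: cameras restricted to a line (fixed V_y), leaving (V_x, y, x).
epi_volume = light_field[1]          # shape (n_vx, height, width)

# A single EPI slice: additionally fix the image row y, leaving (V_x, x).
epi_slice = epi_volume[:, 3, :]      # shape (n_vx, width)
```

Viewed this way, the EPI volume is simply one row of the 2D camera array, which is why properties shown on the EPI carry over to the light field.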

### 2.2 EPI and light field structure

The basic structure of the EPI volume is the *EPI line*.

In order to demonstrate why EPI lines are the fundamental component of multiview images, consider the setup in Figure 4a. Here we show a simplified version of the scene: the horizontal axis corresponds to the camera location line; the line parallel to it defines the focal plane of each camera^{a}; and the vertical axis defines the depth of the scene. The curved line thus corresponds to the surface of the object.

Consider a point in space with coordinates (*X, Y, Z*). Assuming a Lambertian scene^{b}, this point will appear in each of the images with coordinates

$$x=\frac{fX}{Z}-\frac{f{V}_{x}}{Z},\qquad y=\frac{fY}{Z},\qquad (4)$$

where *f* is the focal length. As illustrated in Figure 4b, *x* is linearly related to the camera location *V*_{x}. The rate of change in the pixel location, also known as the disparity $\Delta p=\frac{f}{Z}$, is inversely related to the depth of the object. Thus, a point in space maps to a line in the EPI volume.
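The linear relation between pixel location and camera position can be checked numerically. The sketch below projects one near and one far point through a pinhole camera for several camera positions. The sign convention (the pixel coordinate decreases as the camera moves in the positive *V*_{x} direction), the function name `project_point` and the numerical values are assumptions of this example.

```python
import numpy as np

def project_point(X, Y, Z, f, v_x):
    """Pinhole projection of the scene point (X, Y, Z) for camera positions v_x."""
    v_x = np.asarray(v_x, dtype=float)
    x = f * X / Z - f * v_x / Z           # linear in v_x, slope of magnitude f / Z
    y = np.full_like(v_x, f * Y / Z)      # independent of v_x
    return x, y

f = 100.0                                 # focal length (arbitrary units)
cameras = np.arange(4.0)                  # four evenly spaced camera positions

x_near, _ = project_point(2.0, 1.0, 10.0, f, cameras)
x_far, _ = project_point(2.0, 1.0, 50.0, f, cameras)

# Disparity: magnitude of the pixel shift per unit camera displacement.
disparity_near = -np.diff(x_near)         # f / Z = 10 for the near point
disparity_far = -np.diff(x_far)           # f / Z = 2 for the far point
```

The near point traces a steeper EPI line than the far point, which is the property the layer-based representation exploits.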

However, the above analysis does not take into account occlusions. Clearly when two lines intersect, the EPI line corresponding to a smaller depth (larger disparity) will occlude all the EPI lines which are related to a larger depth (smaller disparity) in the scene. This principle is illustrated in Figure 4c.

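The same analysis extends to the light field, where the camera positions lie on a 2D plane (*V*_{x}, *V*_{y}). In this case, a point (*X, Y, Z*) maps onto a 2D plane as

$$x=\frac{fX}{Z}-\frac{f{V}_{x}}{Z},\qquad y=\frac{fY}{Z}-\frac{f{V}_{y}}{Z}.\qquad (5)$$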

### 2.3 Layer-based representation

The layer-based representation is an extension of the EPI line concept. The representation partitions the multiview data into homogeneous regions, where each layer is a collection of EPI lines modelled by a constant depth plane. An example of a layer-based representation is shown in Figure 1.

Consider a set of EPI lines that share a common disparity Δ*p*_{k}, as shown in Figure 5a. We define the layer carved out by the EPI lines with ${\mathscr{H}}_{k}$ and the boundary which delimits the region with Γ_{k}. Assuming there are no occlusions, observe that using (4) and (5), the boundary Γ_{k} can be defined by a contour on one of the viewpoints projected to the remaining frames. More specifically, if we define the contour *γ*_{k}(*s*) = [*x*(*s*), *y*(*s*)] to be the boundary on the viewpoint (*V*_{x} = 0), we obtain the relationship

$${\Gamma }_{k}\left(s,{V}_{x}\right)=\left[\begin{array}{c}x\left(s\right)-\Delta {p}_{k}{V}_{x}\\ y\left(s\right)\\ {V}_{x}\end{array}\right],$$

where *s* parameterises the contour *γ*_{k}(*s*).

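When occlusions are present, a layer is only partly observed. Consider two neighbouring layers with visible regions^{c} ${\mathscr{H}}_{k-1}^{\mathcal{V}}$ and ${\mathscr{H}}_{k}^{\mathcal{V}}$. In this example the layers are ordered in terms of increasing depth (i.e., ${\mathscr{H}}_{k}$ corresponds to a larger depth than ${\mathscr{H}}_{k-1}$). In general, the visible regions of a layer can be defined as

$${\mathscr{H}}_{k}^{\mathcal{V}}={\mathscr{H}}_{k}\cap \overline{\bigcup _{j=1}^{k-1}{\mathscr{H}}_{j}^{\mathcal{V}}}.$$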

There are a number of advantages to segmenting a multiview dataset into layers. Firstly, each layer is highly redundant in the direction of the disparity Δ*p*. This is due to the fact that each layer consists of EPI lines modelled by a constant depth. Secondly, any occluded regions are explicitly defined by the representation. These regions correspond to artificial boundaries, and their specific locations can be used to design a transform which takes them into account. Thirdly, the boundary of each layer can be efficiently defined by a contour on one viewpoint *γ* (*s*) and its disparity Δ*p*. This is important if a compression algorithm based on the sparse representation is to be implemented, where the segmentation of each layer must also be transmitted.
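The explicit handling of occlusions can be illustrated with boolean masks. In the sketch below each layer is a mask over the samples of the EPI volume, the layers are ordered by increasing depth, and the visible part of a layer is whatever remains after removing all nearer layers. The function name `visible_regions` and the toy masks are assumptions made for this example.

```python
import numpy as np

def visible_regions(layer_masks):
    """Visible part of each layer; layer_masks is ordered nearest layer first."""
    occluded = np.zeros_like(layer_masks[0], dtype=bool)
    visible = []
    for mask in layer_masks:
        visible.append(mask & ~occluded)   # remove samples hidden by nearer layers
        occluded |= mask                   # this layer now occludes deeper ones
    return visible

# Toy example on one EPI row: the near layer overlaps the far layer at index 2.
near = np.array([0, 1, 1, 0, 0, 0], dtype=bool)
far = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
vis_near, vis_far = visible_regions([near, far])
```

The nearest layer is always fully visible, while deeper layers lose the samples covered by the layers in front of them.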

### 2.4 Sparse representation method high-level overview

In the following step we decompose the layers using a 4D DWT applied in a separable fashion across the viewpoint and the spatial dimensions. We modify the viewpoint transform to include disparity compensation and also efficiently deal with disoccluded regions. Additionally, the transform is implemented using the lifting scheme [17] to reduce the complexity and maintain invertibility.

In the following sections we describe the layer extraction and 4D DWT stages in more detail.

## 3 Layer-based segmentation

Data segmentation is the first stage of the proposed method. Here we introduce our segmentation algorithm which achieves accurate results by taking into account the structure of multiview data. We introduce the method by first describing a general segmentation problem and then showing how that solution can be adapted to extract layers from a light field dataset.

### 3.1 General region-based data segmentation

Consider the problem of partitioning an *m*-dimensional dataset $\mathcal{D}\subset {\mathbb{R}}^{m}$ into subsets $\mathscr{H}$ and $\overline{\mathscr{H}}$, where the boundary which delimits the two regions is defined by Γ(σ) with σ ∈ ℝ^{m-1}. This type of problem can be solved using an optimisation framework, where the boundary is obtained by minimising an objective function *J*:

$$J\left(\Gamma \right)={\int }_{\mathscr{H}}d\left(\mathbf{x},\mathscr{H}\right)\,d\mathbf{x}+{\int }_{\overline{\mathscr{H}}}d\left(\mathbf{x},\overline{\mathscr{H}}\right)\,d\mathbf{x}+{\int }_{\Gamma }\eta \,d\sigma ,\qquad (10)$$

where the descriptor *d*(·) measures the homogeneity of each region and **x** ∈ ℝ^{m}. The descriptor can be designed such that when **x** belongs to the region $\mathscr{H}$, *d*(**x**, $\mathscr{H}$) tends to zero and vice versa. Note also that (10) has an additional regularisation term *η*, which acts to minimise the length of the boundary.

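One way of solving the optimisation problem is to make the boundary a function of an evolution parameter *τ*. Consider modeling the boundary using a partial differential equation (PDE), also known as an active contour [19]:

$$\frac{\partial \Gamma \left(\sigma ,\tau \right)}{\partial \tau }=\mathbf{v}\left(\sigma ,\tau \right)=F\left(\sigma ,\tau \right)\,\mathbf{n}\left(\sigma ,\tau \right),\qquad (11)$$

where **v** is a velocity vector, which can be expressed in terms of a scalar force *F* acting in the outward normal direction **n** to the boundary. The velocity vector **v** can be evaluated in terms of the descriptor *d*(·) by differentiating (10) with respect to *τ*. Applying the Eulerian framework [18], the derivative can be expressed in terms of boundary integrals:

$$\frac{dJ}{d\tau }={\int }_{\Gamma }\left[d\left(\mathbf{x},\mathscr{H}\right)-d\left(\mathbf{x},\overline{\mathscr{H}}\right)+\eta \kappa \right]\left(\mathbf{v}\cdot \mathbf{n}\right)\,d\sigma ,\qquad (12)$$

where *κ* is the curvature of the boundary Γ and · denotes the dot product. Observe that **v** and **n** correspond to the velocity and the normal vectors in (11), respectively. The velocity that decreases *J* fastest (steepest descent) is therefore

$$\mathbf{v}=\left[d\left(\mathbf{x},\overline{\mathscr{H}}\right)-d\left(\mathbf{x},\mathscr{H}\right)-\eta \kappa \right]\mathbf{n}.\qquad (13)$$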

The above framework is also known as 'competition-based' segmentation. This is clear from (13), where a point on the boundary will experience a positive force if it belongs to the region $\mathscr{H}$ and vice versa, hence evolving the contour in the correct direction. In conclusion, the general segmentation problem can be solved by modeling the boundary Γ as a PDE and evolving the contour in the direction of the velocity vector **v**.
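The competition principle can be illustrated on a one-dimensional signal, where the "boundary" is a single index separating two regions. In the sketch below the descriptor is the squared deviation from the region mean, the regularisation term is omitted, and the boundary moves one sample at a time in the direction of the force. This discrete scheme, the descriptor choice and the function names are assumptions of the example and not the PDE evolution itself.

```python
import numpy as np

def competition_force(x, mean_in, mean_out):
    """d(x, H-bar) - d(x, H): positive when the sample x is better explained by H."""
    return (x - mean_out) ** 2 - (x - mean_in) ** 2

def evolve_boundary(signal, b, max_iter=100):
    """Region H is signal[:b], its complement is signal[b:]; returns the final b."""
    for _ in range(max_iter):
        mean_in, mean_out = signal[:b].mean(), signal[b:].mean()
        if b < len(signal) - 1 and competition_force(signal[b], mean_in, mean_out) > 0:
            b += 1      # the sample just outside resembles H: grow the region
        elif b > 1 and competition_force(signal[b - 1], mean_in, mean_out) < 0:
            b -= 1      # the sample just inside resembles the complement: shrink
        else:
            break       # no force pushes the boundary any further
    return b

# Piecewise-constant signal with the true boundary at index 6.
signal = np.concatenate([np.ones(6), 5.0 * np.ones(6)])
b_from_left = evolve_boundary(signal, 3)
b_from_right = evolve_boundary(signal, 9)
```

Starting on either side of the true edge, the boundary settles at the same location, which mirrors the behaviour described for the force in (13).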

### 3.2 Multiview image segmentation

In the case of a light field, the goal is to extract *N* layers, where each volume is modelled by a constant depth *Z*_{k} or the associated disparity Δ*p*_{k}. In the context of the previous section, this is equivalent to segmenting the data into 4D layers {${\mathscr{H}}_{1},...,{\mathscr{H}}_{N}$}, where the boundary of each layer is defined by {Γ_{1}, ..., Γ_{N}} (the background volume ${\mathscr{H}}_{N}$ is assigned the residual regions which do not belong to any other layer).

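We define the cost function^{d} in terms of the visible regions ${\mathscr{H}}_{k}^{\mathcal{V}}$, and this leads to the following:

$$J\left({\Gamma }_{1},\dots ,{\Gamma }_{N},\Delta {p}_{1},\dots ,\Delta {p}_{N}\right)=\sum _{k=1}^{N}{\int }_{{\mathscr{H}}_{k}^{\mathcal{V}}}{d}_{k}\left(\mathbf{x},\Delta {p}_{k}\right)\,d\mathbf{x},\qquad (14)$$

where **x** = [*x, y, V*_{x}, *V*_{y}]^{T} and ${\mathscr{H}}_{k}^{\mathcal{V}}$ correspond to the visible regions of each layer.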

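The descriptor *d*_{k}(**x**, Δ*p*_{k}) is defined as the squared deviation of the intensity from the mean of its EPI line:

$${d}_{k}\left(\mathbf{x},\Delta {p}_{k}\right)={\left[I\left(\mathbf{x}\right)-\mu \left(\mathbf{x},\Delta {p}_{k}\right)\right]}^{2},$$

where *μ*(**x**, Δ*p*_{k}) is the mean of the EPI line which passes through a point **x** and has a disparity Δ*p*_{k}.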

The aim of the segmentation is then to obtain the layer boundaries Γ_{k} and the disparity values Δ*p*_{k} for *k* = 1, ..., *N* by minimising (14). Observe however that (14) has a large number of unknown variables. In order to minimise the function, we consider the problem of layer evolution and disparity estimation separately and then show how the problem is iteratively solved in Section 3.2.4.

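Consider first the evolution of a single layer while the remaining layers are fixed. For example, in a dataset with three layers, the cost (14) can be written in terms of the first layer as

$$J={\int }_{{\mathscr{H}}_{1}^{\mathcal{V}}}{d}_{1}\left(\mathbf{x},\Delta {p}_{1}\right)\,d\mathbf{x}+{\int }_{\overline{{\mathscr{H}}_{1}^{\mathcal{V}}}}{d}_{1}^{\text{out}}\left(\mathbf{x}\right)\,d\mathbf{x},$$

where ${d}_{1}^{\text{out}}\left(\mathbf{x}\right)={d}_{i}\left(\mathbf{x},\Delta {p}_{i}\right)$ when $\mathbf{x}\in {\mathscr{H}}_{i}^{\mathcal{V}}$ for *i* = 2, 3.

In general, when evolving the *k*-th layer, the cost function can be simplified to

$$J\left({\Gamma }_{k}\right)={\int }_{{\mathscr{H}}_{k}^{\mathcal{V}}}{d}_{k}\left(\mathbf{x},\Delta {p}_{k}\right)\,d\mathbf{x}+{\int }_{\overline{{\mathscr{H}}_{k}^{\mathcal{V}}}}{d}_{k}^{\text{out}}\left(\mathbf{x}\right)\,d\mathbf{x}.\qquad (19)$$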

A possible solution would then be to evaluate the 4D velocity vector of the boundary corresponding to ${\mathscr{H}}_{k}^{\mathcal{V}}$. This approach, however, would not explicitly take into account the structure of multiview data in the minimisation. In the following we show how (19) is solved by imposing the camera setup and the occlusion constraints.

#### 3.2.1 Imposing camera setup and occlusion constraints

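The occlusion constraint states that a layer can only be occluded by layers which are closer to the camera. For the *k*-th layer, we model this by using the following indicator function:

$${O}_{k}\left(\mathbf{x}\right)=\left\{\begin{array}{ll}0& \text{if}\ \mathbf{x}\in \bigcup _{j=1}^{k-1}{\mathscr{H}}_{j},\\ 1& \text{otherwise},\end{array}\right.$$

so that the visible region ${\mathscr{H}}_{k}^{\mathcal{V}}$ is the part of ${\mathscr{H}}_{k}$ where *O*_{k}(**x**) = 1 and the occlusions do not depend on the boundary Γ_{k}. This, therefore, allows the derivative of the cost to be defined as

$$\frac{dJ}{d\tau }={\int }_{{\Gamma }_{k}}{O}_{k}\left(\mathbf{x}\right)\left[{d}_{k}\left(\mathbf{x},\Delta {p}_{k}\right)-{d}_{k}^{\text{out}}\left(\mathbf{x}\right)\right]\left({\mathbf{v}}_{{\Gamma }_{k}}\cdot {\mathbf{n}}_{{\Gamma }_{k}}\right)\,d\sigma .\qquad (22)$$

The camera setup constraint follows from Section 2.3: the boundary Γ_{k} can be parameterised by a 2D contour γ_{k}(*s*) = [*x*(*s*), *y*(*s*)] on the image viewpoint (*V*_{x} = 0). Substituting this into (22) we obtain

$$\frac{dJ}{d\tau }={\int }_{{\gamma }_{k}}\left[{D}_{k}\left(s,\Delta {p}_{k}\right)-{D}_{k}^{\text{out}}\left(s,\Delta {p}_{k}\right)\right]\left({\mathbf{v}}_{{\gamma }_{k}}\cdot {\mathbf{n}}_{{\gamma }_{k}}\right)\,ds,$$

where **v**_{γk} and **n**_{γk} now correspond to the velocity and the outward normal vector of the 2D boundary^{e}, respectively. In addition, the new objective functions *D*_{k}(·) and ${D}_{k}^{\text{out}}\left(\cdot \right)$ are defined as

$${D}_{k}\left(s,\Delta {p}_{k}\right)=\iint {O}_{k}\left(\mathbf{x}\left(s,{V}_{x},{V}_{y}\right)\right)\,{d}_{k}\left(\mathbf{x}\left(s,{V}_{x},{V}_{y}\right),\Delta {p}_{k}\right)\,d{V}_{x}\,d{V}_{y},$$

$${D}_{k}^{\text{out}}\left(s,\Delta {p}_{k}\right)=\iint {O}_{k}\left(\mathbf{x}\left(s,{V}_{x},{V}_{y}\right)\right)\,{d}_{k}^{\text{out}}\left(\mathbf{x}\left(s,{V}_{x},{V}_{y}\right)\right)\,d{V}_{x}\,d{V}_{y},$$

with $\mathbf{x}\left(s,{V}_{x},{V}_{y}\right)={\left[x\left(s\right)-\Delta {p}_{k}{V}_{x},\;y\left(s\right)-\Delta {p}_{k}{V}_{y},\;{V}_{x},\;{V}_{y}\right]}^{T}$.

Note that the new descriptors ${D}_{k}^{\text{out}}\left(\cdot \right)$ and *D*_{k}(·) are simply the descriptors ${d}_{k}^{\text{out}}\left(\cdot \right)$ and *d*_{k}(·) integrated over the viewpoint dimensions.

The velocity vector which reduces the cost is therefore^{f}

$${\mathbf{v}}_{{\gamma }_{k}}=\left[{D}_{k}^{\text{out}}\left(s,\Delta {p}_{k}\right)-{D}_{k}\left(s,\Delta {p}_{k}\right)\right]{\mathbf{n}}_{{\gamma }_{k}},\qquad (26)$$

which evolves the contour γ_{k}(*s*) towards the desired segmentation for each layer.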

#### 3.2.2 Disparity and number of layers estimation

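We now consider the estimation of the disparities, assuming that the layer contours {*γ*_{1}, ..., *γ*_{N}} are constant. In this case, the objective function can be simplified to:

$$J\left(\Delta {p}_{1},\dots ,\Delta {p}_{N}\right)=\sum _{k=1}^{N}{\int }_{{\mathscr{H}}_{k}^{\mathcal{V}}}{d}_{k}\left(\mathbf{x},\Delta {p}_{k}\right)\,d\mathbf{x}.$$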

In contrast to the optimisation of the layer contours, this problem is significantly simpler. A solution can be obtained in an iterative approach by estimating the disparity of each layer assuming the remaining disparities are constant. Each parameter is then estimated using the MATLAB nonlinear optimisation toolbox.
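As a rough illustration of the per-layer disparity estimation, the sketch below replaces the nonlinear optimiser with a brute-force search over integer candidate disparities. For each candidate the views are aligned along the hypothesised EPI lines, and the cost is the sum of squared deviations from the EPI-line mean inside the layer mask. The grid search, the integer disparities, the circular shifts at the image border, the fixed masks and all names are assumptions of this example.

```python
import numpy as np

def line_cost(epi, mask, disparity):
    """Sum of (I - mu)^2 along EPI lines of the given disparity, restricted to mask.

    epi has shape (n_views, width); mask selects pixels of view 0 (the layer)."""
    n_views = epi.shape[0]
    # A point at x0 in view 0 is assumed to sit at x0 - disparity * v in view v.
    aligned = np.stack([np.roll(epi[v], disparity * v) for v in range(n_views)])
    deviation = (aligned - aligned.mean(axis=0)) ** 2
    return float(deviation[:, mask].sum())

def estimate_disparity(epi, mask, candidates):
    """Return the candidate disparity with the lowest EPI-line cost."""
    costs = [line_cost(epi, mask, p) for p in candidates]
    return candidates[int(np.argmin(costs))]

# Synthetic EPI: background with disparity 1, foreground strip with disparity 3.
rng = np.random.default_rng(1)
n_views, width = 4, 64
background, foreground = rng.random(width), rng.random(10)
epi = np.stack([np.roll(background, -1 * v) for v in range(n_views)])
for v in range(n_views):
    epi[v, 20 - 3 * v: 30 - 3 * v] = foreground   # nearer layer occludes background

fg_mask = np.zeros(width, dtype=bool)
fg_mask[20:30] = True
bg_mask = np.zeros(width, dtype=bool)
bg_mask[32:60] = True                             # background pixels never occluded

candidates = list(range(5))
p_fg = estimate_disparity(epi, fg_mask, candidates)
p_bg = estimate_disparity(epi, bg_mask, candidates)
```

Each layer is estimated with the other held fixed, in the spirit of the iterative approach described above; a continuous optimiser would refine these values to sub-pixel precision.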

In addition, observe that we require knowledge of the number of layers *N*. In our approach we initialise this value using a stereo matching algorithm [20]. Alternatively, one could estimate the number of layers using the spectral properties of the light field [21] as proposed in [22].

#### 3.2.3 Level-set method for the boundary evolution

We have demonstrated an approach to derive the velocity vector for each boundary. We then implement the evolution of the active contours using the level-set method [23].

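The level-set method embeds the 2D boundary *γ*(*s, τ*) in a higher-dimensional surface *z* = *ϕ*(*x, y, τ*). The original boundary is then defined as the zero-level of the new function

$$\gamma \left(s,\tau \right)=\left\{\left(x,y\right):\varphi \left(x,y,\tau \right)=0\right\},$$

where *s* parameterises the (*x, y*) coordinates on the 2D boundary.

Differentiating *ϕ*(*γ*(*s, τ*), *τ*) = 0 with respect to *τ* we obtain

$$\frac{\partial \varphi }{\partial \tau }+\nabla \varphi \cdot \frac{\partial \gamma }{\partial \tau }=0.$$

Observe that the normalised gradient of *ϕ*(*x, y, τ*) evaluated on the boundary *γ*(*s, τ*) corresponds to the outward normal vector of the boundary **n**_{γ}. This implies that

$$\frac{\partial \varphi }{\partial \tau }=-F\left|\nabla \varphi \right|,\qquad (31)$$

where *F* is the scalar force of the velocity vector **v**_{γ} = *F* **n**_{γ}.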

There are two main advantages to using the level-set method. First, the surface implicitly models any topological changes of the boundary. Second, unlike other parameterisation schemes, the approach does not suffer from instability issues since (31) is evaluated on a fixed Cartesian grid.

The evolution of the level-set method does however have a drawback in terms of increased complexity. To evolve the surface, the velocity vector must be evaluated at every position on the grid. In our approach, we deal with this problem by using the narrowband implementation [24], where only a region around the boundary is evolved instead of the complete surface.
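A single level-set update can be sketched in a few lines. The example below evolves a circular contour under a constant outward force using an explicit time step on the full grid. The sign convention (*ϕ* negative inside the contour), the central-difference gradient, the constant force and the step size are assumptions of this example; it does not implement the narrowband scheme or the data-driven velocity.

```python
import numpy as np

def level_set_step(phi, force, dt):
    """One explicit update of d(phi)/d(tau) = -F |grad phi| on a Cartesian grid.

    With phi < 0 inside the contour, a positive force moves the boundary outward."""
    grad_y, grad_x = np.gradient(phi)
    return phi - dt * force * np.sqrt(grad_x ** 2 + grad_y ** 2)

# Signed distance to a circle of radius 10 centred in a 64 x 64 grid.
yy, xx = np.mgrid[0:64, 0:64]
phi = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2) - 10.0
area_before = int((phi < 0).sum())

for _ in range(10):
    phi = level_set_step(phi, force=1.0, dt=0.5)

area_after = int((phi < 0).sum())   # region enclosed by the zero level has grown
```

Because the contour is never parameterised explicitly, the same update would also handle a boundary that splits or merges during the evolution.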

#### 3.2.4 Layer segmentation algorithm overview

An overview of the complete layer extraction algorithm is shown in Algorithm 1. First, the 2D contours and the disparity of each layer are initialised using a stereo matching algorithm [20]. The algorithm evaluates the disparity of each layer and then iteratively evolves the boundaries^{g} using the proposed velocity vector in (26). This process continues for a certain number of iterations or until the change in the overall cost is below a predefined threshold.

To obtain a sparse representation, the obtained layers are decomposed using a 4D DWT as explained in the following section.

## 4 Data decomposition

In this stage, the redundancy of the texture in each layer shown in Figure 1 is reduced using a multi-dimensional wavelet transform. In the following, we present the inter-view and the spatial transforms in more detail.

**Algorithm 1 Layer extraction algorithm**

**STEP 1**: Initialise the 2D boundary of each layer {γ_{1}, γ_{2}, ..., γ_{N}} using a stereo matching algorithm ([20] in our implementation).

**STEP 2**: Estimate the disparity of each layer {Δ*p*_{1}, Δ*p*_{2}, ..., Δ*p*_{N}} by minimising the squared error along the EPI lines.

**STEP 3**: Reorder the layers in terms of increasing depth.

**STEP 4**: Iteratively evolve the layer boundaries assuming the remaining layers are constant:

**for** *k* = 1 to *N* - 1 **do**

Evaluate the velocity vector **v**_{γk} of the *k*-th layer.

Evolve the boundary γ_{k} according to the velocity vector.

**end for**

**STEP 5**: Return to **STEP 2** or exit algorithm if the change in the cost (14) is below a predefined threshold.

### 4.1 Inter-view 2D DWT

The inter-view transform is implemented by applying a 1D DWT across the row images (*V*_{y}) followed by the column images (*V*_{x}) as illustrated in Figure 11. The process is iterated on the low-pass components to obtain a multiresolution decomposition.

where *P*_{o}[*n*] and *P*_{e}[*n*] represent 2D images with spatial coordinates (*x, y*) located at odd (2*n* + 1) and even (2*n*) camera locations, respectively. Following (32), ${\mathcal{L}}_{e}\left[n\right]$ contains the 2D low-pass subband and ${\mathcal{L}}_{o}\left[n\right]$ the high-pass subband. Assuming that $\mathcal{W}$ is invertible and the images are spatially continuous, the above transform can be shown to be equivalent to the standard DWT applied along the motion trajectories [25].

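We define the warping operator $\mathcal{W}$ from viewpoint *n*_{1} to *n*_{2} along the *V*_{x} dimension as:

$${\mathcal{W}}_{{n}_{1}\to {n}_{2}}\left\{P\left[{n}_{1}\right]\right\}\left(x,y\right)=P\left[{n}_{1}\right]\left(x+\Delta p\left({n}_{2}-{n}_{1}\right),y\right),\qquad (33)$$

where Δ*p* is the layer disparity.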

and the high-pass coefficient in ${\mathcal{L}}_{o}\left[n\right]$ is set to zero. In (34), the warping operator $\hat{\mathcal{W}}$ is set to integer pixel precision to ensure invertibility and is chosen as the *ceiling* of the disparity in (33).
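To make the role of disparity compensation in the lifting scheme concrete, the sketch below implements one predict and one update step between an even and an odd view, with an integer-pixel warp. The unnormalised Haar-style lifting steps, the circular shift used for warping, the shift direction and the function names are assumptions of this example and not necessarily the exact transform of (32)-(34).

```python
import numpy as np

def warp(image, shift):
    """Integer-pixel horizontal warp; the circular border is a simplification."""
    return np.roll(image, shift, axis=1)

def forward(p_even, p_odd, disparity):
    """One disparity-compensated lifting stage: returns (low-pass, high-pass)."""
    d = int(np.ceil(disparity))            # integer precision, as for the ceiling rule
    high = p_odd - warp(p_even, -d)        # predict: odd view from warped even view
    low = p_even + 0.5 * warp(high, d)     # update: smooth along the EPI lines
    return low, high

def inverse(low, high, disparity):
    """Undo the lifting steps in reverse order."""
    d = int(np.ceil(disparity))
    p_even = low - 0.5 * warp(high, d)
    p_odd = high + warp(p_even, -d)
    return p_even, p_odd

# Two views of a single layer with disparity 2: the odd view is a pure shift.
rng = np.random.default_rng(0)
p_even = rng.random((4, 16))
p_odd = np.roll(p_even, -2, axis=1)

low, high = forward(p_even, p_odd, 2.0)
rec_even, rec_odd = inverse(low, high, 2.0)
```

When the warp matches the layer disparity the high-pass image vanishes, so all of the energy is carried by the low-pass image; the lifting structure guarantees reconstruction even when the prediction is imperfect.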

### 4.2 Spatial shape-adaptive 2D DWT

Note that the overlapped pixels are commonly bounded by an irregular (non-rectangular) shape. For that reason, the standard 2D DWT applied to the entire spatial domain is inefficient due to the boundary effect. We therefore use the shape-adaptive DWT [26] within arbitrarily shaped objects. The method reduces the magnitude of the high-pass coefficients by symmetrically extending the texture whenever the wavelet filter crosses the boundary. The 2D DWT is built as a separable transform with linear-phase symmetric wavelet filters (9/7 or 5/3 [27]), which, together with the symmetric signal extensions, leads to critically sampled transform subbands.

## 5 Evaluation

In this section we evaluate the performance of the proposed sparse representation using its nonlinear approximation properties. In addition, we demonstrate de-noising and IBR applications based on the proposed decomposition.

### 5.1 Nonlinear approximation

We evaluate the sparseness of the representation using its *N*-term nonlinear approximation. To implement the nonlinear approximation, we keep the *N* largest coefficients in the transform domain, reconstruct the data and evaluate the data fidelity in terms of PSNR.
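The *N*-term approximation procedure can be sketched on a one-dimensional signal. The example below uses a single-level orthonormal Haar transform as a stand-in for the full decomposition, keeps the largest coefficients, reconstructs and reports the PSNR. The choice of transform, the peak value of 255 and the function names are assumptions of this example.

```python
import numpy as np

def haar_1d(x):
    """Single-level orthonormal Haar transform of an even-length signal."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.concatenate([approx, detail])

def inverse_haar_1d(coeffs):
    """Inverse of haar_1d."""
    half = len(coeffs) // 2
    approx, detail = coeffs[:half], coeffs[half:]
    x = np.empty(len(coeffs))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def keep_largest(coeffs, n_terms):
    """Zero all but the n_terms largest-magnitude coefficients."""
    kept = np.zeros_like(coeffs)
    if n_terms > 0:
        idx = np.argsort(np.abs(coeffs))[-n_terms:]
        kept[idx] = coeffs[idx]
    return kept

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference - reconstruction) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)

signal = np.repeat([10.0, 200.0, 50.0, 120.0], 4)      # piecewise-constant test signal
coeffs = haar_1d(signal)
psnr_4 = psnr(signal, inverse_haar_1d(keep_largest(coeffs, 4)))
psnr_8 = psnr(signal, inverse_haar_1d(keep_largest(coeffs, 8)))
```

Plotting the PSNR against the number of retained coefficients gives the kind of curve used to compare representations: the sparser the representation, the faster the curve rises.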

We compare the nonlinear approximation of the proposed representation with that of a standard multi-dimensional DWT^{h}. The results are shown in Figure 13 for three datasets: Tsukuba light field [272 × 368 × 4 × 4], Teddy EPI [368 × 352 × 4] and Doll EPI [368 × 352 × 4] (all from [28]), which vary in terms of scene complexity, number of images and spatial resolution. We show that in each case our approach achieves a sparser representation across the complete range of retained coefficients, with PSNR gains of up to 7 dB on the Tsukuba light field. The Tsukuba light field has a larger PSNR improvement than the respective Teddy and Doll EPI volumes due to the additional viewing dimension. This means that there exists more redundant information and this is fully exploited by our representation. We also show that the PSNR curves correspond to a subjective improvement in Figure 14.

We note that the nonlinear approximation metric is also a good indicator of the compression capability of the representation. In practice, the issue of compression is more complicated due to the additional problem of encoding the locations of the significant coefficients and also to the rate allocation. These issues are beyond the scope of this paper; however, we refer the reader to [29], where these problems are addressed and a complete multiview image compression method is presented.

### 5.2 De-noising

Here we demonstrate de-noising results based on the proposed sparse representation in the presence of additive white Gaussian noise (AWGN). We implement the de-noising by soft thresholding the wavelet coefficients in each subband. For each subband, the threshold is chosen by minimising Stein's unbiased risk estimate (SURE) of the mean squared error (MSE) [30].
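The threshold selection for one subband can be sketched as follows. The example evaluates the SURE expression for soft thresholding at every candidate threshold (zero and the coefficient magnitudes) and keeps the minimiser. A known noise standard deviation, the exhaustive candidate search and the function names are assumptions of this example.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Shrink coefficients towards zero by t; magnitudes below t become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def sure_soft(coeffs, t, sigma):
    """SURE of the summed squared error for soft thresholding at t (noise std sigma)."""
    magnitude = np.abs(coeffs)
    return (coeffs.size * sigma ** 2
            - 2 * sigma ** 2 * np.count_nonzero(magnitude <= t)
            + np.sum(np.minimum(magnitude, t) ** 2))

def sure_threshold(coeffs, sigma):
    """Pick the threshold minimising SURE among zero and the coefficient magnitudes."""
    candidates = np.concatenate([[0.0], np.sort(np.abs(coeffs).ravel())])
    risks = [sure_soft(coeffs, t, sigma) for t in candidates]
    return float(candidates[int(np.argmin(risks))])

# Sparse "subband": a few large coefficients in AWGN with sigma = 1.
rng = np.random.default_rng(0)
clean = np.zeros(1000)
clean[:20] = 50.0
sigma = 1.0
noisy = clean + sigma * rng.standard_normal(clean.size)

t = sure_threshold(noisy, sigma)
denoised = soft_threshold(noisy, t)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_denoised = float(np.mean((denoised - clean) ** 2))
```

The sparser the subband, the more coefficients fall below the selected threshold, which is why a sparser representation translates into better de-noising.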

### 5.3 Image-based rendering

In this section we present viewpoint interpolation results based on the layer-based representation shown in Figure 1.

To render an image at an arbitrary viewpoint, we linearly interpolate the closest available images. Recall that the data pixels are highly correlated in the direction of the disparity. We take this into account by modifying the support of the rendering kernel according to the disparity of each layer. Additionally, we modify the interpolation in the presence of occlusions to further improve the results. In this case, only pixels that belong to the layer are used in the rendering process.
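A disparity-compensated interpolation between two neighbouring views can be sketched as below for a single layer. Each view is shifted along the EPI lines to the target position before the linear blend. Rounding the shifts to integer pixels, the circular border, the shift direction, the omission of the occlusion test and the function name are assumptions of this example.

```python
import numpy as np

def render_between(left, right, disparity, alpha):
    """Render a view at fractional position alpha in [0, 1] between two views.

    disparity is the pixel shift of the layer per camera step."""
    shift_left = int(round(disparity * alpha))            # left view moved forwards
    shift_right = int(round(disparity * (1.0 - alpha)))   # right view moved backwards
    from_left = np.roll(left, -shift_left, axis=1)
    from_right = np.roll(right, shift_right, axis=1)
    return (1.0 - alpha) * from_left + alpha * from_right

# Leave-one-out style check: remove the middle view and synthesise it.
rng = np.random.default_rng(2)
texture = rng.random((8, 32))
view_0 = texture
view_mid = np.roll(texture, -2, axis=1)     # true view half-way (disparity 4 per step)
view_1 = np.roll(texture, -4, axis=1)

rendered = render_between(view_0, view_1, disparity=4, alpha=0.5)
plain_average = 0.5 * (view_0 + view_1)     # interpolation without compensation
```

Without the compensation the blend mixes misaligned pixels and produces ghosting, whereas aligning along the layer disparity recovers the missing view in this idealised case.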

In order to obtain an objective evaluation, we use the leave-one-out approach. In this case the images located at odd camera viewpoint locations are removed and synthesised using the scene model^{i}.

We compare our results to a state-of-the-art stereo matching algorithm [32] and an EPI tubes extraction method [33]. These methods specify the structure of the EPI lines, and the interpolation is implemented using the same approach as in the proposed algorithm.

## 6 Conclusion

We presented a novel method to obtain a sparse representation of multiview images. The fundamental component of the algorithm is the layer-based representation, which partitions the multiview images into a set of layers each related to a constant depth in the scene. We presented a novel method to obtain the layer-based representation using a general segmentation framework which takes into account the structure of multiview data to achieve accurate results. The obtained layers are then decomposed using a 4D DWT applied in a separable approach, first across the camera viewpoint and then the image dimensions. We modify the viewpoint transform to efficiently deal with occlusions and depth variations. Simulation results based on nonlinear approximation have shown that the sparsity of our representation is superior to a multi-dimensional DWT with the same decomposition structure without disparity compensation. In addition, we have shown that the proposed representation can be used to efficiently synthesise novel viewpoints for IBR applications and also de-noise multiview images in the presence of AWGN.

## Endnotes

^{a}Each camera in the setup is modelled by the pinhole model [35].

^{b}Light ray intensity is constant when an object is observed from a different angle.

^{c}By visible regions we mean the EPI line segments which are present in the EPI volume.

^{d}We have not included the regularisation terms for the sake of clarity.

^{e}It can be shown that given a fronto-parallel depth plane, the inner product **v**_{γk} · **n**_{γk} is equal to **v**_{Γk} · **n**_{Γk}.

^{f}Note that in practice we also include a regularisation term to constrain the evolution according to the curvature of the boundary.

^{g}Note that the background layer ${\mathscr{H}}_{N}^{\mathcal{V}}$ is automatically assigned all of the regions which do not belong to the remaining layers and is therefore not evolved.

^{h}This multi-dimensional DWT has the same decomposition structure as our method, but without disparity compensation.

^{i}The extracted layers are obtained using the dataset with the removed images.

## Declarations

### Acknowledgements

We would like to thank the anonymous reviewers whose constructive comments led to an improved manuscript.

## References

1. Taubman D, Marcellin M: *JPEG2000 Image Compression Fundamentals, Standards and Practice*. Kluwer Academic Publishers, Boston; 2004.
2. Candès EJ, Donoho DL: *Curvelets: a surprisingly effective nonadaptive representation of objects with edges*. Curve and Surface Fitting, University Press, Saint-Malo; 2000.
3. Do MN, Vetterli M: The contourlet transform: an efficient directional multiresolution image representation. *IEEE Trans Image Process* 2005, 14(12):2091-2106.
4. Do MN, Vetterli M: The finite ridgelet transform for image representation. *IEEE Trans Image Process* 2003, 12(1):16-28. doi:10.1109/TIP.2002.806252
5. Velisavljević V, Beferull-Lozano B, Vetterli M, Dragotti PL: Directionlets: anisotropic multidirectional representation with separable filtering. *IEEE Trans Image Process* 2006, 15(7):1916-1933.
6. Le Pennec E, Mallat S: Sparse geometric image representations with bandelets. *IEEE Trans Image Process* 2005, 14(4):423-438.
7. Le Pennec E, Mallat S: Bandelet image approximation and compression. *SIAM J Multiscale Model Simul* 2005, 4(3):992-1039. doi:10.1137/040619454
8. Bayram I, Selesnick IW: On the dual-tree complex wavelet packet and *M*-band transforms. *IEEE Trans Signal Process* 2008, 56(6):2298-2310.
9. Selesnick IW, Baraniuk RG, Kingsbury NG: The dual-tree complex wavelet transform. *IEEE Signal Process Mag* 2005, 22(6):123-151.
10. Bruckstein AM, Donoho DL, Elad M: From sparse solutions of systems of equations to sparse modeling of signals and images. *SIAM Rev* 2009, 51:34-81. doi:10.1137/060657704
11. Do MN, Nguyen QH, Nguyen HT, Kubacki D, Patel SJ: Immersive visual communication. *IEEE Signal Process Mag* 2011, 28(1):58-66.
12. Kubota A, Smolic A, Magnor M, Tanimoto M, Chen T, Zhang C: Multiview imaging and 3DTV. *IEEE Signal Process Mag* 2007, 24(6):10-21.
13. Zhang C, Chen T: A survey on image-based rendering-representation, sampling and compression. *Signal Process Image Commun* 2004, 19:1-28. doi:10.1016/j.image.2003.07.001
14. Adelson EH, Bergen JR: The plenoptic function and the elements of early vision. In *Computational Models of Visual Processing*. MIT Press, Cambridge; 1991:3-20.
15. Levoy M, Hanrahan P: Light field rendering. In *Proceedings of Computer Graphics (SIGGRAPH)*. New Orleans, Louisiana; 1996:31-42.
16. Bolles R, Baker H, Marimont D: Epipolar-plane image analysis: an approach to determining structure from motion. *Int J Comput Vis* 1987, 1(1):7-55. doi:10.1007/BF00128525
17. Daubechies I, Sweldens W: Factoring wavelet transforms into lifting steps. *J Fourier Anal Appl* 1998, 4(3):247-269. doi:10.1007/BF02476026
18. Jehan-Besson S, Barlaud M, Aubert G: DREAM2S: Deformable regions driven by an eulerian accurate minimization method for image and video segmentation. *Int J Comput Vis* 2003, 53:45-70. doi:10.1023/A:1023031708305
19. Kass M, Witkin A, Terzopoulos D: Snakes: Active contour models. *Int J Comput Vis* 1988, 1(4):321-331. doi:10.1007/BF00133570
20. Kolmogorov V, Zabih R: Multi-camera scene reconstruction via graph cuts. In *Proceedings of the 7th European Conference on Computer Vision-Part III (ECCV)*. Springer-Verlag, Copenhagen, Denmark; 2002:82-96.
21. Chai JX, Tong X, Chan SC, Shum HY: Plenoptic sampling. In *Proceedings of Computer Graphics (SIGGRAPH)*. ACM Press/Addison-Wesley Publishing Co., New York; 2000:307-318.
22. Berent J, Dragotti PL, Brookes M: Adaptive layer extraction for image based rendering. In *Proceedings of IEEE Workshop on Multimedia Signal Processing (MMSP)*. Rio De Janeiro, Brazil; 2009:266-271.
23. Sethian JA: *Level Set Methods*. Cambridge University Press, Cambridge; 1996.
24. Hötter M: Object-oriented analysis-synthesis coding based on moving two-dimensional objects. *Signal Process Image Commun* 1990, 2(4):409-428. doi:10.1016/0923-5965(90)90027-F
25. Secker A, Taubman D: Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression. *IEEE Trans Image Process* 2003, 12(12):1530-1542. doi:10.1109/TIP.2003.819433
26. Li S, Li W: Shape-adaptive discrete wavelet transforms for arbitrarily shaped visual object coding. *IEEE Trans Circ Syst Video Technol* 2000, 10(5):725-743. doi:10.1109/76.856450
27. Unser M, Blu T: Mathematical properties of the JPEG2000 wavelet filters. *IEEE Trans Image Process* 2003, 12(9):1080-1090. doi:10.1109/TIP.2003.812329
28. Scharstein D, Szeliski R: *Middlebury datasets*. [http://www.vision.middlebury.edu/stereo/data]
29. Gelman A, Dragotti PL, Velisavljević V: Multiview image compression using a layer-based representation. In *Proceedings of the IEEE International Conference on Image Processing (ICIP)*. Hong Kong, China; 2010:13-16.
30. Donoho D, Johnstone IM: Adapting to unknown smoothness via wavelet shrinkage. *J Am Stat Assoc* 1995, 90:1200-1224. doi:10.2307/2291512
31. Blu T, Luisier F: The SURE-LET approach to image denoising. *IEEE Trans Image Process* 2007, 16(11):2778-2786.
32. Ogale AS, Aloimonos Y: Shape and the stereo correspondence problem. *Int J Comput Vis* 2005, 65(3):147-162. doi:10.1007/s11263-005-3672-3
33. Criminisi A, Kang SB, Swaminathan R, Szeliski R, Anandan P: *Extracting layers and analyzing their specular properties using epipolar-plane-image analysis*. Microsoft Research Technical Report MSR-TR-2002-19; 2002.
34. Berent J: *Coherent multi-dimensional segmentation of multi-view images using a variational framework and applications to image based rendering*. PhD Thesis, Imperial College; 2008. [http://www.commsp.ee.ic.ac.uk/~pld/group/Thesis_Berent_08.pdf]
35. Hartley RI, Zisserman A: *Multiple View Geometry in Computer Vision*. 2nd edition. Cambridge University Press, Cambridge; 2004. ISBN:0521540518
36. Shum HY, Kang SB: A review of image-based rendering techniques. *IEEE SPIE Vis Commun Image Process (VCIP)* 2000, 213:1-12.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.