Methods for depth-map filtering in view-plus-depth 3D video representation
 Sergey Smirnov^{1},
 Atanas Gotchev^{1} and
 Karen Egiazarian^{1}
https://doi.org/10.1186/16876180201225
© Smirnov et al; licensee Springer. 2012
Received: 6 June 2011
Accepted: 14 February 2012
Published: 14 February 2012
Abstract
View-plus-depth is a scene representation format where each pixel of a color image or video frame is augmented by per-pixel depth represented as a grayscale image (map). In this representation, the quality of the depth map plays a crucial role, as it determines the quality of the rendered views. Among the artifacts in the received depth map, compression artifacts are usually the most pronounced and are considered the most annoying. In this article, we study the problem of post-processing of depth maps degraded by improper estimation or by block-transform-based compression. A number of post-filtering methods are studied, modified, and compared for their applicability to the task of depth map restoration and post-filtering. The methods range from simple Gaussian smoothing, to the in-loop deblocking filter standardized in the H.264 video coding standard, to more comprehensive methods which utilize structural and color information from the accompanying color image frame. The latter group contains our modification of the powerful local polynomial approximation, the popular bilateral filter, and an extension of it, originally suggested for depth super-resolution. We further modify this latter approach by developing an efficient implementation of it. We present experimental results demonstrating high-quality filtered depth maps and offering practitioners options for highest quality or better efficiency.
1 Introduction
Another advantage of the representation is its backward compatibility with conventional single-view broadcasting formats. In particular, the MPEG-2 transport stream standard used in DVB broadcasting allows transmitting auxiliary streams along with the main video, which makes it possible to enrich a conventional digital video transmission with depth information without hampering compatibility with single-view receivers.
The major disadvantages of the format are the appearance of disoccluded areas in rendered views and the inability to properly represent most semi-transparent objects such as fog, smoke, glass objects, thin fabrics, etc. The problems with occlusions are caused by the lack of information about what is behind a foreground object when a new-perspective scene is synthesized. Such problems are tackled by occlusion filling [4] or by extending the format to multi-view multi-depth, or to layered depth [3].
Quality is an important factor for the successful utilization of depth information. A depth map degraded by strong blocky artifacts usually produces visually unacceptable rendered views. For successful 3D video transmission, an efficient depth post-filtering technique should be considered.
Filtering of depth maps has been addressed mainly from the point of view of increasing the resolution [5–7]. In [6], joint bilateral filtering has been suggested to upsample low-resolution depth maps. The approach has been further refined in [7] by suggesting proper anti-aliasing and complexity-efficient filters. In [5], a probabilistic framework has been suggested. For each pixel of the targeted high-resolution grid, several depth hypotheses are built and the hypothesis with the lowest cost is selected as the refined depth value. The procedure is run iteratively, and bilateral filtering is employed at each iteration to refine the cost function used for comparing the depth hypotheses.
In this article, we study the problem of post-processing of depth maps degraded by improper estimation or by block-transform-based compression. A number of post-filtering methods are studied, modified, and compared for their applicability to the task of depth map restoration and post-filtering. We consider methods ranging from simple smoothing and deblocking methods to more comprehensive methods which utilize structural and color information from the accompanying color image frame. The present study is an extension of the study reported in [8]. Some of the methods included in the comparative analysis in [8] have been further modified, and for one of them a more efficient implementation has been proposed. We present extended experimental results which allow evaluating the advantages and limitations of each method and give practitioners options for trading off between highest quality and better efficiency.
2 Depth map characteristics
2.1 Properties of depth maps
A depth map is a grayscale image which encodes the distance to the given scene pixels for a certain perspective. The depth is usually aligned with and accompanies the color view of the same scene [9].
Single view plus depth is usually a more efficient representation of a 3D scene than two-channel stereo. It directly encodes geometrical information otherwise contained in the disparity between the two views, thus providing scalability and the possibility to render multiple views for displays of different sizes [1]. Structure-wise, the depth image is piecewise smooth (as it represents the gradual change of depth within objects) with delineated, sharp discontinuities at object boundaries. Normally, it contains no textures. This structure should be taken into account when designing compression or filtering algorithms.
Given a depth map explicitly along with the color texture, a virtual view for a desired camera position can be synthesized using DIBR [2]. The given depth map is first inversely transformed to provide the absolute distance and hence the world 3D coordinates of the scene points. These points are then projected onto a virtual camera plane to obtain a synthesized view. The technique can encounter problems with disoccluded pixels, non-integer pixel shifts, and partly absent background textures, which have to be addressed in order to apply it successfully [1].
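For a rectified, horizontally shifted virtual camera, the projection described above reduces to shifting each pixel by its disparity. The following Python sketch illustrates this special case only. The function name `render_view`, the sign convention of the shift, and the nearest-pixel rounding are assumptions made for illustration and are not the rendering procedure of [2].

```python
import numpy as np

def render_view(color, disparity):
    """Forward-warp a color image (H x W x C) by per-pixel horizontal disparity.

    Assumption: rectified cameras, so each pixel moves only along its row.
    Returns the warped image and a boolean mask that is False at
    disoccluded pixels (targets that received no source pixel).
    """
    h, w = disparity.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), -np.inf)          # largest disparity seen per target
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xt = int(round(x - d))           # nearest-pixel target column
            # A larger disparity means a closer point, which wins the pixel.
            if 0 <= xt < w and d > zbuf[y, xt]:
                zbuf[y, xt] = d
                out[y, xt] = color[y, x]
                filled[y, xt] = True
    return out, filled
```

The `filled` mask marks the visible pixels of the virtual view; its complement corresponds to the disoccluded areas that a separate occlusion filling step would have to handle.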
The quality of the depth image is a key factor for successful rendering of virtual views. Distortions in the depth channel may generate wrong object contours or shapes in the rendered images (see, for example, Figure 1d,e) and consequently hamper the visual user experience, manifested in headache and eyestrain caused by wrong contours of familiar objects. At the capture stage, depth maps might not be well aligned with the corresponding objects. Holes and wrongly estimated depth points (outliers) might also exist. At the compression stage, depth maps might suffer from blocky artifacts if compressed by contemporary methods such as H.264 [10]. When depth accompanies video sequences, the consistency of successive depth maps in the sequence is an issue. Time-inconsistent depth sequences might cause flickering in the synthesized views as well as other 3D-specific artifacts [11].
At the capture stage, depth can be precisely estimated in floating-point high resolution; however, for compression and transmission it is usually converted to integer values (e.g., 256 grayscale gradations). Therefore, the depth range and resolution have to be properly maintained by suitable scaling, shifting, and quantizing, where all these transformations have to be invertible.
Depth quantization is normally done in linear or logarithmic scale. The latter approach allows better preservation of geometry details for closer objects, while higher geometry degradation is tolerated for objects at longer distances. This effect corresponds to parallax-based human stereo vision, where the binocular depth cue loses its importance for more distant objects and is more important and dominant for closer objects. The same property can be achieved by transmitting linearly quantized inverse depth maps. This type of depth representation basically corresponds to binocular disparity (also known as horizontal parallax), again including the necessary modifications, such as scaling, shifting, and quantizing.
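The difference between linear quantization of depth and linear quantization of inverse depth can be illustrated with a short Python sketch. The clipping planes `z_near` and `z_far`, the 8-bit code range, and the function names are assumptions introduced for this example.

```python
import numpy as np

def quantize_linear_depth(z, z_near, z_far, levels=256):
    """Uniform quantization in depth; the nearest plane gets the largest code."""
    lin = (z_far - z) / (z_far - z_near)
    return np.clip(np.round(lin * (levels - 1)), 0, levels - 1).astype(np.uint8)

def dequantize_linear_depth(code, z_near, z_far, levels=256):
    return z_far - code.astype(np.float64) / (levels - 1) * (z_far - z_near)

def quantize_inverse_depth(z, z_near, z_far, levels=256):
    """Uniform quantization in 1/z, i.e., in a disparity-like quantity."""
    inv = (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.clip(np.round(inv * (levels - 1)), 0, levels - 1).astype(np.uint8)

def dequantize_inverse_depth(code, z_near, z_far, levels=256):
    inv = code.astype(np.float64) / (levels - 1)
    return 1.0 / (inv * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
```

With `z_near = 1` and `z_far = 100`, a point at depth 1.5 is reconstructed far more accurately from the inverse-depth code than from the linear code, while a point at depth 90 is reconstructed less accurately, which is the trade-off described above.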
2.2 Depth map filtering problem formulation
This section formally formulates the problem of filtering of depth maps and specifies the notation used hereafter. Consider an individual color video frame in YUV (YCbCr) or RGB color space, y(x) = [y^{Y}(x), y^{U}(x), y^{V}(x)] or y(x) = [y^{R}(x), y^{G}(x), y^{B}(x)], together with the associated per-pixel depth z(x), where x = [x_{1}, x_{2}] is a spatial variable, x ∈ X, X being the image domain.
A new, virtual view η(x) = [η^{Y}(x), η^{U}(x), η^{V}(x)] can be synthesized out of the given (reference) color frame and depth by DIBR, applying projective geometry and knowledge about the reference view camera, as discussed in Section 2.1 [2]. The synthesized view is composed of two parts, η = η_{v} + η_{o}, where η_{v} denotes the pixels visible from the position of the virtual view camera and η_{o} denotes the pixels of occluded areas. The corresponding domains are denoted by X_{v} and X_{o}, respectively, with X_{v} ⊂ X and X_{o} = X\X_{v}.
Both the color frame and the depth are assumed to be degraded by additive noise,
$y_{q}^{C}(x)=y^{C}(x)+\epsilon^{C}(x),\quad z_{q}(x)=z(x)+\epsilon(x),$
where C = Y, U, V or R, G, B. Both degradations are modeled as independent white Gaussian processes: $\epsilon^{C}(\cdot)\sim\mathcal{N}(0,\sigma_{C}^{2})$, $\epsilon(\cdot)\sim\mathcal{N}(0,\sigma^{2})$. Note that the variance of the color signal noise ($\sigma_{C}^{2}$) differs from that of the depth signal noise (σ^{2}).
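The independent white Gaussian noise model can be simulated with a few lines of Python. This is only a sketch of the model: the function name `degrade`, the fixed random seed, and the use of floating-point arrays are assumptions made for the example.

```python
import numpy as np

def degrade(color, depth, sigma_color, sigma_depth, seed=0):
    """Add independent zero-mean Gaussian noise to the color frame and depth.

    sigma_color may be a scalar or one value per color channel;
    sigma_depth is the standard deviation of the depth noise.
    """
    rng = np.random.default_rng(seed)
    noisy_color = color + rng.normal(0.0, sigma_color, size=color.shape)
    noisy_depth = depth + rng.normal(0.0, sigma_depth, size=depth.shape)
    return noisy_color, noisy_depth
```

Keeping the two standard deviations separate reflects the remark above that the color and depth noise variances generally differ.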
If the degraded depth and the reference view are used in DIBR, the result will be a lower-quality synthesized view $\breve{\eta}$. Unnatural discontinuities, e.g., blocking artifacts, in the degraded depth image cause geometrical distortions and distorted object boundaries in the rendered view. The goal of filtering degraded depth maps is to mitigate the degradation effects (caused by, e.g., quantization or imperfect depth estimation) in the depth image domain, i.e., to obtain a refined depth image estimate $\hat{z}$, which would be closer to the original, error-free depth and would improve the quality of the rendered view.
2.3 Depth map quality measures
Measuring the quality of depth maps has to take into account that depth maps are a type of imagery which is not visualized per se, but through rendered views.
In our study, we consider two types of measures:

measures based on comparison between processed and ground truth (reference) depth;

measures based on comparison between virtual views rendered from processed depth and from ground truth one.
Measures for the first group have the advantage of being simple, while measures from the second group are closer to subjective perception of depth. For both of these groups we suggest and test new measures.
PSNR of Restored Depth
$\mathrm{PSNR}=10\,\log_{10}\left(\frac{\mathrm{MAX}_{z}^{2}}{\frac{1}{N}\sum_{x\in X}\left(z(x)-\hat{z}(x)\right)^{2}}\right),$
where z(x) and $\hat{z}(x)$ are the reference and processed signals, N is the number of samples (pixels), and MAX_{z} is the maximal possible pixel value, assuming the minimal one is zero. For this metric, a higher value means better quality. Applying PSNR to depth images must be done with care and with proper rescaling, as most depth maps occupy only a subrange of the usual 8-bit range of 0 to 255, and PSNR might turn out to be unexpectedly high.
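The effect of the peak value on depth PSNR can be seen in a small Python sketch. Using the actual range of the reference depth as the peak is only one possible rescaling and is an assumption of this example, as are the function names.

```python
import numpy as np

def psnr(reference, processed, max_value=255.0):
    """Standard PSNR in dB with a fixed peak value."""
    ref = np.asarray(reference, dtype=np.float64)
    proc = np.asarray(processed, dtype=np.float64)
    mse = np.mean((ref - proc) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

def psnr_depth_rescaled(reference, processed):
    """PSNR whose peak is the range actually occupied by the reference depth."""
    ref = np.asarray(reference, dtype=np.float64)
    depth_range = ref.max() - ref.min()
    if depth_range == 0:
        raise ValueError("reference depth is constant; range-based peak is undefined")
    return psnr(reference, processed, max_value=depth_range)
```

For a depth map that spans only 30 gray levels, the same unit error yields about 48.1 dB with the full 8-bit peak but about 29.5 dB with the range-based peak.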
PSNR of rendered view
PSNR is calculated to compare the quality of the view rendered using processed depth against that of the view rendered using the original depth [10]. It essentially measures how close the rendered view is to the 'ideal' one. In our calculations, pixels disoccluded during the rendering process are excluded so as to make the comparison independent of the particular hole-filling approach. For color images, we calculate PSNR independently for each color channel and then take the mean over the three channels.
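A minimal Python sketch of this measure is given below. The boolean `visible_mask` (True for pixels that are not disoccluded), the 8-bit peak value, and the function name are assumptions made for illustration.

```python
import numpy as np

def psnr_rendered(view_ref, view_proc, visible_mask, max_value=255.0):
    """Mean per-channel PSNR (dB) over visible pixels of two H x W x C views."""
    per_channel = []
    for c in range(view_ref.shape[2]):
        ref = view_ref[..., c][visible_mask].astype(np.float64)
        proc = view_proc[..., c][visible_mask].astype(np.float64)
        mse = np.mean((ref - proc) ** 2)
        if mse == 0:
            per_channel.append(float("inf"))
        else:
            per_channel.append(10.0 * np.log10(max_value ** 2 / mse))
    # Average the channel PSNR values, as described in the text.
    return float(np.mean(per_channel))
```

Because masked pixels are dropped before the error is computed, arbitrary content in the disoccluded areas does not influence the score.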
Percentage of bad pixels
Depth consistency
Analysing the BAD metric, one can notice that the thresholding imposed there does not distinguish between small and large differences. An error that is just a quantum above the threshold counts the same as one that is very high.
In the case of depth degraded by compression artifacts, almost all pixels are quantized, which changes their original values and therefore causes the BAD metric to indicate very low quality even though the quality of the rendered views is not that bad.
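This behavior can be reproduced with a small Python sketch of a thresholded bad-pixel count in the spirit of [12]. The threshold value and the function name are assumptions made for this example.

```python
import numpy as np

def bad_pixels_percent(reference, processed, threshold=1.0):
    """Percentage of pixels whose absolute depth error exceeds the threshold."""
    ref = np.asarray(reference, dtype=np.float64)
    proc = np.asarray(processed, dtype=np.float64)
    return float(100.0 * np.mean(np.abs(ref - proc) > threshold))
```

A uniform shift of two gray levels marks every pixel as bad, whereas a single gross outlier in a 10 x 10 map gives only 1%, although the latter is likely to cause a far more visible geometrical distortion.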
Starting from the idea that the perceptual quality of the rendered view depends more on the amount of geometrical distortion than on the number of bad depth pixels, we suggest giving preference to areas where the change between ground truth depth and compressed depth is more abrupt. Such changes are expected to cause perceptually strong geometrical distortions.
Gradient-normalized RMSE
As suggested in [13], the performance of optical flow estimation algorithms can be evaluated using a gradient-normalized RMSE metric. Such a measure decreases the over-penalization of errors caused by fine textures.
where η^{Y}(x) is the luminance of the virtual image generated from ground truth depth and $\hat{\eta}^{Y}(x)$ is the luminance of the virtual image generated from processed depth. For better quality, the metric shows lower values.
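A gradient-normalized RMSE of this kind can be sketched in Python as follows. The sketch follows the general form used in [13], in which each squared error is divided by the squared gradient magnitude of the reference plus a small constant. The constant `eps = 1.0`, the finite-difference gradient, and the function name are assumptions of this example.

```python
import numpy as np

def gradient_normalized_rmse(ref_luma, proc_luma, eps=1.0):
    """RMSE in which each squared error is normalized by local reference gradient."""
    ref = np.asarray(ref_luma, dtype=np.float64)
    proc = np.asarray(proc_luma, dtype=np.float64)
    gy, gx = np.gradient(ref)                 # finite differences along rows, columns
    grad_sq = gx ** 2 + gy ** 2
    return float(np.sqrt(np.mean((ref - proc) ** 2 / (grad_sq + eps))))
```

The same absolute error is penalized less in a strongly textured region than in a flat one, which is the intended reduction of over-penalization.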
Discontinuity Falses
where #(X) is cardinality (number of elements) of a domain X. The measure decreases with improving the quality of the processed depth.
3 Depth filtering approaches
A number of post-processing approaches for the restoration of natural images exist [14]. However, they are not directly applicable to range images due to differences in image structure.
In this section, we consider several existing filtering approaches and modify them for our needs. The first group of approaches works on the depth map images without using structural information from the available color channel. Gaussian smoothing and the H.264 in-loop deblocking filter [15] are the filtering approaches included in this group. The approaches of the second group actively use the available color frame to improve depth map quality. While there is an apparent correlation between the color channel and the accompanying depth map, it is important to characterize which color and structure information can help depth processing.
More specifically, we optimize state-of-the-art filtering approaches, such as local polynomial approximation (LPA) [16] and bilateral filtering [17], to utilize edge-preserving structural information from the color channel for refining the blocky depth maps. We suggest a new version of the LPA approach which, according to our experiments, is most appropriate for depth map filtering. In addition, we suggest an accelerated implementation of the method based on hypothesis filtering as in [5], which shows superior results at the price of high computational cost.
3.1 LPA approach
The anisotropic LPA is a pixel-wise method for adaptive signal estimation in noisy conditions [16]. For each pixel of the image, a local sectorial neighborhood is constructed. Sectors are fitted for different directions. In the simplest case, instead of sectors, 1D directional estimates in four (by 90 degrees) or eight (by 45 degrees) different directions can be used. The length of each sector, denoted as scale, is adjusted to meet the compromise between the exact polynomial model (low bias) and sufficient smoothing (low variance). A statistical criterion, denoted as the intersection of confidence intervals (ICI) rule, is used to find this compromise [18, 19], i.e., the optimal scale for each direction. These optimal scales in each direction determine an anisotropic star-shaped neighborhood for every point of the image, well adapted to the structure of the image. This neighborhood has been successfully utilized for shape-adaptive transform-based color image denoising and deblurring [14].
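The ICI rule for a single pixel and a single direction can be sketched in Python as follows. The inputs are assumed to be estimates computed at increasing scales together with their standard deviations. The function name and the way the inputs are supplied are assumptions of this sketch and do not represent the full LPA-ICI implementation.

```python
import numpy as np

def ici_select_scale(estimates, stds, gamma):
    """Return the index of the largest scale accepted by the ICI rule.

    For each scale j the confidence interval is
    [estimate_j - gamma * std_j, estimate_j + gamma * std_j].
    Scales are accepted while the running intersection stays non-empty.
    """
    lower, upper = -np.inf, np.inf
    best = 0
    for j, (est, sd) in enumerate(zip(estimates, stds)):
        lower = max(lower, est - gamma * sd)
        upper = min(upper, est + gamma * sd)
        if lower > upper:        # intersection became empty: stop growing
            break
        best = j
    return best
```

A larger threshold widens every interval, so the intersection survives longer and a larger scale (more smoothing) is selected. For example, estimates `[10.0, 10.2, 10.1, 14.0]` with standard deviations `[2.0, 1.4, 1.0, 0.7]` stop at the third scale for a threshold of 1.5 but accept the fourth for a threshold of 3.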
Constant depth model
where N is the number of pixels inside the adaptive support ${\Omega}_{{x}_{0}}$. Note that the scheme depends on two parameters: the noise variance of the luminance channel σ^{Y} and the positive threshold parameter Γ. The latter can be adjusted to control the smoothing in the restored depth map.
Linear regression depth model
where $\widehat{x}$ is the homogeneous coordinate.
Aggregation procedure
where M is the number of estimates coming from overlapping regions at a particular coordinate x_{0}. A result of depth filtering by LPA-ICI is given in Figure 4e.
Color-driven LPA-ICI
The luminance channel of the color image is usually considered the most informative channel for processing and also the most distinguishable by the human visual system. That is why many image filtering mechanisms use a color transformation to extract luminance and then process it differently from the chrominance channels. This may also be explained by the fact that luminance is usually the least noisy component and thus the most reliable. Nevertheless, for some color processing tasks, pixel differentiation based only on the luminance channel is not appropriate, because some colors may have the same luminance while having a different visual appearance.
where x_{0} is the pixel currently being processed, and $x\in {\Omega}_{{x}_{0}}$.
The color difference map is used as a source of structural information, i.e., the LPA-ICI procedure is run over this map instead of over the luminance channel. Differences are illustrated in Figure 5. In our implementation, we calculate the color difference only for those pixels of the neighborhood which participate in the 1D directional convolutions. The additional computational cost of such an implementation is about 10% of the overall LPA-ICI procedure.
For all the LPA-ICI-based strategies mentioned, the main tuning parameter, capable of setting proper smoothing for a varying depth degradation parameter (e.g., varying QP in coding), is the parameter Γ.
3.1.1 Comparison of LPA-ICI approaches
3.2 Bilateral filter
3.3 Spatial-depth super resolution approach
where L_{η} denotes the tunable search range.
After the bilateral filtering is applied to the cost volume, the depth is refined and true depth discontinuities might be completely recovered.
In our implementation of the filter, we have suggested two simplifications:

we use only one iteration of the filter;

before processing, we scale the depth range by a factor of 20, thus reducing the number of slices and, subsequently, the processing time.
The main tunable parameters of the filter are the parameters of the bilateral filter, γ_{d} and γ_{c}. As the processing time of the filter still remains extremely high, we do not optimize this filter directly, but assume that the optimal parameters γ_{d} = f_{d}(QP) and γ_{c} = f_{c}(QP) found for the direct bilateral filter are optimal or nearly optimal for this filter as well.
3.4 Practical implementation of the super resolution filtering
Furthermore, the computational cost is reduced by assuming that not all depth hypotheses are applicable to the current pixel. A safe assumption is that only depths within the range d ∈ [d_{min}, d_{max}], where d_{min} = min(z(u)), d_{max} = max(z(u)), u ∈ Ω_{x}, have to be checked.
Additionally, the depth range is scaled in order to further reduce the number of hypotheses. This step is especially efficient for certain types of distortions such as compression (blocky) artifacts. For compressed depth maps, the depth range appears to be sparse due to the quantization effect.
Require: C, the color image; D, the depth image; X, a spatial image domain
for all x ∈ X do
    $d_{\min}=\min_{u\in\Omega_{x}}D_{u}$, $d_{\max}=\max_{u\in\Omega_{x}}D_{u}$
    if d_{max} − d_{min} < γ_{thr} then
        $\hat{D}_{x}=D_{x}$
    else
        $F(x,u)=\frac{\Vert C_{u}-C_{x}\Vert\,\Vert u-x\Vert}{\sum_{u}\Vert C_{u}-C_{x}\Vert\,\Vert u-x\Vert}$ {bilateral weights}
        S_{best} ← S_{max} {S_{max} is the maximum reachable value for S}
        for d = ⌊d_{min}⌋ to ⌈d_{max}⌉ do
            S ← 0
            for all u ∈ Ω_{x} do
                E ← min{(d − D_{u})^{2}, ηL}
                S ← S + F(x, u) * E
            end for
            if S < S_{best} then
                S_{best} ← S
                d_{best} ← d
            end if
        end for
        $\hat{D}_{x}=d_{\text{best}}$
    end if
end for
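The listing above can be turned into a short, unoptimized Python sketch. Gaussian bilateral weights are used here as an assumption in place of F(x, u). The parameter names (`radius`, `gamma_s`, `gamma_c`, `gamma_thr`, `cost_cap`) and their default values are also illustrative assumptions, and `cost_cap` plays the role of the truncation value ηL.

```python
import numpy as np

def hypothesis_filter(color, depth, radius=2, gamma_s=2.0, gamma_c=10.0,
                      gamma_thr=2.0, cost_cap=100.0):
    """Per-pixel depth hypothesis filtering guided by an H x W x 3 color image."""
    h, w = depth.shape
    color = color.astype(np.float64)
    depth = depth.astype(np.float64)
    out = depth.copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            d_win = depth[y0:y1, x0:x1]
            d_min, d_max = d_win.min(), d_win.max()
            if d_max - d_min < gamma_thr:
                continue                      # locally flat depth: keep as is
            # Bilateral weights (assumed Gaussian in space and color).
            yy, xx = np.mgrid[y0:y1, x0:x1]
            spatial = (yy - y) ** 2 + (xx - x) ** 2
            c_diff = np.sum((color[y0:y1, x0:x1] - color[y, x]) ** 2, axis=-1)
            weights = np.exp(-spatial / (2 * gamma_s ** 2)
                             - c_diff / (2 * gamma_c ** 2))
            weights /= weights.sum()
            # Test only hypotheses inside the local depth range.
            best_cost, best_d = np.inf, depth[y, x]
            for d in range(int(np.floor(d_min)), int(np.ceil(d_max)) + 1):
                cost = np.sum(weights * np.minimum((d - d_win) ** 2, cost_cap))
                if cost < best_cost:
                    best_cost, best_d = cost, d
            out[y, x] = best_d
    return out
```

Only one window of weights and one scalar cost per hypothesis are held at any time, which reflects the reduced memory use of the per-pixel formulation. A production implementation would vectorize the loops.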
The memory footprint required by our implementation is significantly lower than the one imposed by a direct implementation. A straightforward implementation would require a large memory buffer to store the complete cost volume in order to process it pixel by pixel and avoid computing the same color weights across different slices. In the proposed implementation, two memory buffers of relatively small size are required: a buffer equal to the processing window size to store the current color weights, and a buffer to store the cost values for the current pixel along the 'd' dimension. In the case of a multi-threaded (parallelized) implementation, these memory buffers are multiplied by the number of processing threads. More information about platform-specific optimization of the proposed algorithm is given in [20].
4 Experimental results
4.1 Experimental setting
In our experiments, we consider depth maps degraded by compression. Thus, the degradation is characterized by the quantization parameter (QP). For better comparison of the selected approaches, we present two types of experiments. In the first set of experiments, we compare the performance of all depth filtering algorithms assuming the true color channel is given (it has also been used in the optimization of the tunable parameters). This shows the ideal filtering performance, which cannot be achieved in practice because the color data is also degraded by, e.g., compression.
In the second set of experiments, we compare the effect of depth filtering in the case of mild quantization of the color channel. The general assumption is that color data is transmitted with backward compatibility in mind, and hence most of the bandwidth is occupied by the color channel. Depth maps in this scenario are heavily compressed, to consume no more than 10–20% of the total bit budget [21, 22].
We consider the case where both y and z are coded as H.264 intra frames with some QPs, which leads to their quantized versions y_{q} and z_{q}. The effect of quantization of DCT coefficients has been studied thoroughly in the literature and corresponding models have been suggested [23]. Following the degradation model in Section 2.2, we assume that the quantization noise terms added to the color channels and the depth channel are independent white Gaussian processes: $\epsilon^{C}(\cdot)\sim\mathcal{N}(0,\sigma_{C}^{2})$, $\epsilon(\cdot)\sim\mathcal{N}(0,\sigma^{2})$. While this modeling is simple, it has proven quite effective for mitigating the blocking artifacts arising from quantization of transform coefficients [14]. In particular, it allows for establishing a direct link between the QP and the quantization noise variance to be used for tuning deblocking filtering algorithms [14].
Training and test datasets for our experiments (see Figure 12) were taken from the Middlebury Evaluation Testbench [12, 24, 25]. In our case, we cannot tolerate holes and unknown areas in the depth datasets, since they produce fake discontinuities and unnatural artifacts after compression. We semi-manually processed 6 images to fill holes and to make their width and height multiples of 16.
4.1.1 Parameters optimization
Each tested algorithm has a few tunable parameters which can be modified according to a particular filtering strategy related to a quality metric. To make the comparison as fair as possible, we need to tune each algorithm to its best according to such a strategy and within a certain range of training data.
Our test approach is to find empirically optimal parameters for each algorithm over a set of training images. This is done separately for each quality metric. Then, for each particular metric, we evaluate the algorithm once more on the set of test images and average the results. The comparison between algorithms is then done for each metric independently.
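One simple way to carry out such an empirical optimization is an exhaustive search over a parameter grid. The Python sketch below assumes this strategy; the text does not state which search procedure was used. The function name, the structure of `training_set`, and the callable interfaces are likewise assumptions.

```python
import itertools
import numpy as np

def tune_parameters(filter_fn, metric_fn, training_set, grid,
                    higher_is_better=True):
    """Pick the parameter combination with the best mean metric on training data.

    training_set: list of (color, degraded_depth, true_depth) tuples.
    grid: dict mapping parameter name to a list of candidate values.
    filter_fn(color, degraded_depth, **params) returns the filtered depth.
    metric_fn(true_depth, filtered_depth) returns a scalar score.
    """
    best_params, best_score = None, None
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [metric_fn(true, filter_fn(color, degraded, **params))
                  for color, degraded, true in training_set]
        score = float(np.mean(scores))
        improved = (best_score is None
                    or (score > best_score if higher_is_better
                        else score < best_score))
        if improved:
            best_params, best_score = params, score
    return best_params, best_score
```

The search would be repeated for every quality metric, and the selected parameters would then be applied to the separate test images before averaging.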
In particular, for bilateral filtering and hypothesis (super-resolution) filtering we optimize the following parameters: processing window size, γ_{s}, and γ_{c}. For Gaussian blurring, we optimize the parameter σ and the processing window size. For the LPA-ICI-based approach, we optimize the parameter Γ.
4.2 Visual comparison results
As seen in the top row (a), rendering with true depth results in sharp and straight object contours, as well as in continuous shapes of occlusion holes. For such holes, a suitable occlusion filling approach will produce a good estimate.
Row (b) shows the unprocessed depth frame after strong compression (H.264 with QP = 51) and its rendering capability. Object edges are particularly affected by block distortions.
With respect to occlusion filling, the methods behave as follows.

Gaussian smoothing of depth images is able to reduce the number of occluded pixels, making occlusion filling simpler. Nevertheless, this type of filtering does not recover the geometrical properties of depth, which results in incorrect contours in the rendered images.

The internal H.264 in-loop deblocking filter performs similarly to Gaussian smoothing, with no improvement of geometrical properties.

The LPA-ICI-based filtering technique performs significantly better in terms of the visual quality of both the depth frame and the rendered frame. Geometrical distortions are less pronounced but still visible in the rendered channel.

The bilateral filter almost recovers the sharp edges in the depth image, but has minor artifacts (for instance, see the chimney of the house).

The super-resolution depth filter recovers discontinuities as well as the bilateral filter or even better. The resulting depth image does not have the artifacts of the previous methods. Geometrical distortions in the rendered image are not pronounced.
Among all filtering results, the latter contains occlusions which are most similar to the occlusions of the original depth rendering result. Visually, the super-resolution depth approach is considered to be the best. The numerical results for all presented approaches are given in the following section.
4.3 Numerical results for ideal color channel
The x-axis on all the plots represents the varying QP of the H.264 intra coding, while each y-axis shows a particular metric. On most of the metric plots, it is visible that there is no need to apply any kind of filtering before QP reaches some critical value. Below that value, the quality of the compressed depth is high enough that no filtering can improve it.
The group of structurally constrained methods clearly outperforms the simple methods working on the depth image only. The two PSNR-based metrics and the BAD metric seem to be less reliable in characterizing the performance of the methods. The three remaining measures, namely depth consistency, discontinuity falses, and gradient-normalized RMSE, perform in a consistent manner. While gradient-normalized RMSE is perhaps the measure closest to subjective perception, we also favor the other two measures of this group, as they are relatively simple and do not require calculation of the warped (rendered) image.
4.4 Numerical results for compressed color channel
So far, we have been working with an uncompressed color channel. It has been involved in the optimizations and comparisons. Our aim was to characterize the pure influence of the depth restoration only.
In practice, when a 'color-plus-depth' frame is compressed and then transmitted over a channel, the color frame is also compressed at a prespecified bitrate, aiming at maximizing the visual quality of the video. Transmission of a 'color-plus-depth' stream also has to be constrained within a given bit budget. Thus, the receiver-side device has to cope with compressed color and compressed depth.
5 Conclusions
In this article, the problem of filtering of depth maps was addressed, with emphasis on the processing of depth map images impaired by compression artifacts.
Before proceeding with the actual depth processing task, the characteristics of the view-plus-depth representation were overviewed, including methods of depth-image-based rendering for virtual view generation and the formulation of the depth map filtering problem. In addition, a number of quality measures for evaluating depth quality were studied and new ones were suggested.
For the case of post-filtering of depth maps impaired by compression artifacts, a number of filtering approaches were studied, modified, optimized, and compared. Two groups of approaches were identified. In the first group, techniques working directly on the depth map and not taking into account the accompanying color frame were studied. In the second group, filtering techniques utilizing structural or color information from the accompanying frame were considered. This included the popular bilateral filter as well as its extension based on probabilistic assumptions and originally suggested for super-resolution of depth maps. Furthermore, the LPA-ICI approach was specifically modified for the task of depth filtering and a few versions of this approach were proposed. The techniques from the second group have shown better performance over all measures used. More specifically, the method based on probabilistic assumptions showed superior results at the price of very high computational cost. To tackle this problem, we have suggested practical modifications leading to a faster and more memory-efficient version which adapts to the true depth range and its structure and is suitable for implementation on a mobile platform. The competing methods, i.e., LPA-ICI and bilateral filtering, should not be discarded, however, as fast implementations of those exist as well. They demonstrated competitive performance and thus form a scalable set of algorithms. Practitioners can choose between the algorithms in the second group of methods depending on the requirements of their applications and the available computational resources. The deblocking tests demonstrated that it is possible to tune the filtering parameters depending on the QP of the compression engine. It is also feasible to allocate a really small fraction of the total bit budget for compressing the depth, thus allowing for high-quality backward compatibility and channel fidelity. The price for this would be some additional post-processing at the receiver side.
References
 1. Mueller K, Merkle P, Wiegand T: 3D video representation using depth maps. Proc IEEE 2011, 99(4):643–656.
 2. Fehn C: Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV. In Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI. Volume 5291. San Jose, CA, USA; 2004:93–104. SPIE
 3. Vetro A, Yea S, Smolic A: Towards a 3D video format for auto-stereoscopic displays. In SPIE Conference on Applications of Digital Image Processing XXXI. Volume 7073. San Diego, CA, USA; 2008:70730F. SPIE
 4. Kang S, Szeliski R, Chai J: Handling occlusions in dense multi-view stereo. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001). Volume 1. IEEE Computer Society, Kauai, HI, USA; 2001:103–110.
 5. Qingxiong Y, Ruigang Y, Davis J, Nister D: Spatial-depth super resolution for range images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007). IEEE Computer Society, Minneapolis, MN; 2007:1–8.
 6. Kopf J, Cohen M, Lischinski D, Uyttendaele M: Joint bilateral upsampling. In ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007). Volume 26. ACM New York, NY, USA; 2007:96.1–96.5.
 7. Riemens AK, Gangwal OP, Barenbrug B, Berretty RPM: Multi-step joint bilateral depth upsampling. Proc SPIE Visual Commun Image Process 2009, 7257: 72570M.
 8. Smirnov S, Gotchev A, Egiazarian K: Methods for restoration of compressed depth maps: a comparative study. In Proceedings of the Fourth International Workshop on Video Processing and Quality Metrics Consumer Electronics, VPQM 2009. Scottsdale, Arizona, USA, 2009; 2009:6.
 9. Alatan A, Yemez Y, Gudukbay U, Zabulis X, Muller K, Erdem CE, Weigel C, Smolic A: Scene representation technologies for 3DTV-A survey. IEEE Trans Circuits Syst Video Technol 2007, 17(11):1587–1605.
 10. Merkle P, Morvan Y, Smolic A, Farin D, Muller K, de With PHN, Wiegand T: The effect of depth compression on multi-view rendering quality. In 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video. Istanbul; 2008:245–248.
 11. Boev A, Hollosi D, Gotchev A, Egiazarian K: Classification and simulation of stereoscopic artifacts in mobile 3DTV content. In Stereoscopic Displays and Applications XX. Volume 7237. San Jose, CA, USA; 2009:72371F. SPIE
 12. Scharstein D, Szeliski R: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vision 2002, 47: 7–42. 10.1023/A:1014573219977
 13. Baker S, Scharstein D, Lewis JP: A database and evaluation methodology for optical flow. In Proc IEEE Int'l Conf on Computer Vision. Crete, Greece; 2007:243–246.
 14. Foi A, Katkovnik V, Egiazarian K: Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Trans Image Process 2007, 16(5):1395–1411.
 15. List P, Joch A, Lainema J, Bjøntegaard G, Karczewicz M: Adaptive deblocking filter. IEEE Trans Circuits Syst Video Technol 2003, 13(7):614–619.
 16. Katkovnik V, Egiazarian K, Astola J: Local Approximation Techniques in Signal and Image Processing. Volume PM157. SPIE Press, Monograph; 2006.
 17. Tomasi C, Manduchi R: Bilateral Filtering for Gray and Color Images. In IEEE International Conference on Computer Vision. Bombay; 1998:839–846.
 18. Goldenshluger A, Nemirovski A: On spatial adaptive estimation of nonparametric regression. Math Meth Statistics 1997, 6: 135–170.
 19. Katkovnik V: A new method for varying adaptive bandwidth selection. IEEE Trans Signal Process 1999, 47(9):2567–2571. 10.1109/78.782208
 20. Suominen O, Sen S, Smirnov S, Gotchev A: Implementation of depth map filtering algorithms on mobile-specific platforms, accepted. In The International Conference on Consumer Electronics (ICCE). Las Vegas, USA; 2012:319–322. IEEE, January 13–16, 2012.
 21. Morvan Y, Farin D, de With PHN: Depth-image compression based on an R-D optimized quadtree decomposition for the transmission of multiview images. In IEEE International Conference on Image Processing. Volume 5. San Antonio, TX, USA; 2007:105–108.
 22. Tikanmaki A, Smolic A, Mueller K, Gotchev A: Quality assessment of 3D Video in rate allocation experiments. In IEEE International Symposium on Consumer Electronics ISCE 2008. Algarve, Portugal; 2008:1–4.
 23. Robertson M, Stevenson R: DCT quantization noise in compressed images. IEEE Trans Circuits Syst Video Technol 2005, 15(1):25–38.
 24. Scharstein D, Szeliski R: High-accuracy stereo depth maps using structured light. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003). Volume 1. Madison, WI; 2003:195–202.
 25. Scharstein D, Szeliski R: Middlebury stereo vision page. Available at http://vision.middlebury.edu/stereo
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.