Open Access

Robust depth enhancement and optimization based on advanced multilateral filters

EURASIP Journal on Advances in Signal Processing 2017, 2017:51

DOI: 10.1186/s13634-017-0487-7

Received: 12 December 2016

Accepted: 26 June 2017

Published: 10 July 2017


Stereo matching between two displaced cameras and structured-light RGB-D cameras are the two common ways to capture a depth map, which conveys the per-pixel depth information of an image. However, mismatched and occluded pixels prevent the depth and image information from being accurately aligned, and such mismatched depth-image relations seriously degrade view-synthesis performance in modern three-dimensional video applications. Therefore, how to effectively use the image and the depth map to enhance each other becomes increasingly important. In this paper, we propose an advanced multilateral filter (AMF), which refers to spatial, range, depth, and credibility information to achieve this enhancement. The AMF sharpens the image, suppresses depth noise, fills depth holes, and sharpens depth edges simultaneously. Experimental results demonstrate that the proposed method provides superior performance, especially around object boundaries.


Depth enhancement · Multilateral filter · Hole filling · Mold matching · Depth refinement

1 Introduction

In general, three-dimensional (3D) video is widely recognized as a visual media technique that enables viewers to perceive depth in a scene without special glasses. Owing to growing interest among users who wish to experience extended visual sensations, developments in 3D video technologies have initiated the commercialization of 3D services in consumer products such as 3D TVs [1], tablet PCs, mobile devices, and computer gaming devices. At the same time, the multi-view video plus depth (MVD) format has emerged as a potent technique for 3D video applications [2, 3]. To produce virtual views at desired viewpoints with low processing cost, the MVD format uses depth-image-based rendering (DIBR) techniques [4–6]. The DIBR technique synthesizes images at the desired viewpoint by using the color image and its corresponding depth map; it can thus be treated as an efficient data format for 3D video. The depth map is an image that represents the range information of the captured scene, and it is important because it affects the quality of the synthesized images.

The acquisition of depth information can be categorized into two approaches: indirect estimation based on stereo matching of two images taken at different locations, and direct measurement based on time-of-flight depth sensors. Stereo matching with visual computation estimates the depth map from two-view images [7–9]. However, its computational complexity is high, and its estimation tends to fail in texture-less and occluded regions. Recently, low-cost structured-light RGB-D cameras have been used to capture high-resolution color images and low-resolution depth maps [10]. Thus, depth map upsampling [11, 12] followed by enhancement [13] becomes an inevitable task, because the quality of the DIBR process heavily depends on the accuracy of the depth information. To improve the depth map of RGB-D cameras, the following problems should be solved. First, the boundary of an object in the depth map may not be well matched with that of its corresponding color image; the region near the object boundary is commonly referred to as the mismatched region. Second, holes with no depth information often occur in the depth map because the infrared (IR) light can be absorbed or obstructed by the object. Third, the depth map suffers from optical noise caused by multiple reflections or scattering of the IR light.

In general, the image usually has better quality but may not be well matched with the depth map. Thus, it is reasonable to assume that the depth map usually has much worse quality, with noisy, mismatched, and hole pixels. To overcome these problems, the joint bilateral filters (JBF) proposed in [14–16] use color and spatial similarity between corresponding pixels in the image to enhance the depth map. The iterative joint multilateral filtering (IJMF) suggested in [17] achieves the best unsharp masking structure through parameter training. This iterative method not only enhances the sharpness of the image but also smooths the corresponding video pixel values; however, it requires a complex training process to obtain the parameters. To overcome the drawbacks of the IJMF method, the adaptive joint trilateral filter (AJTF) was proposed in [18], which uses differently designed patterns to test the differences between the image and its corresponding depth map. The depth map is then sharpened along object boundaries, making it suitable for the practical DIBR process [19–21]. However, this method is easily affected by complex image texture and suffers serious blocking artifacts in the depth map for highly textured objects.

In summary, the above methods cannot accurately enhance noisy depth maps that are unmatched with their color images, which results in distortion of the synthesized 3D image. Therefore, in this paper, we propose an advanced multilateral filter (AMF) for effective depth enhancement. By considering the similarities of the spatial, range, depth, and credibility information, the AMF can suppress noise, fill holes, and sharpen object edges simultaneously. The rest of this paper is organized as follows. In Section 2, the background and motivations are explained. In Section 3, the proposed advanced multilateral filter is addressed in detail. Comparisons of objective SSIM and PSNR performances and of subjective viewing quality are exhibited in Section 4. Finally, conclusions are drawn in Section 5.

2 Background and motivations

Generally, the source depth map could be generated by a fast stereo matching technique with subsampled stereo images, or captured by an RGB-D camera at a lower resolution than the color image. In either case, the source depth map usually has a lower resolution than the corresponding color image and contains many noisy pixels, including unknown pixels due to occlusions. Thus, the source depth map is first upsampled before depth enhancement. In this paper, traditional bicubic interpolation [22] is applied to bring the resolution of the source depth map up to that of the corresponding color image. After upsampling, we assume that the original texture and corresponding depth map, respectively expressed by g(x, y) and d(x, y), have the same spatial resolution and come with some undesired noisy pixels in both depth maps and images. Specifically, the foreground boundaries do not match the corresponding color image well, and jagged boundaries are produced in the interpolated depth map after bicubic interpolation. If the depth is estimated by stereo matching algorithms, mismatched and occluded pixels exist due to singularity properties. On the other hand, the virtual image cannot be generated precisely by depth-image-based rendering (DIBR) if the image and its corresponding depth map cannot be matched successfully because of the noise and holes in the depth map. Hence, the enhancements of the image and its depth become very important in 3D visualization.
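As an illustration of this upsampling step (a sketch in Python, not the authors' implementation; `scipy.ndimage.zoom` with `order=3` performs cubic-spline interpolation, standing in here for the bicubic interpolation of [22]):

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_depth(depth_lr, target_shape):
    """Bring the low-resolution depth map to the color image's
    resolution before enhancement (cubic-spline stand-in for [22])."""
    sy = target_shape[0] / depth_lr.shape[0]
    sx = target_shape[1] / depth_lr.shape[1]
    return zoom(depth_lr, (sy, sx), order=3)

# toy 4x4 depth map upsampled to 8x8
d_lr = np.arange(16, dtype=float).reshape(4, 4)
d_hr = upsample_depth(d_lr, (8, 8))
```

Spline interpolation is exact at the original sample positions, so corner depth values are preserved while intermediate pixels are smoothly interpolated.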

The upsampled depth map d(x, y) is thus degraded by three major artifacts: noisy, blurred, and missing pixels [23]. The noisy pixels are caused by distortion in the capture devices, resulting in unmatched depth; the blurred pixels are produced by interpolation filters, mostly along object boundaries; and the missing pixels mainly originate from object occlusions and concave objects. The quality improvement of the upsampled depth map d(x, y) therefore becomes an important task in 3D visualization applications. In [23], the traditional enhancement processing generally contains two stages, noise suppression and image-depth enhancement. However, these stages incur high computational complexity and long computation time.

Therefore, a robust filter that solves both the hole and flatness problems is needed to improve performance and reduce computational complexity at the same time. In this paper, we propose a new algorithm, called the advanced multilateral filter (AMF), to jointly fill the holes, enhance the sharpness of the upsampled depth map d(x, y), and sharpen the image g(x, y) at the same time. Besides, the parameters of the AMF can be determined according to the accuracy of the depth map and image. The proposed AMF does not require complicated parameter training, and it is applicable to practical DIBR applications, which require robustness against any deformation of images or depth maps. In the AMF process, the image and the corresponding depth map are first classified based on the designed binary molds. Excluding the hole regions, the image and the corresponding depth map are then smoothed to reduce the noisy and blurred pixels and attenuate high-frequency noise. Next, the holes are filled from their surrounding neighbors. Finally, after the AMF, the rotating counsel refinement (RCR) method is used to sharpen the object edges.

3 Proposed AMF algorithm

3.1 Advanced multilateral filter

As shown in Fig. 1, the proposed depth enhancement system is composed of two major steps, the advanced multilateral filter (AMF) and the rotating counsel refinement (RCR). We assume that g(x, y) and d(x, y), with the same spatial resolution, represent the original image and the corresponding depth map, respectively. The depth map, which could be captured and upsampled from a stereo camera or estimated by a stereo matching method, often comes with noise and holes. The proposed AMF enhanced results g′(x, y) and d′(x, y) are respectively given as
$$ g^{\prime}(x, y)=\sum_{(i, j)\in \varOmega} h(x, y; i, j)\, g(i, j) \tag{1} $$
$$ d^{\prime}(x, y)=\sum_{(i, j)\in \varOmega} h(x, y; i, j)\, d(i, j) \tag{2} $$
Fig. 1

System block diagram for the advanced multilateral filter (AMF) and the rotating counsel refinement (RCR)

where the response h(x, y; i, j) at position (x, y) with respect to the impulse at (i, j) is defined by
$$ h(x, y; i, j)=\begin{cases}\dfrac{J_s J_d J_g J_c}{q_{x, y}}, & \text{if } (i, j)\in \varOmega\\[4pt] 0, & \text{otherwise.}\end{cases} \tag{3} $$
In (3), h(x, y; i, j) is the advanced multilateral filter used to enhance the noisy depth map, Ω is the selected filtering window, (x, y) is the coordinate of the center position of the window, and (i, j) are the neighboring positions of (x, y). J_s, J_d, J_g, and J_c are the spatial, depth, range, and credibility filtering coefficients, respectively defined as
$$ J_s=\exp\left(-\frac{(x-i)^2+(y-j)^2}{2\sigma_s^2}\right) \tag{4} $$
$$ J_d=\exp\left(-\frac{\left(d(i, j)-d(x, y)\right)^2}{2\sigma_d^2}\right) \tag{5} $$
$$ J_g=\exp\left(-\frac{\left(g(i, j)-g(x, y)\right)^2}{2\sigma_g^2}\right) \tag{6} $$
$$ J_c=1-\exp\left(-\frac{c(i, j)\times d(i, j)}{2\sigma_c^2}\right) \tag{7} $$
where J_s weights the spatial distance between the center position and its neighbor, J_d weights the depth difference between the center position and its neighbor, J_g weights the texture difference between the center position and its neighbor, and J_c weights the enhancement of depth pixels near texture-image edges. In (3), the normalization factor is given as
$$ q_{x, y}=\sum_{(i, j)\in \varOmega} J_s J_d J_g J_c. \tag{8} $$
In (7), the credibility map c(x, y) is computed from the texture image as
$$ c(x, y)=\begin{cases}1, & G(x, y)\ge \phi\\ 0, & G(x, y)<\phi\end{cases} \tag{9} $$
where ϕ is a selected threshold and G(x, y) is the gradient magnitude of the texture image:
$$ G(x, y)=\sqrt{G_x^2(x, y)+G_y^2(x, y)}. \tag{10} $$

In (10), the horizontal and vertical gradients G_x(x, y) and G_y(x, y) are computed with Sobel operators [24]. According to (9), the credibility map indicates whether a pixel lies in a smooth region or an edge region. If the candidate d(i, j) lies in an edge region, c(i, j) = 1, and the candidate depth d(i, j) is strengthened with the weight controlled by (7); the AMF thereby assigns a strong weight through J_c to enhance d(x, y). On the other hand, if the candidate d(i, j) lies in a smooth region, c(i, j) = 0, and the candidate depth d(i, j) is weakened by the weight controlled by (7).
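The credibility map of (9)–(10) and the per-pixel kernel of (3)–(8) can be written as the following sketch (Python with NumPy/SciPy; function names, window handling, and the toy parameters are ours, not the authors' implementation):

```python
import numpy as np
from scipy.ndimage import sobel

def credibility_map(g, phi):
    """c(x, y) = 1 where the Sobel gradient magnitude of the texture
    image reaches the threshold phi (edge region), else 0 -- (9)-(10)."""
    G = np.hypot(sobel(g, axis=1), sobel(g, axis=0))
    return (G >= phi).astype(float)

def amf_pixel(g, d, c, x, y, r, sig_s, sig_d, sig_g, sig_c):
    """Filter one depth pixel with the four-factor kernel of (3)-(8)
    over an r-radius window."""
    num = den = 0.0
    for i in range(max(0, x - r), min(g.shape[0], x + r + 1)):
        for j in range(max(0, y - r), min(g.shape[1], y + r + 1)):
            Js = np.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sig_s ** 2))
            Jd = np.exp(-(d[i, j] - d[x, y]) ** 2 / (2 * sig_d ** 2))
            Jg = np.exp(-(g[i, j] - g[x, y]) ** 2 / (2 * sig_g ** 2))
            Jc = 1.0 - np.exp(-(c[i, j] * d[i, j]) / (2 * sig_c ** 2))
            w = Js * Jd * Jg * Jc
            num += w * d[i, j]
            den += w
    # in a fully smooth window every J_c is zero; fall back to the input
    return num / den if den > 0 else d[x, y]

# toy example: a vertical edge with one noisy depth pixel near it
g = np.zeros((7, 7)); g[:, 4:] = 255.0
d = np.zeros((7, 7)); d[:, 4:] = 100.0
d[3, 5] = 60.0
c = credibility_map(g, 240.0)
out = amf_pixel(g, d, c, 3, 5, 2, 2.0, 20.0, 30.0, 5.0)
```

Note how J_c suppresses candidates from smooth or zero-depth regions, so the noisy pixel is pulled toward the credible foreground depth.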

To reduce the computation spent on exponential functions, the second-order Taylor expansion is used to approximate the exponential function as
$$ e^w\approx p(w)=1+w+\frac{w^2}{2}. \tag{11} $$
With (3) and (11), the approximated AMF impulse response becomes
$$ h^{\prime}(x, y; i, j)=\begin{cases}\dfrac{p(w_s)\, p(w_d)\, p(w_g)\left(1-p(w_c)\right)}{q^{\prime}_{x, y}}, & \text{if } (i, j)\in \varOmega\\[4pt] 0, & \text{otherwise}\end{cases} \tag{12} $$
where the spatial, depth, range, and credibility filtering coefficients respectively become
$$ w_s=-\frac{(x-i)^2+(y-j)^2}{2\sigma_s^2} \tag{13} $$
$$ w_d=-\frac{\left(d(i, j)-d(x, y)\right)^2}{2\sigma_d^2} \tag{14} $$
$$ w_g=-\frac{\left(g(i, j)-g(x, y)\right)^2}{2\sigma_g^2} \tag{15} $$
$$ w_c=-\frac{c(i, j)\times d(i, j)}{2\sigma_c^2} \tag{16} $$
$$ q^{\prime}_{x, y}=\sum_{(i, j)\in \varOmega} p(w_s)\, p(w_d)\, p(w_g)\left(1-p(w_c)\right). \tag{17} $$

It is noted that four standard deviations, σ_s, σ_d, σ_g, and σ_c, must be determined to achieve the best enhancement of the depth map; the mold matching technique is used to select these AMF parameters.
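A quick numerical check of the approximation in (11) (plain Python; the sample exponents are our own illustration):

```python
import math

def p(w):
    """Second-order Taylor approximation of exp(w) used in (11)-(17)."""
    return 1.0 + w + 0.5 * w * w

# the exponents produced by (13)-(16) are small negative numbers;
# there the quadratic approximation stays close to the true exponential
errors = {w: abs(math.exp(w) - p(w)) for w in (-0.5, -0.25, -0.1, 0.0)}
```

For |w| below about 0.5 the absolute error remains under 0.02, which is why the cheaper polynomial can replace the exponential with little loss in filter quality.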

3.2 Mold matching for image and depth map

The mold is used to match the image and its corresponding depth map. In this paper, as shown in Fig. 2, there are 56 binary molds, M_m for m = 1, 2, …, 56, for mold classification of image and depth blocks. The designed 11 × 11 molds cover all possible edges and corners of the blocks of the image and depth map.
Fig. 2

Binary molds for image and depth block classification

Let M_m represent the mth mold and I_g be the index of the best mold, for which a′_n is the smallest a_n over the 11 × 11 block of g. The computation for finding the best matching mold can be expressed as
$$ a^{\prime}_n=\min\left\{a_n\right\},\quad n=1, 2, \dots, N_R \tag{18} $$
$$ a_n=\sum_{m\in R_n^0}\left(g(x_m)-\eta_n^0\right)^2+\sum_{m\in R_n^1}\left(g(x_m)-\eta_n^1\right)^2 \tag{19} $$
$$ \eta_n^k=\frac{\sum_{m\in R_n^k} g(x_m)}{\left|R_n^k\right|},\quad k=0, 1. \tag{20} $$

In (19), a_n denotes the matching error between the nth mold and the image block; the minimum of a_n thus identifies the best matching mold among all candidates. N_R = 56 is the total number of molds, and R_n^0 and R_n^1 respectively represent the black and white regions of the nth mold, as shown in Fig. 2. With k = 0 or 1, η_n^k is the average of the texture values in R_n^k, and |R_n^k| denotes the number of elements in R_n^k. In (18)–(20), the least-squares error method is used to find the best mold M_{I_g} for the image block. To find the best mold M_{I_d} for the depth block, we simply replace g(x_m) with d(x_m) in (19) and (20). In addition, if the block variance is less than a given threshold, e.g., 1, we assume that the corresponding block belongs to a smooth region; in this case, a uniform binary mold consisting of all 1's or all 0's, denoted M_0, is assigned.
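The least-squares mold fit of (18)–(20) can be sketched as follows (Python/NumPy; the 4 × 4 toy molds below stand in for the 56 fixed 11 × 11 molds of Fig. 2, and the smooth-block sentinel is our own convention):

```python
import numpy as np

def best_mold(block, molds, var_thresh=1.0):
    """Return the index of the mold minimizing the two-region
    least-squares error a_n of (19)-(20), or -1 as a stand-in for the
    uniform mold M_0 when the block is smooth."""
    if block.var() < var_thresh:
        return -1
    errs = []
    for M in molds:
        a = 0.0
        for region in (M, ~M):          # white region R^1 and black region R^0
            vals = block[region]
            if vals.size:
                a += ((vals - vals.mean()) ** 2).sum()  # one term of (19)
        errs.append(a)
    return int(np.argmin(errs))         # the minimization of (18)

# toy example: a block with a vertical edge should match the vertical mold
block = np.zeros((4, 4)); block[:, 2:] = 10.0
vertical = np.zeros((4, 4), dtype=bool); vertical[:, 2:] = True
horizontal = np.zeros((4, 4), dtype=bool); horizontal[2:, :] = True
```

Because the within-region means absorb the absolute intensity levels, the fit depends only on where the edge falls, not on the block's brightness.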

To compare the best molds of the depth-map and image blocks, the sum of absolute differences (SAD) is used to calculate their discrepancy, and the local similarity is measured by the mold matching distortion D_pm:
$$ D_{pm}=\frac{\min\left(SAD\left(M_{I_d}, M_{I_g}\right), SAD\left(\overline{M_{I_d}}, M_{I_g}\right)\right)}{D_{\max}} \tag{21} $$
$$ SAD(a, b)=\sum_{m\in B}\left|a(m)-b(m)\right|. \tag{22} $$

The SAD between two binary molds represents the total number of mismatched pixels. Both the SAD of the depth mold against the image mold, SAD(M_{I_d}, M_{I_g}), and the SAD of its binary inversion against the image mold, SAD(M̄_{I_d}, M_{I_g}), are computed, and the minimum is used to represent the mold similarity. The inverted comparison is necessary because a binary mold only classifies the block pixels into two groups: if all bits of a mold are reversed, the mold matching processes defined in (18)–(20) achieve the same result, i.e., swapping the black and white regions of a mold does not change the similarity between two molds. D_max denotes the largest SAD between any two molds, so after normalization by D_max, the value of D_pm lies between 0 and 1. By comparing the molds in Fig. 2 one-by-one, we found that the maximum SAD is D_max = 83.

According to the matching between the depth map and the image, σ_g, σ_d, and σ_c adjust the influence of the range, depth, and credibility filters. Thus, the three standard deviations are determined from D_pm as:
$$ \sigma_g=\max\left(\sigma_{g, L}, \min\left(\sigma_{g, U}, k_1\cdot D_{pm}\right)\right) \tag{23} $$
$$ \sigma_d=\max\left(\sigma_{d, L}, \min\left(\sigma_{d, U}, k_2\cdot D_{pm}\right)\right) \tag{24} $$
$$ \sigma_c=\max\left(\sigma_{c, L}, \min\left(\sigma_{c, U}, k_3\cdot D_{pm}\right)\right) \tag{25} $$
where σ_{g,L} (σ_{d,L}, σ_{c,L}) and σ_{g,U} (σ_{d,U}, σ_{c,U}) denote the lower and upper limits, respectively. Thus, for the AMF, k_1, k_2, and k_3 can be linearly increased or decreased to strengthen or weaken the influence of D_pm.
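Under the same assumptions as above (Python; our naming), (21)–(25) reduce to a SAD comparison followed by a clamp:

```python
import numpy as np

D_MAX = 83.0  # largest SAD between any two molds in Fig. 2

def mold_distortion(Md, Mg, d_max=D_MAX):
    """D_pm of (21)-(22): the SAD of two binary molds is their number
    of mismatched pixels; the depth mold and its inversion are both
    tried, and the minimum is normalized to [0, 1]."""
    sad = lambda a, b: int(np.sum(a != b))
    return min(sad(Md, Mg), sad(~Md, Mg)) / d_max

def clamp_sigma(k, d_pm, lo, hi):
    """(23)-(25): scale D_pm by k and clip to the limits [lo, hi]."""
    return max(lo, min(hi, k * d_pm))
```

For instance, with k_1 = 15, D_pm = 0.5, and limits [1.0, 10.0], the clamp yields σ_g = 7.5; a fully inverted depth mold still yields D_pm = 0, since the inversion branch of (21) catches it.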

3.3 Rotating counsel refinement for depth map

After the AMF enhancement, the remaining tiny jagged edges produce errors in the synthesized view of the DIBR technology; for example, the boundary of an object may be extended into the wrong region. Several algorithms can effectively detect edges and eliminate jagged edges [26], such as guided filters [27, 28], geodesic filters [29, 30], weighted median filters [31, 32], and bilateral filters [33–35]. In this paper, we adopt the rotating counsel refinement (RCR) [25] to remove the tiny jagged edges of the enhanced depth map. The RCR process is implemented in an iterative manner [36] and is composed of two major steps, small-structure smoothing and edge recovery, as illustrated in Fig. 3. The RCR method first applies a Gaussian filter to smooth the enhanced depth map; the result is called the guided depth map. The guided depth map is then used to iteratively filter the original enhanced depth map and sharpen the tiny jagged edges.
Fig. 3

Flow chart of rotating counsel refinement
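The two-step iteration can be sketched as follows (Python/NumPy with SciPy's Gaussian filter; a small-scale illustration of the iterative guidance scheme of [36], not the authors' exact implementation, with parameter values of our own choosing):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rcr(depth, sigma_s=2.0, sigma_r=10.0, iters=4):
    """RCR-style refinement sketch: a Gaussian-smoothed guide first
    removes small structures; each iteration then joint-bilateral
    filters the INPUT depth with the current guide, so large edges are
    recovered while tiny jagged edges stay smoothed."""
    guide = gaussian_filter(depth, sigma_s)          # small-structure smoothing
    r = int(2 * sigma_s)                             # window radius
    h, w = depth.shape
    for _ in range(iters):                           # edge recovery
        out = np.zeros_like(depth)
        for x in range(h):
            for y in range(w):
                num = den = 0.0
                for i in range(max(0, x - r), min(h, x + r + 1)):
                    for j in range(max(0, y - r), min(w, y + r + 1)):
                        ws = np.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sigma_s ** 2))
                        wr = np.exp(-(guide[i, j] - guide[x, y]) ** 2 / (2 * sigma_r ** 2))
                        num += ws * wr * depth[i, j]
                        den += ws * wr
                out[x, y] = num / den
        guide = out
    return guide

# toy example: a step edge with one jagged noise pixel
depth = np.zeros((8, 8)); depth[:, 4:] = 100.0; depth[2, 2] = 30.0
refined = rcr(depth, sigma_s=1.0, sigma_r=20.0, iters=2)
```

Because every pass refilters the original input under an ever-sharper guide, the result stays faithful to the input while isolated jagged pixels are progressively averaged away.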

4 Experimental results

To evaluate the effectiveness of the advanced multilateral filter (AMF) and the rotating counsel refinement (RCR), the proposed depth enhancement system is evaluated on the Middlebury database [37, 38] and an RGBD database. Virtual depth maps are generated by the stereo matching method on the Middlebury database, while natural depth maps are produced by a stereo camera for the RGBD database. Figure 4 shows the six test images, Art, Books, Doily, Moebius, RGBD_1, and RGBD_2, used for evaluating the performance of depth enhancement and depth refinement.
Fig. 4

Six test images (left) and their corresponding depth maps (right): a Art (432 × 381), b Books (463 × 370), c Doily (417 × 370), d Moebius (463 × 370), e RGBD_1 (640 × 480), and f RGBD_2 (640 × 480)

4.1 Performance evaluation of depth enhancement

In the experiments, the weighting factors are empirically set to k_1 = 15, k_2 = 15, and k_3 = 18 in (23)–(25), and ϕ is set to 240 in (9). Decreasing or increasing these factors results in stronger or weaker enhancement, respectively. For comparisons, the proposed method, without any hole filling, is compared to the joint bilateral filter (JBF) [16], intensity guided depth superresolution (IGDS) [39], compressive sensing based depth upsampling (CSDU) [40], and adaptive joint trilateral filter (AJTF) [18] methods, all coupled with cross-based hole filling (CHF). The enhanced depth maps obtained by the proposed AMF process, as well as by the JBF and AJTF methods with CHF, are shown in Fig. 7. The simulation results show that the proposed AMF method effectively removes the noise and hole pixels, but the enhanced depth map still contains tiny jagged edges. Therefore, depth refinement along object edges is another important step.

As shown in Fig. 5a, if the threshold ϕ is too small, many unnecessary texture details are produced; if it is too large, the strong edge regions of the color image are excluded. The performance of the AMF is therefore affected by the threshold ϕ. Simulation results for different levels of edge detection are shown in Fig. 6.
Fig. 5

Average PSNR performances for a the threshold ϕ, b the standard deviation σ_d, c the standard deviation σ_g, and d the standard deviation σ_c

Fig. 6

Different levels of edge detection: a ϕ = 180, b ϕ = 220, c ϕ = 240, and d ϕ = 340

Table 1 shows the PSNR performances between the ground truth and the depth enhancement results obtained by the different methods. In Table 1, the maximum PSNR values, achieved by the proposed method, are 39.0004, 41.3019, 41.7411, 42.5622, 31.6245, and 33.7466 dB, respectively, while the minimum PSNR values, achieved by AJTF [18] with CHF, are 32.4128, 29.2324, 29.5552, 29.3657, 27.6347, and 31.4258 dB, respectively. Thus, the proposed method achieves PSNR improvements of 6.5876, 12.0695, 12.1859, 13.1965, 3.9898, and 2.3208 dB. These objective PSNR results show that the proposed method performs better than the JBF [16] with CHF, IGDS [39] with CHF, CSDU [40] with CHF, and AJTF [18] with CHF. It is noted that the proposed method does not need hole filling before the enhancement procedure. The best result for each sample is highlighted in boldface. Table 2 shows the PSNR comparisons with sensitivity to the parameters on the Middlebury and RGBD datasets. Strong depth enhancement sharpens object edges but is accompanied by tiny jagged edges; on the other hand, weak depth enhancement smooths object edges but yields a relatively worse PSNR value than strong depth enhancement (Fig. 7).
Table 1

PSNR comparisons of JBF [16] with CHF, IGDS [39] with CHF, CSDU [40] with CHF, AJTF [18] with CHF, and the proposed method on the Middlebury and RGBD datasets
Table 2

PSNR comparisons with sensitivity to the parameters on the Middlebury and RGBD datasets, for virtual and natural depth maps under two settings: σ_s = 2, k_1 = 15, k_2 = 15, k_3 = 18 and σ_s = 2, k_1 = 10, k_2 = 10, k_3 = 12
Fig. 7

Depth enhancement coupled with hole filling: a noisy depth map, b joint bilateral filter (JBF) [16], c intensity guided depth superresolution (IGDS) [39], d compressive sensing based depth upsampling (CSDU) [40], e adaptive joint trilateral filter (AJTF) [18], and f the proposed AMF, for Art, Books, Doily, Moebius, RGBD_1, and RGBD_2

4.2 Depth enhancement with RCR process

To assess the performance of the AMF for depth enhancement coupled with the rotating counsel refinement, some parameter values need to be determined empirically. In the simulations, we also found that the rotating counsel iterations converge quickly. Unlike traditional refinement methods, the RCR procedure converges to a meaningful depth map faithful to the input no matter how many iterations are performed. Figure 8 shows the results for the test depth maps. Figure 9 shows magnified portions of Fig. 8a, c when both the AMF and the RCR (AMF_RCR) are applied to refine the depth maps. Table 3 shows the PSNR results when the AMF and RCR methods are used together. After the RCR process, the objective results show that the PSNR values of the refined depth maps increase, and the subjective results show that the tiny-jagged-edge problems are also resolved. Table 4 shows that, under the global error measurement, the proposed method is better than the JBF and AJTF methods, and increasingly so at higher depth-map resolutions.
Fig. 8

Six depth map results obtained by the AMF (left side) and by the AMF and the RCR (right side): a Art, b Books, c Doily, d Moebius, e RGBD_1, and f RGBD_2

Fig. 9

Two selected regions obtained by the AMF (left side) and by the AMF coupled with the RCR (right side): a Art and b Doily

Table 3

PSNR performances achieved by the AMF and RCR methods on the Middlebury and RGBD datasets
Table 4

RMSE performances achieved by different depth enhancement methods (JBF [16] with CHF, IGDS [39] with CHF, CSDU [40] with CHF, AJTF [18] with CHF, and AMF with RCR) on the Middlebury and RGBD datasets
Table 5 exhibits the execution time of the AMF and RCR stages of the proposed depth enhancement system. Table 6 shows the total execution time required by the different methods. The proposed method is much more effective than the JBF and AJTF, while its computation time is only 3.59% longer than that of the AJTF, so it is worthwhile from a cost-effectiveness viewpoint. The experiments were carried out on an Intel Core i7-4770 CPU with 12 GB of RAM, on the Matlab platform (version R2013a).
Table 5

Execution time (s) of the main stages in the proposed method
Table 6

Execution time (s) of JBF [16] with CHF, IGDS [39] with CHF, CSDU [40] with CHF, AJTF [18] with CHF, and the proposed method
The histograms of horizontal and vertical depth values are shown in Fig. 10. To obtain an objective evaluation of the enhanced depth map quality, the quality metric suggested in [41] is used for comparison. In Fig. 10a, the enhanced depth map of the proposed method has smaller depth values in the left half of the histogram, representing less visual fatigue; in Fig. 10b, the enhanced depth map of the proposed method concentrates on the central region of the histogram, which also indicates less visual fatigue.
Fig. 10

Results of the enhanced depth map quality metric: a horizontal disparity and b vertical disparity

4.3 Performance evaluation with Middlebury datasets

To understand the quality of the enhanced depth map, the depth-image-based rendering proposed in [21] is used to produce synthesized views from the depth maps obtained by the different depth enhancement methods. For objective evaluation, the SSIM and PSNR performances are shown in Table 7. The best result for each sample is highlighted in boldface. The proposed approach gives better SSIM and PSNR results than the JBF [16] and AJTF [18].
Table 7

SSIM and PSNR performances achieved by different depth enhancement methods (JBF [16] + CHF, IGDS [39] + CHF, CSDU [40] + CHF, AJTF [18] + CHF, and the proposed method) on the Middlebury and RGBD datasets

5 Conclusions

Image and depth enhancement play an important role in today's 3D video technologies, and many approaches have been proposed to deal with different situations. We present a new robust adaptive method, developed from the adaptive joint trilateral filter (AJTF), to enhance the image and noisy depth maps. In this paper, we propose an advanced multilateral filter (AMF), which considers the similarities of the spatial, range, depth, and credibility information. The AMF performs depth enhancement by suppressing noise, filling holes, and sharpening object edges simultaneously. Finally, the proposed method achieves better results than the other methods in the experiments.

The proposed AMF without hole filling outperforms the AJTF and the JBF with CHF: it produces sharper object edges, removes overshoot and undershoot artifacts, and removes hole regions while sharpening edges simultaneously. The proposed method also replaces the exponential function with its second-order Taylor expansion, which saves 12.69% of the computing time on the MATLAB platform. Comparing the proposed AMF with different depth enhancement algorithms, the AMF exhibits better performance in both subjective and objective evaluations.

As future work, a hardware VLSI implementation of the AMF should be considered. In conjunction with DIBR techniques, edge detection should be made more accurate for small object edges, since the DIBR technique requires extremely accurate depth maps. Finally, the proposed method can be extended to depth video enhancement by exploiting the temporal depth information between successive frames.



This work was supported in part by the National Science Council of Taiwan, under Grant NSC 105-2221-E-006-065-MY3.

Authors’ contributions

TAC carried out the image processing studies, participated in the proposed system design, and drafted the manuscript. YTC carried out the mold design and adjustment parameters. JFY conceived of the study, and participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University


References

  1. L Zhang, C Vazquez, S Knorr, 3D-TV content creation: automatic 2D-to-3D video conversion. IEEE Trans. on Broadcasting 57(2), 372–383 (2011)
  2. A Smolic, D McCutchen, 3DAV exploration of video-based rendering technology in MPEG. IEEE Trans. on Circuits Syst. Video Technology 14(3), 348–356 (2004)
  3. HM Wang, CH Huang, JF Yang, Depth maps interpolation from existing pairs of keyframes and depth maps for 3D video generation. IEEE Circuits and Systems (ISCAS) Conf., 2010, pp. 3248–3251
  4. M Schmeing, X Jiang, Faithful disocclusion filling in depth image based rendering using superpixel-based inpainting. IEEE Trans. on Multimedia PP(99), 1 (2015)
  5. F Shao, M Yu, G Jiang, F Li, Z Peng, Depth map compression and depth-aided view rendering for a three-dimensional video system. IET Trans. on Signal Process. 6(3), 247–254 (2012)
  6. TC Yang, PC Kuo, BD Liu, JF Yang, Depth image-based rendering with edge-oriented hole filling for multiview synthesis. Communications, Circuits and Systems (ICCCAS) Conf. 1, 50–53 (2013)
  7. YS Heo, KM Lee, SU Lee, Robust stereo matching using adaptive normalized cross-correlation. IEEE Trans. on Pattern Analysis and Machine Intelligence 33(4), 807–822 (2011)
  8. H Hirschmuller, D Scharstein, Evaluation of cost functions for stereo matching. IEEE Computer Vision and Pattern Recognition (CVPR) Conf., 2007, pp. 1–8
  9. D Scharstein, C Pal, Learning conditional random fields for stereo. IEEE Computer Vision and Pattern Recognition (CVPR) Conf., 2007, pp. 1–8
  10. F Garcia, D Aouada, T Solignac, B Mirbach, B Ottersten, Real-time depth enhancement by fusion for RGB-D cameras. IET Computer Vision 7(5), 1–11 (2013)
  11. J Xie, RS Feris, MT Sun, Edge-guided single depth image super resolution. IEEE Trans. on Image Process. 25(1), 428–438 (2016)
  12. MY Liu, O Tuzel, Y Taguchi, Joint geodesic upsampling of depth images. IEEE Conf. on Computer Vision and Pattern Recognition, 2013, pp. 169–176
  13. J Yang, X Ye, K Li, C Hou, Y Wang, Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. on Image Process. 23(8), 3443–3458 (2014)
  14. Y Shen, J Li, C Lu, Depth map enhancement method based on joint bilateral filter. IEEE Image and Signal Process. Conf., 2014, pp. 153–158
  15. Y Wang, A Ortega, D Tian, A Vetro, A graph-based joint bilateral approach for depth enhancement. IEEE Speech and Signal Process. Conf., 2014, pp. 885–889
  16. J Kopf, MF Cohen, D Lischinski, M Uyttendaele, Joint bilateral upsampling. ACM Trans. on Graph. 26(3), 96 (2007)
  17. P Lai, D Tian, P Lopez, Depth map processing with iterative joint multilateral filtering. IEEE Picture Coding Symposium (PCS), 2010, pp. 9–12
  18. SW Jung, Enhancement of image and depth map using adaptive joint trilateral filter. IEEE Trans. on Circuits and Syst. for Video Technology 23, 258–269 (2013)
  19. H Shan, WD Chien, HM Wang, JF Yang, A homography-based inpainting algorithm for effective depth-image-based rendering. IEEE Image Processing (ICIP), 2014, pp. 5412–5416
  20. CH Hsia, Improved depth image-based rendering using an adaptive compensation method on an autostereoscopic 3-D display for a Kinect sensor. IEEE Sensors Journal 15(2), 994–1002 (2015)
  21. P Ndjiki-Nya, M Koppel, D Doshkov, H Lakshman, P Merkle, K Muller, T Wiegand, Depth image-based rendering with advanced texture synthesis for 3-D video. IEEE Trans. on Multimedia 13(3), 453–465 (2011)
  22. H Chang, DY Yeung, Y Xiong, Super-resolution through neighbor embedding. IEEE Computer Vision and Pattern Recognition 1, I (2004)
  23. NE Yang, YG Kim, RH Park, Depth hole filling using the depth distribution of neighboring regions of depth holes in the Kinect sensor. IEEE Signal Process., Communication and Computing Conf., 2012, pp. 658–661
  24. ME Sobel, Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology 13, 290–312 (1982)
  25. TA Chang, JF Yang, Enhancement of depth map using texture and depth consistency. IEEE Conf. (TENCON), 2016, pp. 1139–1142
  26. P Perona, J Malik, Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
  27. K He, J Sun, X Tang, Guided image filtering. IEEE Trans. Pattern Analysis and Machine Intelligence 35(6), 1397–1409 (2013)
  28. CP Cuong, WJ Jae, Efficient image sharpening and denoising using adaptive guided image filtering. IET Image Process. 9(1), 71–79 (2015)
  29. A Criminisi, T Sharp, C Rother, P Perez, Geodesic image and video editing. ACM Trans. Graph. 29(5), 134 (2010)
  30. Q Yang, D Li, LH Wang, M Zhang, A novel guided image filter using orthogonal geodesic distance weight. IEEE Image Process. (ICIP) Conf., 2013, pp. 1207–1211
  31. SJ Ko, YH Lee, Center weighted median filters and their applications to image enhancement. IEEE Trans. Circuits and Systems 38(9), 984–993 (1991)
  32. Z Ma, K He, Y Wei, J Sun, E Wu, Constant time weighted median filtering for stereo matching and beyond. IEEE Computer Vision (ICCV) Conf., 2013, pp. 49–56
  33. D Gang, ST Acton, On the convergence of bilateral filter for edge-preserving image smoothing. IEEE Signal Process. Letters 14(9), 617–620 (2007)
  34. G Guarnieri, S Marsi, G Ramponi, Fast bilateral filter for edge-preserving smoothing. Electronics Letters 42(7), 396–397 (2006)
  35. Z Su, X Luo, Z Deng, Y Liang, Z Ji, Edge-preserving texture suppression filter based on joint filtering schemes. IEEE Trans. on Multimedia 15(3), 535–548 (2013)
  36. Q Zhang, L Xu, J Jia, Rolling guidance filter. European Computer Vision Conf. (ECCV), 2014
  37. D Scharstein, R Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision 47(1-3), 7–42 (2002)
  38. D Scharstein, R Szeliski, High-accuracy stereo depth maps using structured light. IEEE Computer Vision and Pattern Recognition (CVPR) Conf. 1, I-195–I-202 (2003)
  39. B Ham, D Min, S Sohn, Depth superresolution by transduction. IEEE Trans. on Image Process. 24(5), 1524–1535 (2015)
  40. L Dai, H Wang, X Mei, X Zhang, Depth map upsampling via compressive sensing. IEEE Asian Conference on Pattern Recognition (ACPR), 2013, pp. 90–94
  41. D Kim, D Min, J Oh, S Jeon, K Sohn, Depth map quality metric for three-dimensional video. IS&T/SPIE Electronic Imaging Conf., 2009, pp. 723719


© The Author(s). 2017