### 3.1 Advanced multilateral filter

As shown in Fig. 1, the proposed depth enhancement system is composed of two major steps: the advanced multilateral filter (AMF) and the rotating counsel refinement (RCR). We assume that *g*(*x*, *y*) and *d*(*x*, *y*), which have the same spatial resolution, represent the original image and the corresponding depth map, respectively. The depth map, which could be captured and upsampled from a stereo camera or estimated by a stereo matching method, often comes with noise and holes. The AMF-enhanced results *g*′(*x*, *y*) and *d*′(*x*, *y*) are respectively given as

$$ {g}^{\prime}\left( x, y\right)={\displaystyle \sum_{i, j\in \varOmega} h\left( x, y; i, j\right) g\left( i, j\right)} $$

(1)

and

$$ {d}^{\prime}\left( x, y\right)={\displaystyle \sum_{i, j\in \varOmega} h\left( x, y; i, j\right) d\left( i, j\right)} $$

(2)

where *h*(*x*, *y*; *i*, *j*), the response at the position (*x*, *y*) to the impulse at (*i*, *j*), is defined by

$$ h\left( x, y; i, j\right)=\left\{\begin{array}{ll}\frac{J_s{J}_d{J}_g{J}_c}{q_{x, y}}, & \mathrm{if}\ \left( i, j\right)\in \varOmega \\ 0, & \mathrm{otherwise.}\end{array}\right. $$

(3)

In (3), *h*(*x*, *y*; *i*, *j*) is the advanced multilateral filter, which is used to enhance the noisy depth map, Ω is a selected filtering window, (*x*, *y*) is the coordinate of the center position of the window, and (*i*, *j*) are the neighbor positions of (*x*, *y*). \( J_s \), \( J_d \), \( J_g \), and \( J_c \) are referred to as the spatial, depth, range, and credibility filtering coefficients, which are respectively defined as

$$ {J}_s= \exp \left(-\frac{{\left( x- i\right)}^2+{\left( y- j\right)}^2}{2{\sigma_s}^2}\right) $$

(4)

$$ {J}_d= \exp \left(-\frac{{\left( d\left( i, j\right)- d\left( x, y\right)\right)}^2}{2{\sigma_d}^2}\right) $$

(5)

$$ {J}_g= \exp \left(-\frac{{\left( g\left( i, j\right)- g\left( x, y\right)\right)}^2}{2{\sigma_g}^2}\right) $$

(6)

$$ {J}_c=1- \exp \left(-\frac{c\left( i, j\right)\times d\left( i, j\right)}{2{\sigma_c}^2}\right) $$

(7)

where \( J_s \) is the weight of the spatial distance between the center position and its corresponding neighbor position, \( J_d \) is the weight of the depth difference between the center position and its corresponding neighbor position, \( J_g \) is the weight of the texture difference between the center position and its corresponding neighbor position, and \( J_c \) is the weight that enhances depth pixels near texture-image edges. In (3), the normalization factor is given as

$$ {q}_{x, y}={\displaystyle {\sum}_{i, j\in \varOmega}{J}_s{J}_d{J}_g{J}_c}. $$

(8)

In (7), the credibility map *c*(*x*, *y*) is computed from the texture image as

$$ c\left( x, y\right)=\left\{\begin{array}{l}1,\begin{array}{cc}\hfill \hfill & \hfill G\left( x, y\right)\ge \phi \hfill \end{array}\\ {}0,\begin{array}{cc}\hfill \hfill & \hfill G\left( x, y\right)<\phi \hfill \end{array}\end{array}\right. $$

(9)

where *ϕ* is a selected threshold and *G*(*x*, *y*) is the gradient magnitude of the texture image, given by

$$ G\left( x, y\right)=\sqrt{{G_x}^2\left( x, y\right)+{G_y}^2\left( x, y\right)}. $$

(10)

In (10), the horizontal and vertical gradients \( G_x(x, y) \) and \( G_y(x, y) \) are computed with Sobel operators [24]. According to (9), the credibility map determines whether a pixel lies in a smooth region or an edge region. If the candidate *d*(*i*, *j*) lies in an edge region, *c*(*i*, *j*) = 1, and the candidate depth *d*(*i*, *j*) is strengthened with the weight controlled by (7); that is, the AMF assigns a strong weight through \( J_c \) to enhance *d*(*x*, *y*). On the other hand, if the candidate *d*(*i*, *j*) lies in a smooth region, *c*(*i*, *j*) = 0, and the candidate depth *d*(*i*, *j*) is weakened by the weight controlled by (7).

To reduce the computation spent on exponential functions, a second-order Taylor expansion is used to approximate the exponential function as

$$ {e}^w\approx p(w)=1+ w+\frac{w^2}{2}. $$

(11)

With (3) and (11), the approximated AMF impulse response then becomes

$$ {h}^{\prime}\left( x, y; i, j\right)=\left\{\begin{array}{ll}\frac{p\left({w}_s\right) p\left({w}_d\right) p\left({w}_g\right)\left(1- p\left({w}_c\right)\right)}{q_{x, y}^{\prime }}, & \mathrm{if}\ \left( i, j\right)\in \varOmega \\ 0, & \mathrm{otherwise}\end{array}\right. $$

(12)

where the spatial, depth, range, and credibility filtering coefficients respectively become

$$ {w}_s=-\frac{{\left( x- i\right)}^2+{\left( y- j\right)}^2}{2{\sigma_s}^2} $$

(13)

$$ {w}_d=-\frac{{\left( d\left( i, j\right)- d\left( x, y\right)\right)}^2}{2{\sigma_d}^2} $$

(14)

$$ {w}_g=-\frac{{\left( g\left( i, j\right)- g\left( x, y\right)\right)}^2}{2{\sigma_g}^2} $$

(15)

$$ {w}_c=-\frac{c\left( i, j\right)\times d\left( i, j\right)}{2{\sigma_c}^2} $$

(16)

and

$$ {q}_{x, y}^{\prime }={\displaystyle {\sum}_{i, j\in \varOmega} p\left({w}_s\right) p\left({w}_d\right) p\left({w}_g\right)\left(1- p\left({w}_c\right)\right)}. $$

(17)
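The quality of the approximation in (11) can be checked numerically; the sample arguments below are illustrative. Since every \( w \) in (13)–(16) is nonpositive, the approximation is accurate only while the squared differences stay small relative to the chosen σ:

```python
import math

def p(w):
    # Eq. (11): second-order Taylor approximation of exp(w) around 0.
    return 1.0 + w + w * w / 2.0

# The error grows quickly once |w| exceeds about 1, so sigma must be
# chosen large enough that typical window differences keep |w| small.
errors = [abs(p(w) - math.exp(w)) for w in (-0.1, -0.5, -1.0)]
```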

It is noted that we need to determine four standard deviations, \( \sigma_s \), \( \sigma_d \), \( \sigma_g \), and \( \sigma_c \), to achieve the best enhancement of the depth map, where the mold matching technique is used for the selection of the AMF parameters.

### 3.2 Mold matching for image and depth map

The mold is used to match the image and the corresponding depth map. In this paper, as shown in Fig. 2, there are 56 binary molds \( M_m \), for *m* = 1, 2, …, 56, for mold classification of image and depth blocks. The designed 11 × 11 molds cover all possible edges and corners of the blocks of the image and depth map.

Let \( M_m \) represent the *m*th mold, and let \( I_g \) be the index of the best mold for the 11 × 11 block of *g*, i.e., the index *n* at which the smallest matching error \( {a}_n^{\prime } \) is attained. The computation for finding the best matching mold can be expressed as

$$ {a}_n^{\prime }= \min \left\{{a}_n\right\},\kern1em n=1,2,\dots, {N}_R $$

(18)

where

$$ {a}_n={\displaystyle {\sum}_{m\in {R}_n^0}{\left( g\left({x}_m\right)-{\eta}_n^0\right)}^2}+{\displaystyle {\sum}_{m\in {R}_n^1}{\left( g\left({x}_m\right)-{\eta}_n^1\right)}^2} $$

(19)

and

$$ {\eta}_n^k=\frac{{\displaystyle {\sum}_{m\in {R}_n^k} g\left({x}_m\right)}}{\left|{R}_n^k\right|},\kern0.75em k=0,1. $$

(20)

In (18), \( a_n \) denotes the matching error between the *n*th mold and the image block; thus, the minimum of \( a_n \) identifies the best matching mold among all the candidate molds. \( N_R \), which is 56, is the total number of molds, and \( {R}_n^0 \) and \( {R}_n^1 \) respectively represent the black and white regions of the *n*th mold as shown in Fig. 2. With *k* = 0 or 1, \( {\eta}_n^k \) is the average of the texture values in \( {R}_n^k \), and \( \left|{R}_n^k\right| \) denotes the number of elements in \( {R}_n^k \). In (19), the least-squares error is used to select the best mold \( {M}_{I_g} \) for the image block. To find the best mold \( {M}_{I_d} \) for the depth block, we simply replace *g*(\( x_m \)) with *d*(\( x_m \)) in (19) and (20). In addition, if the block variance is less than a given threshold, e.g., 1, we assume that the corresponding block belongs to a smooth region; in this case, a special binary mold consisting of all 1's or all 0's, denoted \( M_0 \), is assigned.
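The mold search of (18)–(20) can be sketched as follows. The 56 molds of Fig. 2 are not reproduced here; the code accepts any list of 11 × 11 binary molds, and the variance threshold is the example value 1 mentioned above:

```python
import numpy as np

def best_mold(block, molds, var_thresh=1.0):
    # Eq. (18)-(20): pick the binary mold minimising the two-region
    # least-squares error a_n. Smooth blocks map to the special mold M0,
    # signalled here by returning None.
    if block.var() < var_thresh:
        return None
    errors = []
    for M in molds:
        a_n = 0.0
        for k in (0, 1):            # black (0) and white (1) regions R_n^k
            region = block[M == k]
            if region.size:
                # Eq. (20): region mean; Eq. (19): squared deviation.
                a_n += ((region - region.mean())**2).sum()
        errors.append(a_n)
    return int(np.argmin(errors))   # Eq. (18)
```

Replacing the image block with a depth block applies the same search to the depth map, as described above.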

To compare the similarity of the best molds of the depth-map and image blocks, the sum of absolute differences (SAD) is used to calculate the discrepancy, and the local similarity is measured by the mold matching distortion \( D_{pm} \), defined as:

$$ {D}_{pm}=\frac{ \min \left( SAD\left({M}_{I_d},{M}_{I_g}\right), SAD\left(\overline{M_{I_d}},{M}_{I_g}\right)\right)}{D_{\max }} $$

(21)

where

$$ SAD\left( a, b\right)={\displaystyle {\sum}_{m\in B}\left| a(m)- b(m)\right|}. $$

(22)

The SAD between two binary molds represents the total number of mismatched pixels. The SADs of the depth mold and of its binary inversion with respect to the image mold, i.e., for the pairs {\( {M}_{I_d},{M}_{I_g} \)} and {\( {\overline{M}}_{I_d},{M}_{I_g} \)}, are respectively computed, and the minimum of the two is used to represent the mold similarity. \( SAD\left(\overline{M_{I_d}},{M}_{I_g}\right) \) is necessary because a binary mold only classifies the block pixels into two groups: if the bits in all molds were inverted, the mold matching processes defined in (18), (19), and (20) would achieve the same results. In other words, swapping the black and white regions of a mold does not change the similarity between the texture image and the depth map. \( D_{\max} \) denotes the largest SAD between any two molds, so after normalization by \( D_{\max} \), the value of \( D_{pm} \) lies between 0 and 1. By comparing the molds in Fig. 2 one-by-one, we found that the maximum SAD is \( D_{\max} = 83 \), which is the largest possible difference between the image mold and the depth mold.
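A minimal sketch of (21)–(22) follows; the mold shapes in the usage are illustrative, while the value 83 for \( D_{\max} \) is taken from the text above:

```python
import numpy as np

D_MAX = 83.0  # largest SAD between any two molds of Fig. 2 (from the paper)

def sad(a, b):
    # Eq. (22): number of mismatched pixels between two binary molds.
    return float(np.abs(a - b).sum())

def mold_distortion(m_d, m_g, d_max=D_MAX):
    # Eq. (21): compare the depth mold and its binary inversion against
    # the image mold, keep the smaller SAD, and normalise into [0, 1].
    return min(sad(m_d, m_g), sad(1 - m_d, m_g)) / d_max
```

Taking the minimum over the inverted mold makes the distortion invariant to the black/white labeling, matching the discussion above.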

According to the matching of the depth map and the image, we can use \( \sigma_g \), \( \sigma_d \), and \( \sigma_c \) to adjust the influence of the range, depth, and credibility filters. Thus, the three standard deviations are determined from \( D_{pm} \) as:

$$ {\sigma}_g= \max \left({\sigma}_{g, L}, \min \left({\sigma}_{g, U},{k}_1\cdot {D}_{pm}\right)\right) $$

(23)

$$ {\sigma}_d= \max \left({\sigma}_{d, L}, \min \left({\sigma}_{d, U},{k}_2\cdot {D}_{pm}\right)\right) $$

(24)

and

$$ {\sigma}_c= \max \left({\sigma}_{c, L}, \min \left({\sigma}_{c, U},{k}_3\cdot {D}_{pm}\right)\right) $$

(25)

where \( \sigma_{g,L} \) (\( \sigma_{d,L} \), \( \sigma_{c,L} \)) and \( \sigma_{g,U} \) (\( \sigma_{d,U} \), \( \sigma_{c,U} \)) denote the lower and upper limits, respectively. Thus, for the AMF, we can linearly increase or decrease \( k_1 \), \( k_2 \), and \( k_3 \) to strengthen or weaken the influence of \( D_{pm} \).

### 3.3 Rotating counsel refinement for depth map

After the AMF enhancement, tiny jagged edges can still produce errors in the synthesized view of the DIBR technology; for example, the boundary of an object may be extended into the wrong region. Therefore, the RCR method [25] is used to adjust the object edges of the enhanced depth map. Several existing filters can effectively detect edges and eliminate jagged edges [26], such as the guided filter [27, 28], geodesic filters [29, 30], weighted median filters [31, 32], and the bilateral filter [33,34,35]. In this paper, we suggest the rotating counsel refinement (RCR) filtering to remove the tiny jagged edges of the enhanced depth maps. The RCR process is implemented in an iterative manner [36], and the iterative RCR is composed of two major steps, small structure smoothing and edge recovery, as illustrated in Fig. 3. The RCR method first applies a Gaussian filter to smooth the enhanced depth map; the result is called the guided depth map. The guided depth map is then used to iteratively refine the original enhanced depth map and sharpen the tiny jagged edges.
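The iterative smooth-then-recover loop can be sketched as follows. This is a highly simplified reading of the two steps named above, not the actual RCR algorithm of [25]; the iteration count, Gaussian σ, and edge threshold are assumed values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rcr_sketch(d, iters=3, sigma=1.5, edge_thresh=10.0):
    # Simplified iterative refinement: smooth small structures with a
    # Gaussian "guided" depth map, then restore pixels lying on strong
    # depth discontinuities so object boundaries stay sharp.
    out = d.astype(float)
    for _ in range(iters):
        guided = gaussian_filter(out, sigma)        # small structure smoothing
        edges = np.abs(out - guided) > edge_thresh  # likely true depth edges
        out = np.where(edges, out, guided)          # edge recovery
    return out
```

Small isolated bumps differ little from their smoothed surroundings, so they are progressively absorbed, while large depth steps exceed the threshold and are kept intact.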