# Robust stereo matching with trinary cross color census and triple image-based refinements

- Ting-An Chang¹
- Xiao Lu¹
- Jar-Ferr Yang¹

**2017**:27

https://doi.org/10.1186/s13634-017-0462-3

© The Author(s). 2017

**Received:** 14 September 2016

**Accepted:** 17 March 2017

**Published:** 31 March 2017

## Abstract

For future 3D TV broadcasting systems and navigation applications, accurate stereo matching is needed to precisely estimate depth maps from two separated cameras. In this paper, we first suggest a trinary cross color (TCC) census transform, which helps achieve an accurate raw disparity matching cost at low computational cost. A two-pass cost aggregation (TPCA) then computes the aggregated cost, and the disparity map is obtained by a range winner-take-all (RWTA) process and a white hole filling procedure. To further improve accuracy, a range left-right checking (RLRC) method is proposed to classify the results as correct, mismatched, or occluded pixels. Image-based refinements are then proposed to correct the mismatched and occluded pixels. Finally, image-based cross voting and a median filter complete the fine depth estimation. Experimental results show that the proposed semi-global stereo matching system achieves considerably accurate disparity maps at a reasonable computational cost.

### Keywords

TCC census transform, Cost aggregation, Range winner-take-all, Image-based refinements

## 1 Introduction

Measuring the distance of scene points is an important research topic in computer vision for robotic systems [1, 2], self-directed vehicles [3], and 3D video broadcasting systems [4, 5]. For 3D video broadcasting, a small number of selected views, which include color texture frames and gray-level depth maps, are coded by 3D-HEVC coders [6, 7]. At the receiver, the 3D TV set decodes all texture frames and depth maps with the 3D-HEVC decoder and uses a depth image-based rendering (DIBR) system to generate additional virtual views for naked-eye multi-view 3D displays [8, 9]. When users own naked-eye multi-view 3D displays but receive content in side-by-side or top-and-bottom stereo packing formats, the displays need not only real-time stereo matching to estimate the depth information but also the DIBR process to produce the multi-view synthesized videos. Because stereo matching is computationally expensive, a simple and accurate stereo matching algorithm is needed for multi-view 3D displays. Depth can also be measured physically by various sensors, such as laser or infrared radar based on time-of-flight, which provide accurate depth information but suffer from low resolution and high cost. With multiple cameras [10, 11], stereo vision techniques [12–14] offer a low-cost, high-resolution approach to extracting depth information. With horizontally placed cameras, estimating the distance of each pixel, called stereo matching, amounts to searching for the best correspondence of the same scene point in two images captured from different viewpoints [15, 16]. The horizontal displacement between the paired pixels in the two images is called the disparity. If the parameters of the capturing cameras are known, the disparity map can easily be converted into distance (depth) information.
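For a rectified pair with known calibration, the usual pinhole relation is Z = f·B/d, with focal length f in pixels and baseline B in metres. The following is a minimal sketch, assuming those two values are available and that non-positive disparities are invalid; the function name is illustrative only.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) into depth (metres) via Z = f * B / d.

    Assumes a rectified pair, focal length in pixels and baseline in metres.
    Non-positive disparities are treated as invalid and returned as NaN.
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full(d.shape, np.nan)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth


print(disparity_to_depth(np.array([[10, 20], [0, 40]]), 1000.0, 0.1))
```

Depth is inversely proportional to disparity, so a one-pixel disparity error costs far more depth accuracy on distant objects than on near ones.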

Stereo matching, an active research topic in computer vision, can estimate a dense disparity map from a pair of images if their inherent ambiguities are properly resolved. The most difficult problem is how to accurately estimate the disparity map under different scene conditions, such as smooth regions, discontinuities, and occluded areas. A survey of stereo matching was conducted by Scharstein and Szeliski [17]. Two well-known global stereo matching approaches, belief propagation [18] and graph cut [19], can produce high-quality disparity maps but require very high computational complexity. Therefore, several semi-global or local stereo methods have been proposed to achieve efficient implementations [20–23]. However, these semi-global and local stereo matching methods still cannot completely resolve the ambiguities, which may come from the census transform [21, 22, 24] and the local support windows [23]. Three main problems still need to be solved to improve the precision of semi-global stereo matching methods. First, the size and shape of the local support window should be determined adaptively so as to include more reliable pixels. Second, the intensity sensitivity of the census transform should be reduced in flat regions, where small variations can introduce salt-and-pepper noise into the matching cost. Third, the regular refinement after the left-right consistency check cannot resolve occlusion problems.

To achieve high-precision stereo matching, we propose a semi-global stereo matching system with the trinary cross color (TCC) census transform to reduce sensitivity in smooth regions, the two-pass cost aggregation (TPCA) to obtain a stable cost, the range winner-take-all (RWTA) to select a robust depth, and the range left-right check (RLRC) to keep reliable depths. Finally, triple image-based refinements are used to further improve the performance. The TPCA combines a data term and a smooth term to achieve accurate disparity maps in smooth areas and precise object boundaries. The data term is based on the proposed TCC census, which gives better raw matching performance than the AD census with less computation time. The modified RLRC and the triple image-based refinements further achieve high accuracy. The rest of this paper is organized as follows. In Section 2, we first define the stereo matching notations and give a brief overview of the proposed stereo matching system. The details of the framework are described in Section 3. Experimental results demonstrating the effectiveness of the proposed algorithms are presented in Section 4. Finally, we conclude this paper in Section 5.

## 2 Local census stereo matching methods

The inputs of the system are the rectified left and right *W* × *H* color images with pixels \( {I}_c^l\left( x, y\right) \) and \( {I}_c^r\left( x, y\right) \). For simplicity, let *p* = (*x*, *y*) denote the spatial location of a pixel, so that the left and right images can be denoted as \( {I}_c^l(p) \) and \( {I}_c^r(p) \), respectively. Stereo matching estimates the disparity *d* such that \( {I}_c^l\left( x, y\right) \) and \( {I}_c^r\left( x+ d, y\right) \) become the matched pixel pair, also written as \( {I}_c^l(p)={I}_c^l\left(\left( x, y\right)\right) \) and \( {I}_c^r\left( p, d\right)={I}_c^r\left(\left( x+ d, y\right)\right) \) for simplicity. Estimating a disparity for each of the *W* × *H* pixels yields the *W* × *H* disparity map.

In the conventional census transform, each pixel is encoded by a bit string that compares the central pixel with its neighbors: *p* and *q* denote the positions of the central and surrounding pixels in a selected window *N*(*p*), while *I*(*p*) and *I*(*q*) are their intensities. The census transform is robust to radiometric distortions and achieves good overall performance as a cost representation. However, it is very sensitive in flat regions, where it introduces salt-and-pepper noise into the matching cost. Moreover, the census is computed over square windows, which may overlap occluded areas and enlarge object boundaries.
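To make the flat-region problem concrete, here is a minimal sketch of a conventional binary census over a square window, together with the Hamming distance used to compare two census strings. It assumes a grayscale input, the bit rule I(q) < I(p), and edge replication at the borders. The function names are illustrative, not taken from the paper.

```python
import numpy as np


def census_transform(img, radius=2):
    """Binary census over a (2r+1) x (2r+1) square window.

    A bit is 1 when the neighbour q is darker than the centre p, I(q) < I(p).
    Borders use edge replication. Returns an (H, W, (2r+1)**2 - 1) bool array.
    """
    img = np.asarray(img, dtype=np.float64)
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            q = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(q < img)
    return np.stack(bits, axis=-1)


def hamming(bits_a, bits_b):
    """Number of differing bits between two census strings."""
    return int(np.count_nonzero(np.asarray(bits_a) != np.asarray(bits_b)))
```

Because every strict inequality flips a bit, a one-level intensity change in a flat patch is enough to change the code. This is the sensitivity the trinary census targets.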

## 3 The proposed stereo matching system

### 3.1 Trinary cross color (TCC) census

Here, *ρ* is a selected threshold for reducing the effect of noise and should be proportional to *I*(*p*). Figure 3 shows how the trinary census works under noise. In smooth regions, neighboring pixels with the same intensity should produce zero census bits, as shown in Fig. 3a. Under noise, the original binary census transform yields very different encoded bits, as shown in Fig. 3b, while the trinary census transform produces more consistent encoded bits with only one error, as shown in Fig. 3c. Hence, the trinary census transform is more robust to noise than the original one.
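A minimal sketch of a trinary census with a dead zone of ±ρ follows. Two assumptions are made here and should be treated as such. First, the 2-bit coding ((1, 0) for clearly brighter, (0, 1) for clearly darker, (0, 0) otherwise) is assumed. Second, the cross-shaped sampling pattern in `CROSS_OFFSETS` is hypothetical. It is chosen only so that 16 neighbors fit in a 15 × 15 block and give M = 32 bits, matching the counts quoted for Fig. 4d, but it is not necessarily the authors' layout.

```python
import numpy as np

# Hypothetical cross-shaped pattern: 4 samples on each of the four arms at
# distances 1, 3, 5 and 7, i.e. 16 neighbours inside a 15 x 15 block.
CROSS_OFFSETS = ([(0, s * k) for s in (-1, 1) for k in (1, 3, 5, 7)]
                 + [(s * k, 0) for s in (-1, 1) for k in (1, 3, 5, 7)])


def trinary_census(img, rho=2, offsets=CROSS_OFFSETS):
    """Two bits per neighbour q (assumed coding):
    (1, 0) if I(q) > I(p) + rho, (0, 1) if I(q) < I(p) - rho, (0, 0) otherwise.
    Returns an (H, W, 2 * len(offsets)) boolean array.
    """
    img = np.asarray(img, dtype=np.float64)
    r = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(img, r, mode="edge")  # replicate borders
    h, w = img.shape
    bits = []
    for dy, dx in offsets:
        q = padded[r + dy:r + dy + h, r + dx:r + dx + w]
        bits.append(q > img + rho)  # neighbour clearly brighter
        bits.append(q < img - rho)  # neighbour clearly darker
    return np.stack(bits, axis=-1)


rng = np.random.default_rng(0)
noisy_flat = 100 + rng.integers(-1, 2, size=(15, 15))  # flat patch, noise in {-1, 0, 1}
print(trinary_census(noisy_flat, rho=2).any())  # no bit set: noise stays inside the dead zone
```

Setting ρ = 0 makes every nonzero difference set a bit, which reproduces the noise sensitivity of a plain census on the same flat patch.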

The color intensity in the *R*, *G*, and *B* channels is the most primitive information that we can obtain directly from images. The color similarity \( \Delta I_c(p, d) \) between the pixel at *p* in the left image, \( {I}_c^l(p) \), and the pixel at *p* with disparity *d* in the right image, \( {I}_c^r\left( p, d\right) \), can be represented as

where *c* is the color channel index of the images. The color similarity in (4) alone is insufficient as a raw stereo matching cost in the smooth areas where the census is sensitive. Thus, we use the color similarity to detect whether the TCC census cost is needed. The TCC census cost is computed as the Hamming distance between the TCC census transforms of the pixel at *p* in the left image, \( {I}_c^l(p) \), and the pixel at *p* with disparity *d* in the right image, \( {I}_c^r\left( p, d\right) \). The normalized trinary cross color (TCC) census transform cost is thus defined as

where \( {B}_{TC}^l(p) \) and \( {B}_{TC}^r\left( p, d\right) \) are the bit strings of the TCC census transforms of pixel *p* in the left and right images, respectively, *d* is the disparity with respect to pixel *p*, \( T_1 \) is a threshold that limits the TCC cost, and *M* is the number of bits in the census window. For example, for the pattern shown in Fig. 4d, *M* = 16 (pixels) × 2 (bits/pixel) = 32.
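The two quantities above can be sketched per pixel pair as follows. Both forms are assumptions. The color similarity (4) is taken as the mean absolute RGB difference, as in the AD census. The normalized TCC cost is taken as min(Hamming, T1)/M. The rule that uses the color similarity to decide when the TCC cost is needed is not reproduced here.

```python
import numpy as np


def color_difference(left_rgb, right_rgb):
    """Assumed form of (4): mean absolute difference over the R, G, B channels."""
    l = np.asarray(left_rgb, dtype=np.float64)
    r = np.asarray(right_rgb, dtype=np.float64)
    return float(np.mean(np.abs(l - r)))


def tcc_cost(bits_left, bits_right, t1, m):
    """Assumed normalised TCC cost: Hamming distance clipped at T1, divided by M."""
    hamming = int(np.count_nonzero(np.asarray(bits_left) != np.asarray(bits_right)))
    return min(hamming, t1) / m


a = np.zeros(32, dtype=bool)  # M = 32 bits, as in the Fig. 4d example
b = a.copy()
b[:5] = True  # five differing bits
print(color_difference((10, 20, 30), (13, 20, 24)), tcc_cost(a, b, t1=20, m=32))
```

Clipping at \( T_1 \) keeps a few badly matched pixels (for example, near occlusions) from dominating the later aggregation.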

### 3.2 Smooth processes and cost aggregation

To obtain semi-global behavior, we first add smoothness terms in the row and column directions, according to the characteristics of the initial disparity data terms, to form a new energy function model. Then, an improved two-level cross-based cost aggregation with adaptive support weights is performed to improve the accuracy of the disparity map. The smooth terms in the row and column directions reduce the overall matching error rate, and the modified two-pass cross-based adaptive support weight cost aggregation produces a robust rough disparity map.

#### 3.2.1 Smooth term computations

The horizontal and vertical smooth terms are used to suppress the errors in the raw TCC census matching cost.

*x* and vertical *y* of the image versus the disparity search range *d*. For semi-global disparity estimation, this disparity space model first finds the position of the minimum disparity cost in the horizontal direction, from *x* = 1 to *x* = *W*, by using a horizontal iterative smooth term. As shown in Fig. 5, starting at horizontal position *x* = 1, we find the positions of the minimum raw matching cost \( C_{TCC} \) from vertical position *y* = 1 to *y* = *H*. The initial horizontal smooth term at *x* = 1 is set as

For *x* ∈ [2, *W*], we can iteratively compute them as

for *x* ∈ [2, *W*], *y* ∈ [1, *H*], where the horizontal disparity penalty is given by

and *λ* is the smooth term parameter. Increasing *λ* visibly shrinks the occluded and wrong-disparity areas, and decreasing it has the opposite effect.

At *y* = 1, the vertical smooth term is set as

For *y* ∈ [2, *H*], the iterative computation is given by

for *x* ∈ [1, *W*], *y* ∈ [2, *H*], where the vertical disparity penalty term is expressed as

After the horizontal and vertical smoothing processes, the noise in the disparity map obtained with the TCC census cost is noticeably reduced. Thus, the smoothed result \( {C}_{\mathrm{smooth}}^v\left( p, d\right) \) is used for stereo matching instead of the original TCC census cost \( C_{TCC}(p, d) \).
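The sweep structure described above (initialize the first column, then iterate x = 2..W, and repeat along y) can be sketched on an (H, W, D) cost volume. The penalty used here is hypothetical. At each step it adds λ·|d − d_prev|, where d_prev is the minimum-cost disparity of the previous column (or row), so it illustrates the structure of the sweep rather than the paper's exact penalty terms.

```python
import numpy as np


def smooth_pass(cost, lam, axis):
    """One iterative smoothing sweep over an (H, W, D) cost volume.

    Hypothetical penalty: at step i along `axis`, add lam * |d - d_prev|, where
    d_prev is the minimum-cost disparity of line i - 1. Line 0 is left unchanged.
    """
    vol = np.moveaxis(np.asarray(cost, dtype=np.float64), axis, 0).copy()
    disparities = np.arange(vol.shape[-1])
    for i in range(1, vol.shape[0]):
        d_prev = np.argmin(vol[i - 1], axis=-1)  # best disparity of previous line
        vol[i] += lam * np.abs(disparities[None, :] - d_prev[:, None])
    return np.moveaxis(vol, 0, axis)


def smooth_cost(c_tcc, lam):
    """Horizontal sweep (x = 1..W) followed by a vertical sweep (y = 1..H)."""
    c_h = smooth_pass(c_tcc, lam, axis=1)  # axis 1 is x in an (H, W, D) volume
    return smooth_pass(c_h, lam, axis=0)   # axis 0 is y


# Column 0 clearly prefers d = 2; column 1 is ambiguous on its own.
c = np.array([[[5, 5, 0, 5], [1, 1.5, 1.2, 1]]])
print(np.argmin(c[0, 1]), np.argmin(smooth_cost(c, lam=1.0)[0, 1]))  # raw 0, smoothed 2
```

As in the text, λ controls the trade-off. Larger values pull ambiguous pixels toward their neighbors' disparities, and λ = 0 leaves the raw cost untouched.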

#### 3.2.2 Two-pass cost aggregation

The cross-based support window of pixel *p* is constructed by considering two measures to find the endpoint pixels of its left, right, up, and down arms. The color similarity \( \Delta I_c(p) \) in RGB space is defined as

and the spatial distance \( \Delta I_s(p) \) is given by

where *p* is the central pixel for cross-based window generation and \( I_c \) is the color intensity of the pixel, with *c* denoting the *R*, *G*, or *B* color index. In (12) and (13), *i* ∈ [1, *L*], where *L* is the maximum arm length of the cross window. We take the span of the left arm \( r_l \) as an example. The computation of \( r_l \) can be formulated as follows:

where \( p_i = (x - i, y) \), and \( \delta(p, p_i) \) is an indicator that gauges the color similarity and spatial distance between pixels *p* and \( p_i \) as

where \( \tau_k \) and \( L_k \), with *k* ∈ {1, 2}, are the *k*th-level color similarity threshold and spatial distance threshold, respectively, with \( L_1 < L_2 \) and \( \tau_1 > \tau_2 \). After the cross arms are constructed, the support region of pixel *p* is formed by merging the horizontal arms of all pixels lying on the vertical arms of *p* (*q*, for example), as shown in Fig. 6. The proposed two-pass decision cross window allows more flexible control of the arm length: a larger \( L_2 \) includes more pixels in smooth regions, while the stricter \( \tau_2 \) guarantees that the extended arm covers only very similar colors.
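The two-level arm rule can be sketched for the left arm \( r_l \). The sketch assumes that the color difference is the maximum absolute difference over R, G, and B, and that the spatial distance of \( p_i = (x - i, y) \) is simply *i*. The arm keeps growing while the difference stays below \( \tau_1 \) up to \( L_1 \), and below the stricter \( \tau_2 \) between \( L_1 \) and \( L_2 \). The function name is illustrative.

```python
import numpy as np


def left_arm_length(img, x, y, tau1, tau2, l1, l2):
    """Length r_l of the left arm of pixel (x, y) with two-level thresholds.

    Assumed rule: p_i = (x - i, y) extends the arm while its colour difference
    to p (max over R, G, B) is below tau1 for i <= l1, or below the stricter
    tau2 for l1 < i <= l2 (with l1 < l2 and tau1 > tau2).
    """
    img = np.asarray(img, dtype=np.float64)
    p = img[y, x]
    r_l = 0
    for i in range(1, l2 + 1):
        if x - i < 0:
            break  # reached the image border
        color_diff = np.max(np.abs(img[y, x - i] - p))
        tau = tau1 if i <= l1 else tau2  # first- or second-level threshold
        if color_diff >= tau:
            break
        r_l = i
    return r_l


row = np.full((1, 10, 3), 103.0)  # far pixels differ from p by 3
row[0, 5:9] = 110.0               # near pixels differ from p by 10
row[0, 9] = 100.0                 # the central pixel p at x = 9
print(left_arm_length(row, 9, 0, tau1=20, tau2=5, l1=4, l2=8))  # arm reaches L2 = 8
```

The right, up, and down arms follow by changing the step direction. The near pixels pass only the loose first-level test, while the far ones must also pass the strict second-level test.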

Within the support region, the aggregated cost \( C_{ag} \) of pixel *p* with disparity *d* is computed as

where \( \mathrm{Cross}_p \) denotes the detected cross window around pixel *p* and \( \gamma_s \) is a weighting parameter. If \( \gamma_s \) is increased, \( C_{ag} \) increases accordingly. In other words, the contribution of a pixel *q* to \( C_{ag} \) is weakened as its distance from *p* in the cross window grows.
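A minimal sketch of the aggregation follows, assuming a weight of exp(−‖q − p‖/\( \gamma_s \)) with Euclidean distance and an unnormalized sum. That assumed form reproduces both stated properties: a larger \( \gamma_s \) raises \( C_{ag} \), and distant pixels count less. The support region is passed in as a list of (y, x) pixels, for example from the cross construction above.

```python
import numpy as np


def aggregate_cost(cost, p, support, gamma_s):
    """Aggregated cost vector C_ag(p, :) over a cross support region.

    Assumed form: every support pixel q adds exp(-||q - p|| / gamma_s) times
    its cost vector; the sum is not normalised by the total weight.
    """
    cost = np.asarray(cost, dtype=np.float64)  # (H, W, D) cost volume
    py, px = p
    total = np.zeros(cost.shape[-1])
    for qy, qx in support:
        dist = np.hypot(qy - py, qx - px)  # spatial distance between q and p
        total += np.exp(-dist / gamma_s) * cost[qy, qx]
    return total


cost = np.ones((3, 3, 2))
print(aggregate_cost(cost, (1, 1), [(1, 1), (1, 2)], gamma_s=1.0))  # 1 + e^-1 per level
```

A normalized variant (dividing by the sum of weights) is also common. It would no longer grow with \( \gamma_s \), so it would not match the behavior described in the text.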

#### 3.2.3 Improved WTA for disparity estimation

*d* and \( N_d \) is the number of disparity levels that share the same minimum cost. The suggested initial WTA becomes

If one {*d*}, two {*d*, *d* + 1}, or three consecutive {*d* − 1, *d*, *d* + 1} disparity levels share the same minimum cost, the WTA result *d*(*p*) = *d* is adopted directly. However, for \( N_d > 3 \) or for non-consecutive disparities with the same minimum cost, we mark the pixel at *p* as unstable by setting *d* = 255; such a pixel is called a white hole. To fill the white holes, we use cross-based window voting to estimate the disparity as

where \( H_p(d) \) is the histogram of the known stable disparities in the cross window around *p*, which was obtained from the first cost aggregation. The disparity with the highest histogram count is selected as the most desirable disparity to fill the white hole.
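The tie rules above translate directly into a per-pixel sketch. Tied minima of {d} or {d, d + 1} return d. Three consecutive minima {d − 1, d, d + 1} return the middle level d. Anything else becomes a white hole (255). The hole is then filled by a histogram vote over the stable disparities in its window. The window is passed in as a plain array here instead of an actual cross window, and the tie tolerance `atol` is an assumption.

```python
import numpy as np

WHITE_HOLE = 255  # unstable-depth marker used in the text


def range_wta(cost_vec, atol=1e-9):
    """Range WTA for one pixel: the chosen disparity d, or WHITE_HOLE."""
    cost_vec = np.asarray(cost_vec, dtype=np.float64)
    tied = np.flatnonzero(np.isclose(cost_vec, cost_vec.min(), rtol=0.0, atol=atol))
    n_d = len(tied)
    consecutive = tied[-1] - tied[0] == n_d - 1
    if n_d <= 3 and consecutive:
        # {d} -> d, {d, d+1} -> d, {d-1, d, d+1} -> d (the middle level)
        return int(tied[(n_d - 1) // 2])
    return WHITE_HOLE


def fill_white_hole(window_disparities):
    """Histogram vote: the most frequent stable disparity in the window."""
    d = np.asarray(window_disparities).ravel()
    stable = d[d != WHITE_HOLE].astype(np.int64)
    if stable.size == 0:
        return WHITE_HOLE  # nothing to vote with; leave the hole
    return int(np.argmax(np.bincount(stable)))


print(range_wta([2, 1, 1, 1, 5]), range_wta([1, 5, 1]), fill_white_hole([[3, 3, 255], [5, 3, 5]]))
```

On a tie in the histogram, `np.argmax` returns the smaller disparity. Because 255 is a marker, the sketch assumes disparity ranges below 255.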

### 3.3 Triple image-based disparity refinements

To obtain accurate disparities, we first detect the occluded and mismatched areas and then refine them. Each pixel in the reference disparity map should correspond well to a pixel in the target disparity map; otherwise, it is either occluded or mismatched.

#### 3.3.1 Occlusion and discontinuities refinement

Let \( d_l(x, y) \) and \( d_r(x, y) \) be the disparity values in the left and right maps, respectively. The left-right check (LRC) is commonly used to verify the correspondence between the disparities in the left and right depth maps. If the LRC finds \( d_l(x, y) = d_r(x - d_l(x, y), y) \), the correspondence is correct and the disparities are kept. If the LRC detects \( d_l(x, y) \ne d_r(x - d_l(x, y), y) \), the disparity is considered erroneous. To further classify an erroneous pixel as occluded or mismatched, we suggest a range LRC as

to detect it for \( -d_0 \le \sigma \le d_0 \) with \( d_0 \ge 1 \), where 255 and 0 denote mismatched and occluded pixels, respectively. In (21), if the left pixel disparity equals the disparity of the right pixel at the disparity-shifted position, the two paired pixels are treated as a correct correspondence, and we keep the original result. If the left pixel disparity finds a matching disparity in the right image at the disparity-shifted position plus an offset *σ* with \( -d_0 \le \sigma \le d_0 \), the erroneous pixel is marked as mismatched with 255. If the pixel cannot find a matched pixel either at the disparity-shifted position or within this range of shifts, we mark it as occluded with 0.

where \( d_l(x, y) \) denotes the occluded pixel, which is refined if its four surrounding pixels have reliable disparities. With iterative refinement, the occluded pixels (black holes) are filled with the background disparity. For the mismatched pixels (white holes), we apply window voting guided by the corresponding color image, selecting the disparity shared by the largest proportion of stable pixels, as

where \( H_W(d) \) is the histogram of the stable and color-matched disparities around *p* in the *K* × *K* voting window *W*, and a color-matched pixel is defined as

where \( \tau_3 \) is a color similarity threshold. For example, the circled area in Fig. 7 shows the similar colors used for correct pixel voting.
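Both refinements can be sketched under stated assumptions. For black holes, the "background" is taken to be the smallest reliable disparity among the four neighbors, and `min_reliable=4` follows the text's four-neighbor condition. For white holes, a neighbor is color-matched when its maximum absolute RGB difference to *p* is below \( \tau_3 \). The functions and the fallback behavior are illustrative, not the authors' code.

```python
import numpy as np

MISMATCH, OCCLUDED = 255, 0


def fill_occlusions(disp, min_reliable=4, max_iter=10):
    """Iteratively fill occluded pixels (0) from their 4-neighbours.

    Assumed rule: fill with the smallest reliable neighbouring disparity
    (the background) once at least `min_reliable` neighbours are reliable.
    """
    disp = np.asarray(disp).copy()
    h, w = disp.shape
    for _ in range(max_iter):
        changed = False
        for y, x in zip(*np.nonzero(disp == OCCLUDED)):
            nbrs = [disp[yy, xx] for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= yy < h and 0 <= xx < w]
            reliable = [int(v) for v in nbrs if v != OCCLUDED and v != MISMATCH]
            if reliable and len(reliable) >= min_reliable:
                disp[y, x] = min(reliable)  # background = smallest disparity
                changed = True
        if not changed:
            break
    return disp


def vote_mismatch(disp, img, y, x, k=25, tau3=10.0):
    """Colour-guided K x K histogram vote for a mismatched pixel (assumed form)."""
    disp = np.asarray(disp)
    img = np.asarray(img, dtype=np.float64)
    r = k // 2
    y0, y1 = max(0, y - r), min(disp.shape[0], y + r + 1)
    x0, x1 = max(0, x - r), min(disp.shape[1], x + r + 1)
    win_d = disp[y0:y1, x0:x1]
    color_diff = np.max(np.abs(img[y0:y1, x0:x1] - img[y, x]), axis=-1)
    ok = (win_d != OCCLUDED) & (win_d != MISMATCH) & (color_diff < tau3)
    if not ok.any():
        return int(disp[y, x])  # no stable, colour-matched voter
    return int(np.argmax(np.bincount(win_d[ok].astype(np.int64))))


disp = np.array([[2, 2, 7], [2, 255, 7], [7, 7, 7]])
img = np.full((3, 3, 3), 200.0)
img[:2, :2] = 10.0  # the hole shares its colour with the top-left block
print(vote_mismatch(disp, img, 1, 1, k=3))  # color guidance picks 2 over the majority 7
```

The color test is what separates this from plain majority voting: in the example, five neighbors vote for 7, but only the three same-colored neighbors are allowed to vote.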

#### 3.3.2 Final disparity map refinement

Some noise and wrong disparities still remain in the disparity map. To refine them, we use cross-based window voting to select the disparity that occurs most often in the local area. The cross window is constructed with the same method as in Section 3.2.2, and the most frequent disparity among the stable pixels in this area replaces the wrong one. Finally, a 3 × 3 median filter is applied to obtain a smooth disparity map.
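The final 3 × 3 median step can be sketched with NumPy alone. This assumes NumPy 1.20 or later for `sliding_window_view` and edge replication at the borders. The cross-window voting that precedes it is not repeated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3x3(disp):
    """3 x 3 median filter with edge replication at the borders."""
    padded = np.pad(np.asarray(disp), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))


disp = np.full((5, 5), 4)
disp[2, 2] = 40  # an isolated outlier
print(median3x3(disp))  # the outlier is removed, every value becomes 4.0
```

Unlike a mean filter, the median removes isolated outliers while keeping step edges between objects in place, which is why it suits disparity maps.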

## 4 Experimental results

The proposed stereo matching system is evaluated on the Middlebury datasets [26]. In Section 4.1, we first show the disparity maps produced by the proposed methods stage by stage to analyze the improvement in each step. In Section 4.2, we then compare the proposed stereo matching system with other well-known methods. The disparity maps generated by the proposed and compared methods are also presented.

### 4.1 Performance evaluation of the proposed algorithm

The parameters used in the proposed system

Stereo methods | Block | Parameters |
---|---|---|
TCC census | 15 × 15 | \( \rho \) = 2, |
TPCA | 35 × 35 | { |
Refinement | 25 × 25 | { |

### 4.2 Performance comparisons

Characteristics of Middlebury 2014 stereo datasets

Characteristic | Datasets |
---|---|
Normal | Adirondack, Motorcycle, Piano, Pipes, Playroom, Playtable, Recycle, Shelves, and Teddy |
Light | ArtL, MotorcycleE, and PianoL |
Large disparity | Jadeplant and Vintage |
Angle moving | PlaytableP |

Ranks and performance of the proposed system under normal, light, large disparity, and angle moving conditions

Datasets | RMS disparity error (RMS) | Rank | Average absolute error (Avgerr.) | Rank | 99% error quantile (A99) | Rank |
---|---|---|---|---|---|---|
**Normal** | | | | | | |
Adirondack | 16.7 | 13 | 6.4 | 12 | 97.6 | 15 |
Motorcycle | 19.8 | 11 | 5.7 | 15 | 121.0 | 9 |
Piano | 9.5 | 4 | 5.1 | 11 | 39.9 | 3 |
Pipes | 32.7 | 15 | 12.6 | 13 | 153.0 | 15 |
Playroom | 14.7 | 3 | 7.0 | 8 | 65.5 | 3 |
Playtable | 19.0 | 3 | 10.1 | 5 | 63.7 | 2 |
Recycle | 10.3 | 9 | 4.9 | 15 | 40.0 | 4 |
Shelves | 21.1 | 12 | 10.6 | 11 | 91.4 | 15 |
Teddy | 7.81 | 7 | 4.1 | 15 | 35.0 | 4 |
Average (Normal) | 16.8 | 8 | 7.4 | 11 | 78.5 | 7 |
**Light** | | | | | | |
ArtL | 47.8 | 20 | 31.1 | 23 | 154.0 | 17 |
MotorcycleE | 78.8 | 23 | 55.7 | 23 | 206.0 | 19 |
PianoL | 61.6 | 22 | 35.4 | 21 | 193.0 | 23 |
**Large disparity** | | | | | | |
Jadeplant | 112.0 | 19 | 51.4 | 19 | 444.0 | 21 |
Vintage | 30.8 | 10 | 14.0 | 9 | 96.9 | 6 |
**Angle moving** | | | | | | |
PlaytableP | 12.2 | 12 | 7.0 | 16 | 48.1 | 6 |
Average (Light, Large disparity, Angle moving) | 57.2 | 17 | 32.5 | 18 | 190.3 | 15 |
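The three error measures in the table are the Middlebury v3 metrics: RMS disparity error, average absolute error, and the 99% quantile of the absolute error, all in pixels. The following is a minimal sketch of how they can be computed. It assumes that invalid ground-truth pixels are stored as infinity and uses NumPy's default linear-interpolation percentile. The official evaluation may mask regions or compute quantiles slightly differently, so numbers from this sketch need not match the table exactly.

```python
import numpy as np


def disparity_errors(estimate, ground_truth):
    """RMS, average absolute error and A99 over pixels with finite ground truth."""
    est = np.asarray(estimate, dtype=np.float64)
    gt = np.asarray(ground_truth, dtype=np.float64)
    valid = np.isfinite(gt)  # assumed: unknown ground truth stored as inf
    err = np.abs(est[valid] - gt[valid])
    return {
        "rms": float(np.sqrt(np.mean(err ** 2))),
        "avgerr": float(np.mean(err)),
        "a99": float(np.percentile(err, 99)),
    }


print(disparity_errors([1, 2, 3, 4], [1, 2, 3, 8]))  # rms 2.0, avgerr 1.0, a99 about 3.88
```

RMS and A99 are dominated by a few large errors, which is why datasets with many gross failures (for example, Jadeplant) rank much worse on them than on the average error.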

## 5 Conclusions

In this paper, a semi-global stereo matching system based on an improved TCC census cost, TPCA, and triple image-based refinements is proposed. The TPCA combines a data term and a smooth term to achieve accurate disparity maps in smooth areas and precise object boundaries. The data term is based on the proposed trinary cross color (TCC) census, which gives better raw matching performance than the AD census with less computation time. The smooth term in the TPCA iteratively removes the noise caused by the raw TCC census matching cost. The two-pass cross-based cost aggregation with adaptive support weights produces accurate results in regions of similar color. A modified range left-right check (RLRC) and multi-step refinements further achieve high accuracy, and detecting the occluded and mismatched pixels allows the appropriate refinement method to be applied to each class. Experimental results on multiple stereo pairs demonstrate the effectiveness of the proposed approach compared with related methods. The disparity estimation and disparity map refinement stages increase the computational cost, mainly because of the cost aggregation performed in multiple loops. However, the proposed TCC census, TPCA, and triple image-based refinements achieve more accurate disparity map estimation than other related methods. For real-time applications, GPU or VLSI implementations of the system should be studied further. In addition, improving the subpixel accuracy of depth estimation could be investigated to attain better virtual view synthesis, possibly for free-viewpoint 3D video generation.

## Declarations

### Funding

This work was supported in part by the National Science Council of Taiwan, under Grant MOST 105-2221-E-006 -065 -MY3.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

TAC carried out the image processing studies, participated in the proposed system design, and drafted the manuscript. XL carried out the figure design and adjustment parameters. JFY conceived of the study and participated in its design and coordination and helped in drafting the manuscript. All authors read and approved the final manuscript.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

- C Faria, W Erlhagen, M Rito, E Demomi, G Ferrigno, E Bicho, Review of robotic technology for stereotactic neurosurgery. IEEE Trans on Biomedical Engineering **8**, 125–137 (2015)
- G Gioioso, G Salvietti, M Malvezzi, D Prattichizzo, Mapping synergies from human to robotic hands with dissimilar kinematics: an approach in the object domain. IEEE Trans on Robotics **29**(4), 825–837 (2013)
- ZF Zhang, YC Xu, JF Liu, Design of intelligent vehicle control system of self-directed. IEEE Control Decis Conf 2748–2751 (2012)
- J Cai, Integration of optical flow and dynamic programming for stereo matching. IET Image Process **6**(3), 205–212 (2012)
- HM Wang, YH Chen, JF Yang, A novel matching frame selection method for stereoscopic video generation. IEEE Multimedia and Expo Conf 1174–1177 (2009)
- EG Mora, J Jung, M Cagnazzo, B Pesquet-Popescu, Initialization, limitation, and predictive coding of the depth and texture quadtree in 3D-HEVC. IEEE Trans Circuits Systems Video Technol **24**(9), 1554–1565 (2014)
- G Tech, K Wegner, Y Chen, S Yea, *3D HEVC Test Model 3* (Document: JCT3VC1005, Geneva, 2013)
- QH Nguyen, MN Do, SJ Patel, Depth image-based rendering with low resolution depth. IEEE Image Process Conf 553–556 (2009)
- CH Hsia, Improved depth image-based rendering using an adaptive compensation method on an autostereoscopic 3-D display for a Kinect sensor. IEEE Trans on Sensors **15**(2), 994–1002 (2015)
- YA Sheikh, M Shah, Trajectory association across multiple airborne cameras. IEEE Trans Pattern Anal Mach Intell **30**(2), 361–367 (2008)
- H Hirschmuller, Stereo vision in structured environments by consistent semi-global matching. IEEE Computer Vision and Pattern Recognition Conf **2**, 2386–2393 (2006)
- SB Kang, R Szeliski, J Chai, Handling occlusions in dense multi-view stereo. IEEE Computer Vision Pattern Recognition Conf **1**, 103–110 (2001)
- VQ Dinh, CC Pham, JW Jeon, Matching cost function using robust soft rank transformations. IET Image Process **10**(7), 561–569 (2016)
- JB Lu, S Rogmans, G Lafruit, F Catthoor, Stream-centric stereo matching and view synthesis: a high-speed approach on GPUs. IEEE Trans Circuits Systems Video Technol **19**(11), 1598–1611 (2009)
- K Zhang, JB Lu, G Lafruit, Cross-based local stereo matching using orthogonal integral images. IEEE Trans Circuits Syst Video Technol **19**(7), 1073–1079 (2009)
- A Hosni, M Bleyer, C Rhemann, M Gelautz, C Rother, Real-time local stereo matching using guided image filtering. IEEE Multimedia and Expo Conf 1–6 (2011)
- D Scharstein, R Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vis **47**(1–3), 7–42 (2002)
- J Sun, N Zheng, H Shum, Stereo matching using belief propagation. IEEE Trans Pattern Anal Mach Intell **25**(7), 787–800 (2003)
- Y Boykov, O Veksler, R Zabih, Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell **23**(11), 1222–1239 (2001)
- H Hirschmuller, D Scharstein, Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans Pattern Anal Mach Intell **31**(9), 1582–1599 (2009)
- M Humenberger, T Engelke, W Kubinger, A census-based stereo vision algorithm using modified semi-global matching and plane fitting to improve matching quality. IEEE Comput Vis Pattern Recognition Conf 77–84 (2010)
- KR Bae, HS Son, J Hyun, B Moon, A census-based stereo matching algorithm with multiple sparse windows. IEEE Ubiquitous Future Networks (ICUFN) Conf 240–245 (2015)
- A Fusiello, V Roberto, E Trucco, Efficient stereo with multiple windowing. IEEE Comput Vis Pattern Recognition Workshop Conf 858–863 (1997)
- R Zabih, J Woodfill, Non-parametric local transforms for computing visual correspondence. European Comput Vis Conf **2**, 151–158 (1994)
- X Mei, X Sun, MC Zhou, SH Jiao, HT Wang, XP Zhang, On building an accurate stereo matching system on graphics hardware. IEEE Comput Vis Workshops Conf 467–474 (2011)
- Middlebury stereo vision page [Online]. http://vision.middlebury.edu/stereo. Accessed 29 Mar 2017
- J Jiao, R Wang, W Wang, S Dong, Z Wang, W Gao, Local stereo matching with improved matching cost and disparity refinement. IEEE Trans Multimedia **21**(4), 16–27 (2014)
- D Kim, D Min, J Oh, S Jeon, K Sohn, Depth map quality metric for three-dimensional video. IS&T/SPIE Electronic Imaging Conf 723719 (2009)