Efficient 2D to 3D video conversion implemented on DSP
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 106 (2011)
Abstract
An efficient algorithm to generate three-dimensional (3D) video sequences is presented in this work. The algorithm is based on disparity map computation and anaglyph synthesis. The disparity map is first estimated by applying the wavelet atomic functions technique at several decomposition levels to a 2D video sequence. The anaglyph synthesis then uses the disparity map to reconstruct a 3D video sequence. Compared with other disparity map computation techniques such as optical flow, stereo matching, and wavelets, the proposed approach performs better according to commonly used metrics (structural similarity and quantity of bad pixels). Hardware implementations of the proposed algorithm and of the other techniques are also presented to demonstrate the feasibility of real-time visualization of 3D color video sequences.
1. Introduction
Conversion of available 2D content for release in three-dimensional (3D) form is a hot topic for content providers and for the success of 3D video in general. It relies entirely on virtual view synthesis of a second view given the original 2D video [1]. 3DTV channels, mobile phones, laptops, personal digital assistants, and similar devices are hardware platforms on which 3D video content can be displayed.
There are several techniques to visualize 3D objects, such as polarized lenses, active vision, and anaglyphs. Some of these techniques have drawbacks, mainly their special hardware requirements, such as the special display used with synchronized lenses in the case of active vision and the polarized display in the case of polarized lenses. The anaglyph technique, in contrast, only requires a pair of spectacles constructed with red and blue filters, where the red filter is placed over the left eye, producing a visual effect of 3D perception. Anaglyph synthesis is a simple process in which the red channel of the second image (frame) replaces the red channel in the first image (frame) [2]. Several methods to compute anaglyphs have been described in the literature. One of them is the original Photoshop algorithm [3], where the red channel of the left-eye image becomes the red channel of the anaglyph, while the blue and green channels are taken from the right-eye image. Dubois [4] suggested a least-squares projection in each color component (R, G, B) from the six-dimensional space R^{6} to the 3D subspace. Two principal drawbacks of these algorithms are the presence of ghosting and the loss of color [5].
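The channel replacement described above can be sketched in a few lines. This is a minimal illustration, assuming both views are available as H × W × 3 RGB arrays with the red channel at index 0; the function name `red_cyan_anaglyph` is hypothetical and is not taken from the cited algorithms.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Combine two RGB views (H x W x 3 arrays) into a simple anaglyph.

    Assumption: channel order is R, G, B. The red channel is taken from
    the left view; green and blue are taken from the right view.
    """
    left_rgb = np.asarray(left_rgb)
    right_rgb = np.asarray(right_rgb)
    if left_rgb.shape != right_rgb.shape or left_rgb.shape[-1] != 3:
        raise ValueError("both views must be H x W x 3 arrays of equal shape")
    anaglyph = right_rgb.copy()          # green and blue come from the right view
    anaglyph[..., 0] = left_rgb[..., 0]  # red comes from the left view
    return anaglyph
```

The copy keeps the input frames untouched, which matters when the same frame serves as the left view of one pair and the right view of the next.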
In the 2D to 3D conversion, depth cues are needed to generate a novel stereoscopic view for each frame of an input sequence. The simplest way to obtain 3D information is the use of motion vectors directly from compressed data. However, this technique can only recover the relative depth accurately, if the motion of all scene objects is directly proportional to their distance from the camera [1].
In [6], the motion vector maps obtained from the MPEG-4 compression standard are used to construct the depth map of a stereo pair. The main idea here is to avoid the disparity map stage because it requires extremely computationally intensive operations and cannot suitably estimate high-resolution depth maps in video sequence applications. In [7], a real-time algorithm for use in 3DTV sets is developed, where the general method to perform the 2D to 3D conversion consists of the following stages: geometric analysis, static cues extraction, motion analysis, depth assignment, depth control, and depth-image-based rendering. One drawback of this algorithm is that it requires extremely computationally intensive operations.
There are several algorithms to estimate the disparity map (DM), such as the optical flow differential methods designed by Lucas & Kanade (L&K) and Horn and Schunck [8, 9], where some restrictions in the motion map model are employed. Other techniques are based on disparity estimation, where the best match between pixels in a stereo pair or in neighboring frames is found by employing a similarity measure, for example, the normalized cross-correlation (NCC) function or the sum of squared difference (SSD) between the matched images or frames [10]. A recent approach called region-based stereo matching (RBSM) is presented in [11], where the block matching technique is computed with various window sizes. Another promising framework consists of stereo correspondence estimation based on wavelets and multiwavelets [12], in which the wavelet transform modulus (WTM) is employed in the DM estimation. The WTM is calculated from the vertical and horizontal detail components, and the approximation component is employed to normalize the estimation. Finally, the cross-correlation in wavelet transform space is applied as the similarity measure.
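To make the block-matching idea concrete, the following sketch estimates the disparity of one pixel with the SSD measure. It is an illustration under stated assumptions, not the implementation of [10] or [11]: the search runs only along the same row, a left-image pixel at column c is assumed to match the right-image pixel at column c − d, and the function name and parameters (`half`, `max_disp`) are hypothetical.

```python
import numpy as np

def ssd_disparity(left, right, row, col, half=2, max_disp=8):
    """Disparity of pixel (row, col) by minimizing the sum of squared differences.

    Assumptions: grayscale 2D arrays, horizontal search only, and the
    (2*half+1)-sized window around (row, col) lies inside the left image.
    """
    ref = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:          # candidate window would leave the image
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = np.sum((ref - cand) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Replacing the cost with a normalized cross-correlation (and the minimum with a maximum) gives the NCC variant of the same search.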
In this article, we propose an efficient algorithm to generate a 3D video sequence from a 2D video sequence acquired by a moving camera. The framework uses the wavelet atomic functions (WAF) for the disparity map estimation. Then, the anaglyph synthesis is implemented to visualize the 3D color video sequence on a standard display. Additionally, we demonstrate the DSP implementation of the proposed algorithm for different sizes of the 2D video sequences.
The main difference from other algorithms presented in the literature is that the proposed framework provides sufficiently good depth and spatial perception in the 3D video sequences without requiring intensive computational operations, and it can generate 3D videos practically in real-time mode.
In the present approach, we employ the WAFs because they have already demonstrated successful performance in medical image recognition, speech recognition, image processing, and other technologies [13–15].
The article is organized as follows: Section 2 presents the proposed framework, Section 3 contains the simulation results, and Section 4 concludes the article.
2. The proposed algorithm
The proposed framework consists of the following stages: 2D color video sequence decomposition, RGB component separation, DM computation using wavelets at multiple decomposition levels (MW), in particular wavelet atomic functions (MWAF), disparity map improvement via dynamic range compression, anaglyph synthesis employing the nearest neighbor interpolation (NNI), and 3D video sequence reconstruction and visualization. Below, we explain in detail the principal 3D reconstruction stages (Figure 1).
2.1. Disparity map computation
Stereo correspondence estimation based on the MW (MWAF) technique is proposed to obtain the disparity map. The stereo correspondence procedure consists of two stages: the WAF implementation and the WTM computation.
Here, we present a novel type of wavelets known as WAFs, first introducing the basic atomic functions (up, fup_{ n }, π_{ n }) used as the mother functions in wavelet construction. The definition of AFs is connected with a mathematical problem: the isolation of a function that has derivatives with a maximum and minimum similar to those of the initial function. Solving this problem requires an infinitely differentiable solution to differential equations with a shifted argument [15]. It has been shown that AFs fall within an intermediate category between splines and classical polynomials: like B-splines, AFs are compactly supported, and like polynomials, they are universal in terms of their approximation properties.
The simplest and most important AF is generated by infinite-to-one convolutions of rectangular impulses, which are easy to analyze via the Fourier transform. Based on the N-to-one convolution of (N + 1) identical rectangular impulses, the compactly supported spline θ_{ N } (x) can be defined as follows:
The function up(x) is represented by the Fourier transform of infinite convolutions of rectangular impulses with variable duration 2^{−k}, as in Equation 2:
The AF fup_{ N } (x) is defined by the convolution of the spline θ_{N−1}(x) and the AF up(x) on the interval [−(N+2)/2, (N+2)/2]. Thus, fup_{ N } (x) can be written in the following form:
As a generalization of the AF up(x) presented above, the AF up_{ m } (x) is defined as follows:
The function π_{ m } (x) can be represented by the inverse Fourier transform $\pi_{m}(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{ixt}F_{m}(t)\,\mathrm{d}t$, using the following representation for the function F_{ m } (t):
The detailed definitions and properties of these functions can be found in [15].
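The Fourier-domain description of up(x) can be checked numerically. The sketch below assumes the standard spectrum from the atomic-function literature, the infinite product of sin(ω/2^k)/(ω/2^k) over k ≥ 1, truncates the product and the frequency range, and inverts the transform with a trapezoidal rule. The function names, the truncation length `terms`, and the cutoff `w_max` are illustrative choices, not values from the article.

```python
import numpy as np

def up_spectrum(w, terms=40):
    """Truncated product of sin(w / 2^k) / (w / 2^k), k = 1..terms (assumed form)."""
    w = np.asarray(w, dtype=float)
    spec = np.ones_like(w)
    for k in range(1, terms + 1):
        # np.sinc(t) is sin(pi t) / (pi t), hence the division by pi
        spec *= np.sinc(w / (2.0 ** k * np.pi))
    return spec

def up(x, w_max=200.0, n=40001):
    """Evaluate up(x) by numerically inverting its (even, real) spectrum."""
    w = np.linspace(0.0, w_max, n)
    f = np.cos(w * x) * up_spectrum(w)
    dw = w[1] - w[0]
    integral = dw * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
    return integral / np.pi

values = {x: up(x) for x in (0.0, 0.5, 1.5)}
```

Under these assumptions the computed values are close to 1 at x = 0, close to 0.5 at x = 0.5, and close to 0 outside the support [−1, 1], which matches the compact support and partition-of-unity behavior expected of up(x).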
The wavelet decomposition procedures employ several decomposition levels to enhance the quality of the depth maps. The discrete wavelet transform (DWT) and inverse DWT are usually implemented using filter bank techniques in a scheme with only two filters: low pass (LP) $H(z)$ (decomposition) and $\tilde{H}(z)$ (reconstruction), and high pass (HP) $G(z)$ (decomposition) and $\tilde{G}(z)$ (reconstruction), where $G(z)=zH(-z)$ and $\tilde{G}(z)=z^{-1}H(-z)$ [16]. The scale function φ(x) is associated with the filter H(z) through the scaling equation $\varphi(x)=\frac{2}{H(1)}\sum_{k\in\mathbb{Z}}h_{k}\varphi(2x-k)$ and can be expressed by its Fourier transform $\hat{\varphi}(\omega)=\prod_{k=1}^{\infty}\frac{H\left(e^{j\omega/2^{k}}\right)}{H(1)}$. The wavelet functions are computed as linear combinations of scale functions, $\psi(x)=\frac{2}{H(1)}\sum_{k}g_{k}\varphi(2x-k)$, where $g_{k}=(-1)^{k+1}h_{k-1}^{*}$ and {h_{ k } } are the coefficients of the LP filter in its Fourier series:
and the wavelet $\tilde{\psi}(x)=\frac{2}{\tilde{H}(1)}\sum_{k}\tilde{g}_{k}\tilde{\varphi}(2x-k)$. The HP filter is represented by a Fourier series with coefficients {h_{ k } }:
The coefficients {h_{ k } } should satisfy the normalization condition $\frac{1}{\sqrt{2}}\sum_{k}h_{k}=H_{0}(0)=1$. Finally, the reconstruction and decomposition wavelets are employed in the forms $\tilde{\psi}_{i,k}=2^{-i/2}\tilde{\psi}\left(x/2^{i}-k\right)$ and $\psi_{i,k}=2^{-i/2}\psi\left(x/2^{i}-k\right)$, respectively, where i and k are the scale and translation indices [16].
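The filter relations in this subsection can be illustrated with one analysis step of a two-channel filter bank. The sketch takes the high-pass coefficients as g_k = (−1)^{k+1} h_{k−1} for real h, which is an assumed reading of the relation above, and uses the two-tap Haar filter purely as a placeholder, because the WAF coefficients of Table 1 are not listed in this text. All function names are hypothetical.

```python
import numpy as np

def highpass_from_lowpass(h):
    """Build g_k = (-1)^(k+1) * h_(k-1) for real h (assumed relation).

    g is indexed k = 1..len(h); this one-sample index shift is only a delay
    and is ignored in the convolution below.
    """
    h = np.asarray(h, dtype=float)
    k = np.arange(1, h.size + 1)
    return (-1.0) ** (k + 1) * h

def analysis_level(signal, h, g):
    """One decomposition level: LP/HP filtering followed by downsampling by 2."""
    signal = np.asarray(signal, dtype=float)
    approx = np.convolve(signal, h)[1::2]
    detail = np.convolve(signal, g)[1::2]
    return approx, detail

# Placeholder low-pass filter (Haar), NOT one of the WAF filters of Table 1.
h_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
g_haar = highpass_from_lowpass(h_haar)
approx, detail = analysis_level(np.arange(1.0, 9.0), h_haar, g_haar)
```

The placeholder satisfies the normalization condition (the coefficients sum to √2), and the derived high-pass coefficients sum to zero, so constant regions produce no detail. Repeating `analysis_level` on `approx` gives the further decomposition levels.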
The procedure to synthesize the WAF consists of constructing a scale function φ(x) that generates a sequence of nested subspaces {V_{ j }} with the following properties: V_{ j } ⊂ L^{2}(ℝ), j ∈ ℤ; ⋃ _{ j }V_{ j } = L^{2}(ℝ); ⋂ _{ j }V_{ j } = {0}; f(x) ∈ V_{ j } ⇔ f(2x) ∈ V_{j+1}. Finally, there should exist a scale function φ(x) that (a) together with its shifts forms a Riesz basis and (b) has a symmetric and finite Fourier transform $\hat{\varphi}(\omega)$. Although the scale AF φ(x) and the WAF ψ(x) are not compactly supported, they decrease rapidly (due to infinite differentiability), so an effective support can be selected from the limit conditions ‖φ − φ_{ef}‖ · 100% ≤ 0.001% and ‖ψ − ψ_{ef}‖ · 100% ≤ 0.001%. Filter coefficients h_{ k } for the scale function φ(x) generated from different WAFs (up, fup_{ n }, up_{ n }, π_{ n }) can be found in [17]. In Table 1, we present only the coefficients h_{ k } for the scale functions φ(x) generated from the AFs up, fup_{4}, and π_{6}, which give better perceptual quality in the synthesized 3D images, as shown in the simulation results below. The effective supports of the scale function φ(x) and wavelet ψ(x) generated from these AFs are [−16, 16].
The wavelet technique used by the developed method is based on the DWT. In the proposed framework for DM estimation, the wavelets at each decomposition level are computed as follows [12]:
where W_{ s } is the wavelet for a chosen decomposition level s; D_{ h, s }, D_{ v, s }, D_{ d, s } are the horizontal, vertical, and diagonal detail components at level s; A_{ s } is the approximation component; and θ_{ s } is the phase, which is defined as follows:
Once W_{ s } is computed for each image of a stereo pair, or for neighboring frames of a video, the disparity map for each level of decomposition can be formed using the cross-correlation function in wavelet transform space:
$$\mathrm{Cor}_{(\mathrm{L\_R}),s}(x,y)=\sum_{(i,j)\in P}\frac{W_{\mathrm{L}}(i,j)\cdot W_{\mathrm{R}}(x+i,y+j)}{\sqrt{\sum_{(i,j)\in P}W_{\mathrm{L}}^{2}(i,j)\cdot\sum_{(i,j)\in P}W_{\mathrm{R}}^{2}(x+i,y+j)}},\qquad(11)$$
where W_{L} and W_{R} are the wavelet transforms of the left and right images at decomposition level s, and P is the sliding processing window. Finally, the disparity map for each level of decomposition is computed by applying the NNI technique. In this work, we propose using four levels of decomposition in the DWT.
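Equation (11) can be sketched directly for one window position. The code below is an illustration under assumptions: the arrays hold the real-valued wavelet moduli of the two views, the first index is treated as the row, the search is restricted to non-negative horizontal displacements, and the function names are hypothetical.

```python
import numpy as np

def window_correlation(w_left, w_right, top, left, size, dx, dy):
    """Normalized correlation in the style of Equation (11).

    The window P of w_left is anchored at (top, left); it is compared with
    the window of w_right displaced by (dx, dy) rows and columns.
    """
    p_l = w_left[top:top + size, left:left + size]
    p_r = w_right[top + dx:top + dx + size, left + dy:left + dy + size]
    if p_l.shape != (size, size) or p_r.shape != (size, size):
        raise ValueError("window falls outside the image")
    den = np.sqrt(np.sum(p_l ** 2) * np.sum(p_r ** 2))
    return float(np.sum(p_l * p_r) / den) if den > 0 else 0.0

def best_horizontal_shift(w_left, w_right, top, left, size, max_shift):
    """Displacement (in columns) that maximizes the correlation score."""
    scores = [window_correlation(w_left, w_right, top, left, size, 0, dy)
              for dy in range(max_shift + 1)]
    return int(np.argmax(scores))
```

Evaluating `best_horizontal_shift` at every window position of one decomposition level yields that level's disparity map; the normalization makes the score insensitive to a uniform gain difference between the two views.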
A block diagram of the proposed MWAF framework is presented in Figure 2.
2.2. Disparity map improvement and anaglyph synthesis
The classical methods used in anaglyph construction can produce ghosting effects and color loss. One way to reduce these artifacts in anaglyph synthesis is to use dynamic range compression of the disparity map [18]. Dynamic range compression retains the depth ordering information while reducing the ghosting effects in the non-overlapping areas of the anaglyph. Therefore, the dynamic range reduction of the disparity map values can be employed to enhance the map quality. Using the Pth-law transformation for dynamic range compression [18], the original disparity map D is changed as follows:
where D_{new} is the new disparity map pixel value, 0 < a < 1 is a normalizing constant, and 0 < P < 1.
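A minimal sketch of this compression step follows, assuming the Pth-law transformation takes the usual power-law form D_new = a · D^P with non-negative disparities. The function name is hypothetical, and the default values a = P = 0.5 are the ones reported for the experiments in Section 3.

```python
import numpy as np

def compress_dynamic_range(disparity, a=0.5, p=0.5):
    """Power-law compression D_new = a * D**p (assumed form of the Pth law)."""
    d = np.asarray(disparity, dtype=float)
    if np.any(d < 0):
        raise ValueError("disparities are assumed to be non-negative")
    if not (0 < a < 1 and 0 < p < 1):
        raise ValueError("a and p must lie strictly between 0 and 1")
    return a * np.power(d, p)

compressed = compress_dynamic_range([0.0, 4.0, 16.0])  # 0, 1 and 2 under these defaults
```

Because the power law is monotonic, larger disparities stay larger after compression, which is why the depth ordering is retained while the range shrinks.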
At the final stage, the anaglyph synthesis is performed using the improved disparity map. To generate an anaglyph, the neighboring frames should be resampled on a grid dictated by the disparity map. During numerous simulations, bilinear, sinc, and nearest-neighbor interpolations were implemented to find the anaglyph with the best 3D perception. The NNI showed better performance during the simulations and was sufficiently fast in comparison with the other investigated interpolations. Thus, the NNI was chosen to create the required anaglyph in this application. The NNI is performed for each pair of neighboring frames in the video sequence. The NNI [19] used in this framework replaces the value of each pixel with the value of its closest neighbor. To perform the NNI at the current decomposition level and to form the resulting disparity map, the intensity of each pixel is changed. The new intensity value is determined by comparing a pixel in the low-resolution disparity map from the i th decomposition level with the closest pixel value in the actual disparity map from the (i − 1)th decomposition level.
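The nearest-neighbor step between two decomposition levels can be illustrated as a plain enlargement of the coarser map. This sketch assumes that each level halves the resolution, so every coarse pixel is copied into a 2 × 2 block; it does not reproduce the comparison rule between levels described above, and the function name is hypothetical.

```python
import numpy as np

def nearest_neighbor_upsample(disparity, factor=2):
    """Enlarge a 2D map by copying each pixel into a factor x factor block."""
    d = np.asarray(disparity)
    return np.repeat(np.repeat(d, factor, axis=0), factor, axis=1)

low_res = np.array([[1, 2],
                    [3, 4]])
high_res = nearest_neighbor_upsample(low_res)  # 4 x 4 map of 2 x 2 constant blocks
```

No new values are created by this interpolation, which keeps it fast and avoids blurring disparity edges, at the cost of blocky boundaries.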
2.3. DSP implementation
Our study also involved running the promising 3D visualization algorithms in real-time mode on a DSP. The core of the EVM DM642™ is a digital media processor, and the board provides a large set of integrated features, such as a TMS320DM642™ DSP at 720 MHz (1.39 ns instruction cycle time, 5760 million instructions per second), 32 Mb of SDRAM, 4 Mb of linear flash memory, 2 video decoders, 1 video encoder, an FPGA™ implementation for display, a dual UART with RS-232 drivers, several input/output video formats, and others. The communication between Code Composer Studio (CCS) and the EVM is achieved with an external emulator via JTAG connectors [20]. Using MATLAB's Simulink™ module, a project was created in which the DSP model and its respective BIOS task were selected. Then, a function was created containing three subfunctions: video capture, 3D video reconstruction using WAF, and the output interface to a video display. Next, a CCS™ project was generated from Simulink™. During this step, the MATLAB™ module sends a signal to CCS and creates the project in C, so that the MATLAB™ program is first transformed into C code for CCS via Simulink™ before the video sequence is processed on the DSP. Once the CCS project has been created, the necessary changes are made to obtain the processing time values. The corresponding results for the designed and the reference frameworks are presented in the next section. A serial connection of three EVM DM642 boards is used in this application, where the first and second DSPs compute the disparity maps using the MWAF procedure and the third DSP generates the anaglyph. The developed algorithm in Simulink™ is shown in Figure 3.
3. Simulation results
In the simulation experiments, various synthetic images are used to obtain the quantitative measurements. The synthetic images were obtained from http://vision.middlebury.edu/stereo/data. Aloe, Venus, Lampshade1, Wood1, Bowling1, and Reindeer were the synthetic images used, all in PNG format (480 × 720 pixels). We also used the following test color video sequences in CIF format (250 frames, 288 × 352 pixels): Coastguard, Flowers, and Foreman. The test video sequences were obtained from http://trace.eas.asu.edu/yuv/index.html. To process all test color video sequences at the same size, we converted them to 480 × 720 pixels in AVI format. Additionally, the real-life video sequences Video Test1 (200 frames, 480 × 720 pixels) and Video Test2 (200 frames, 480 × 720 pixels) were recorded to apply the proposed algorithm in a common scenario. Video Test1 shows a truck moving through the scene, and Video Test2 shows three people walking toward the camera. Two objective quality criteria, the quantity of bad disparities (QBD) [12] and the structural similarity image measure (SSIM) [21], were chosen as the quantitative metrics to justify the selection of the best disparity map algorithm for 3D video sequence reconstruction. The QBD values have been calculated for different synthetic images as follows:
where N is the total number of pixels in the input image, and d_{E} and d_{G} are the estimated and the ground truth disparities, respectively.
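As an illustration of a bad-disparity measure built from these quantities, the sketch below counts the fraction of pixels whose estimated disparity deviates from the ground truth by more than a tolerance. The tolerance `delta` is an assumption borrowed from the common bad-pixel definition used in stereo benchmarks; it is not stated in the text above, so the exact QBD formula of the article may differ. The function name is hypothetical.

```python
import numpy as np

def quantity_of_bad_disparities(d_est, d_gt, delta=1.0):
    """Fraction of pixels with |d_E - d_G| > delta (delta is an assumed tolerance)."""
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    if d_est.shape != d_gt.shape:
        raise ValueError("estimated and ground-truth maps must have the same shape")
    bad = np.abs(d_est - d_gt) > delta
    return float(bad.mean())  # mean over all N pixels, i.e. (1/N) * count
```

A value of 0 means every estimated disparity lies within the tolerance, and lower values indicate a better disparity map.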
The SSIM metric values are defined as follows:
where the parameters l, c, and s are calculated according to following equations:
In Equations (15) to (17), X is the estimated image, Y is the ground truth image, μ and σ are the mean value and standard deviation for the X or Y images, and C_{1} = C_{2} = C_{3} = 1.
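A compact sketch of the metric follows, assuming the standard structural-similarity form in which SSIM is the product of a luminance term l, a contrast term c, and a structure term s, with the constants C_1 = C_2 = C_3 = 1 stated above. The statistics here are computed over the whole image; whether the article evaluates them globally or in local windows is not specified in this text, and the function name is hypothetical.

```python
import numpy as np

def ssim_global(x, y, c1=1.0, c2=1.0, c3=1.0):
    """SSIM = l * c * s from global means, standard deviations and covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # luminance term l
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)   # contrast term c
    struct = (cov_xy + c3) / (sd_x * sd_y + c3)                   # structure term s
    return float(lum * con * struct)
```

The value equals 1 when the estimated and ground-truth maps are identical and decreases as their means, contrasts, or structures diverge.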
Table 2 presents the values of QBD and SSIM for the proposed framework based on MWAFs and the other techniques applied to different synthetic images.
The simulation results presented in Table 2 indicate that the best overall performance in disparity map reconstruction is produced by the MWAF framework. The minimum value of QBD and the maximum value of SSIM are obtained when the MWAF π_{6} is used, followed by WAF π_{6}. At the final stage, when the anaglyphs were synthesized, the NCC was calculated in a sliding window of 5 × 5 pixels. The SSD algorithm was implemented in a window of 9 × 9 pixels. The L&K algorithm was performed according to [9]. For all tested algorithms, the dynamic range compression was applied with the parameters a = P = 0.5. Figure 4 shows the obtained disparity maps for all tested images and all implemented algorithms; the MWAF π_{6} implementation produces the best overall visual results.
Based on the objective quantity metrics and the subjective results presented in Figure 4, MWAF π_{ 6 } has been selected as the technique to estimate the disparity map for video sequence visualization.
The anaglyphs synthesized with the MWAF algorithm showed sufficiently good 3D visual perception with reduced ghosting and color loss. Spectacles with blue and red filters are required to observe Figures 5 and 6.
Processing time values were computed during the DSP implementation. Table 3 shows the processing times for the video sequences using Matlab and the serial DSP implementation. Here, the tested video sequences were Flowers, Coastguard, Video Test1, and Video Test2 (each at 480 × 720 pixels and at 240 × 360 pixels in RGB format).
The processing time values were measured from the moment the sequence was acquired by the DSP until the anaglyph was displayed on a regular monitor.
The processing times in Table 3 suggest that the DSP algorithm can process up to 20 frames per second for frames of 240 × 360 pixels in RGB format and up to 12 frames per second for frames of 480 × 720 pixels in RGB format. Processing time values for the L&K and SSD algorithms implemented in Matlab were 22.59 and 16.26 s, respectively, because they require extremely computationally intensive operations.
4. Conclusion
This study analyzed the performance of various 3D reconstruction methods. Among the analyzed methods, the proposed framework based on MWAFs is the most effective at reconstructing the disparity map for 3D video sequences with different types of movement. The framework produces the best depth and spatial perception in the synthesized 3D video sequences compared with the other analyzed algorithms, as confirmed by numerous simulations for different initial 2D color video sequences. The MWAF algorithm can be applied to any type of color video sequence without additional information. The performance of the DSP implementation shows that the proposed algorithm can visualize the final 3D color video sequence practically in real-time mode. In future work, we intend to optimize the proposed algorithm in order to increase the processing speed up to the film frame rate.
Abbreviations
CCS: code composer studio
3D: three-dimensional
LP: low pass
MW: multiple decomposition levels
NCC: normalized cross-correlation
QBD: quantity of bad disparities
RBSM: region-based stereo matching
SSD: sum of squared difference
WAF: wavelet atomic functions
WTM: wavelet transform modulus.
References
1. Smolic A, Kauff P, Knorr S, Hornung A, Kunter M, Muller M, Lang M: Three-dimensional video postproduction and processing. Proc IEEE 2011, 99(4):607-625.
2. Ideses I, Yaroslavsky L: New methods to produce high quality color anaglyphs for 3D visualization. In ICIAR, Lecture Notes in Computer Science. Volume 3212. Springer Verlag, Germany; 2004:273-280. 10.1007/978-3-540-30126-4_34
3. Sanders W, McAllister D: Producing anaglyphs from synthetic images. Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems X 2003, 5006: 348-358.
4. Dubois E: A projection method to generate anaglyph stereo images. In Proceedings of IEEE International Conference on Acoustic Speech Signal Processing. Volume 3. Salt Lake City, USA; 2001:1661-1664.
5. Woods A, Rourke T: Ghosting in anaglyphic stereoscopic images. Stereoscopic Displays and Applications XV, Proceedings of SPIE-IS&T Electronic Imaging, SPIE 2004, 5291: 354-365.
6. Ideses I, Yaroslavsky L, Fishbain B: 3D from compressed video, in Stereoscopic displays and virtual reality systems. Proc SPIE 2007, 6490: 64901C.
7. Caviedes J, Villegas J: Real time 2D to 3D conversion: Technical and visual quality requirements. International Conference on Consumer Electronics, ICCE-IEEE 2011, 897-898.
8. Fleet DJ: Measurement of Image Velocity. Kluwer Academic Publishers, Massachusetts; 1992.
9. Beauchemin SS, Barron JL: The computation of optical flow. ACM Comput Surv 1995, 27(3):433-465. 10.1145/212094.212141
10. Bovik A: Handbook of Image and Video Processing. Academic Press, USA; 2000.
11. Alagoz BB: Obtaining depth maps from color images by region based stereo matching algorithms. OncuBilim Algor Syst Labs 2008, 08(4):1-12.
12. Bhatti A, Nahavandi S: Stereo Vision. Volume Chap 6. I-Tech, Vienna; 2008:27-48.
13. Gulyaev YuV, Kravchenko VF, Pustovoit VI: A new class of WA-systems of Kravchenko-Rvachev functions. Doklady Mathematics 2007, 75(2):325-332.
14. Juarez C, Ponomaryov V, Sanchez J, Kravchenko V: Wavelets based on atomic function used in detection and classification of masses in mammography. Lecture Notes in Artificial Intelligence 2008, 5317: 295-304.
15. Kravchenko V, Meana H, Ponomaryov V: Adaptive Digital Processing of Multidimensional Signals with Applications. FizMatLit Edit, Moscow; 2009. [http://www.posgrados.esimecu.ipn.mx/]
16. Meyer Y: Ondelettes. Hermann, Paris; 1991.
17. Kravchenko VF, Yurin AV: New class of wavelet functions in digital processing of signals and images. J Success Mod Radio Electron, Moscow, Edit Radioteknika 2008, 5: 3123.
18. Ideses I, Yaroslavsky L: Three methods that improve the visual quality of colour anaglyphs. J Opt A Pure Appl Opt 2005, 7: 755-762. 10.1088/1464-4258/7/12/008
19. Goshtasby A: 2D and 3D Image Registration. Wiley Publishers, USA; 2005.
20. Texas Instruments: TMS320DM642 Evaluation Module with TVP Video Encoders. Technical Reference 5073450001 Rev B 2004.
21. Malpica WS, Bovik AC: Range image quality assessment by structural similarity. ICASSP 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE 2009, 1149-1152.
Acknowledgements
The authors thank the National Polytechnic Institute of Mexico and CONACyT (Project 81599) for their support of this work.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Ramos-Diaz, E., Kravchenko, V. & Ponomaryov, V. Efficient 2D to 3D video conversion implemented on DSP. EURASIP J. Adv. Signal Process. 2011, 106 (2011). https://doi.org/10.1186/1687-6180-2011-106
DOI: https://doi.org/10.1186/1687-6180-2011-106