
Efficient 2D to 3D video conversion implemented on DSP


An efficient algorithm to generate three-dimensional (3D) video sequences is presented in this work. The algorithm is based on disparity map computation and anaglyph synthesis. The disparity map is first estimated by applying the wavelet atomic functions technique at several decomposition levels to a 2D video sequence. An anaglyph synthesis then uses the disparity map to reconstruct a 3D video sequence. Compared with other disparity map computation techniques, such as optical flow, stereo matching, and wavelets, the proposed approach performs better according to commonly used metrics (structural similarity and quantity of bad pixels). Hardware implementations of the proposed algorithm and of the other techniques are also presented to justify the possibility of real-time visualization of 3D color video sequences.

1. Introduction

Conversion of available 2D content for release in three dimensions (3D) is a hot topic for content providers and for the success of 3D video in general. It relies entirely on virtual view synthesis of a second view given the original 2D video [1]. 3DTV channels, mobile phones, laptops, personal digital assistants, and similar devices are examples of hardware on which 3D video content can be displayed.

There are several techniques to visualize 3D objects, such as polarized lenses, active vision, and anaglyphs. Some of these techniques have drawbacks, mainly special hardware requirements: active vision needs a special display used with synchronized lenses, and polarized lenses need a polarized display. The anaglyph technique, in contrast, only requires a pair of spectacles constructed with red and blue filters, where the red filter is placed over the left eye, producing a visual effect of 3D perception. Anaglyph synthesis is a simple process in which the red channel of the second image (frame) replaces the red channel in the first image (frame) [2]. Several methods to compute anaglyphs have been described in the literature. One of them is the original Photoshop algorithm [3], where the red channel of the left-eye image becomes the red channel of the anaglyph, and the blue and green channels of the right-eye image become the blue and green channels of the anaglyph. Dubois [4] suggested the least squares projection in each color component (R, G, B) from the six-dimensional space $\mathbb{R}^6$ to the 3D subspace. Two principal drawbacks of these algorithms are the presence of ghosting and the loss of color [5].
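The channel replacement described above can be sketched in a few lines. This is only an illustration, assuming two already aligned views stored as H × W × 3 RGB NumPy arrays; the function name `simple_anaglyph` is hypothetical and does not come from the cited works.

```python
import numpy as np

def simple_anaglyph(left_rgb, right_rgb):
    """Red channel from the left view; green and blue from the right view."""
    if left_rgb.shape != right_rgb.shape or left_rgb.shape[-1] != 3:
        raise ValueError("both views must be H x W x 3 arrays of equal shape")
    anaglyph = right_rgb.copy()          # keeps green and blue of the right view
    anaglyph[..., 0] = left_rgb[..., 0]  # red channel taken from the left view
    return anaglyph

# Tiny synthetic example (values chosen only for illustration)
left = np.zeros((2, 2, 3), dtype=np.uint8)
left[..., 0] = 200
right = np.zeros((2, 2, 3), dtype=np.uint8)
right[..., 1] = 50
right[..., 2] = 80
result = simple_anaglyph(left, right)
```

The sketch ignores the ghosting and color-loss issues mentioned above; it only shows where each channel of the anaglyph comes from.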

In the 2D to 3D conversion, depth cues are needed to generate a novel stereoscopic view for each frame of an input sequence. The simplest way to obtain 3D information is the use of motion vectors directly from compressed data. However, this technique can only recover the relative depth accurately, if the motion of all scene objects is directly proportional to their distance from the camera [1].

In [6], the motion vector maps, which are obtained from the MPEG4 compression standard, are used to construct the depth map of a stereo pair. The main idea here is to avoid the disparity map stage because it requires extremely computationally intensive operations and cannot suitably estimate the high-resolution depth maps in the video sequence applications. In paper [7], a real-time algorithm for use in 3DTV sets is developed, where the general method to perform the 2D to 3D conversion consists of the following stages: geometric analysis, static cues extraction, motion analysis, depth assignment, depth control, and depth image based rendering. One drawback of this algorithm is that it requires extremely computationally intensive operations.

There are several algorithms to estimate the disparity map (DM), such as the optical flow differential methods designed by Lucas & Kanade (L&K) and Horn and Schunck [8, 9], where some restrictions in the motion map model are employed. Other techniques are based on disparity estimation, where the best match between pixels in a stereo pair or in neighboring frames is found by employing a similarity measure, for example, the normalized cross-correlation (NCC) function or the sum of squared differences (SSD) between the matched images or frames [10]. A recent approach called region-based stereo matching (RBSM) is presented in [11], where the block matching technique is computed with various window sizes. Another promising framework consists of stereo correspondence estimation based on wavelets and multi-wavelets [12], in which the wavelet transform modulus (WTM) is employed in the DM estimation. The WTM is calculated from the vertical and horizontal detail components, and the approximation component is employed to normalize the estimation. Finally, the cross-correlation in wavelet transform space is applied as the similarity measure.

In this article, we propose an efficient algorithm to generate a 3D video sequence from a 2D video sequence acquired by a moving camera. The framework uses wavelet atomic functions (WAFs) for the disparity map estimation. The anaglyph synthesis is then used to visualize the 3D color video sequence on a standard display. Additionally, we demonstrate a DSP implementation of the proposed algorithm for different sizes of the 2D video sequences.

The main difference from other algorithms presented in the literature is that the proposed framework provides sufficiently good depth and spatial perception in the 3D video sequences without requiring intensive computational operations, and it can generate 3D videos practically in real-time mode.

In the present approach, we employ WAFs because they have already demonstrated successful performance in medical image recognition, speech recognition, image processing, and other technologies [13-15].

The article is organized as follows: Section 2 presents the proposed framework, Section 3 contains the simulation results, and Section 4 concludes the article.

2. The proposed algorithm

The proposed framework consists of the following stages: 2D color video sequence decomposition, RGB component separation, DM computation using wavelets at multiple decomposition levels (M-W), in particular wavelet atomic functions (M-WAF), disparity map improvement via dynamic range compression, anaglyph synthesis employing the nearest neighbor interpolation (NNI), and 3D video sequence reconstruction and visualization. Below, we explain in detail the principal 3D reconstruction stages (Figure 1).

Figure 1. The proposed framework.

2.1. Disparity map computation

Stereo correspondence estimation based on the M-W (M-WAF) technique is proposed to obtain the disparity map. The stereo correspondence procedure consists of two stages: the WAF implementation and the WTM computation.

Here, we present a novel type of wavelets known as WAFs, first introducing the basic atomic functions (up, fup_n, π_n) used as the mother functions in the wavelet construction. The definition of AFs is connected with a mathematical problem: the isolation of a function whose derivatives have maxima and minima similar to those of the initial function. Solving this problem requires an infinitely differentiable solution to differential equations with a shifted argument [15]. It has been shown that AFs fall within an intermediate category between splines and classical polynomials: like B-splines, AFs are compactly supported, and like polynomials, they are universal in terms of their approximation properties.

The simplest and most important AF is generated by infinite convolutions of rectangular impulses, which are easy to analyze via the Fourier transform. Based on the N-fold convolution of (N + 1) identical rectangular impulses, the compactly supported spline θ_N(x) can be defined as follows:

$$\theta_N(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{jux} \left( \frac{\sin(u/2)}{u/2} \right)^{N+1} du. \tag{1}$$

The function up(x) is represented by the Fourier transform for infinite convolutions of rectangular impulses with variable duration $2^{-k}$, as in Equation 2:

$$\mathrm{up}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{jux} \prod_{k=1}^{\infty} \frac{\sin(u\,2^{-k})}{u\,2^{-k}}\, du. \tag{2}$$
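As a numerical illustration of the definition of up(x), the infinite product can be truncated and the inverse Fourier integral approximated with the trapezoidal rule. This is a sketch only: the truncation parameters (`n_factors`, `u_max`, `du`) and the function names are assumptions made for the example, not values from the paper.

```python
import numpy as np

def up_spectrum(u, n_factors=40):
    """Truncated product prod_{k=1}^{K} sin(u 2^-k) / (u 2^-k)."""
    u = np.asarray(u, dtype=float)
    spectrum = np.ones_like(u)
    for k in range(1, n_factors + 1):
        # np.sinc(t) = sin(pi t) / (pi t), so the argument is rescaled by 1/pi
        spectrum *= np.sinc(u * 2.0 ** (-k) / np.pi)
    return spectrum

def up(x, u_max=200.0, du=0.01):
    """Approximate up(x) = (1/pi) * int_0^inf cos(u x) F(u) du (F is even)."""
    u = np.arange(0.0, u_max + du, du)
    integrand = np.cos(u * x) * up_spectrum(u)
    # Trapezoidal rule written out explicitly
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * du
    return integral / np.pi

values = [up(x) for x in (0.0, 0.5, 1.0, 1.5)]
```

Because the spectrum is even, the complex exponential reduces to a cosine, which halves the integration range. The sample points are chosen so that the result can be compared with known properties of up(x): support on [-1, 1], up(0) = 1, and integer shifts summing to one.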

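The AF fup_N(x) is defined by the convolution of the spline θ_{N-1}(x) and the AF up(x) in the interval [-(N+2)/2, (N+2)/2]. Thus, fup_N(x) can be written in the following form:

$$\mathrm{fup}_N(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{jux} \left( \frac{\sin(u/2)}{u/2} \right)^{N} \prod_{k=1}^{\infty} \frac{\sin(u\,2^{-k})}{u\,2^{-k}}\, du, \qquad \mathrm{fup}_0(x) \equiv \mathrm{up}(x). \tag{3}$$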

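As a generalization of the AF up(x) presented above, the AF up_m(x) is defined as follows:

$$\mathrm{up}_m(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{jxu} \prod_{k=1}^{\infty} \frac{\sin^2\!\left( \dfrac{m u}{(2m)^k} \right)}{\dfrac{m u}{(2m)^k}\; m \sin\!\left( \dfrac{u}{(2m)^k} \right)}\, du, \qquad m = 1, 2, 3, \ldots, \quad \mathrm{up}_1(x) = \mathrm{up}(x). \tag{4}$$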

The function π_m(x) can be represented by the inverse Fourier transform $\pi_m(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{jxt} F_m(t)\, dt$, using the following representation for the function F_m(t):

$$F_m(t) = \prod_{k=1}^{m} \frac{\sin\big((2m-1)t\big) + \sum_{\nu=2}^{M} (-1)^{\nu} \sin\big((2m-2\nu+1)t\big)}{(3m-2)\,t}. \tag{5}$$

The detailed definitions and properties of these functions can be found in [15].

The wavelet decomposition procedures employ several decomposition levels to enhance the quality of the depth maps. The discrete wavelet transform (DWT) and the inverse DWT are usually implemented using filter bank techniques for a scheme with only two filters: low pass (LP) $H(z)$ (decomposition) and $\tilde{H}(z)$ (reconstruction), and high pass (HP) $G(z)$ (decomposition) and $\tilde{G}(z)$ (reconstruction), where $G(z) = zH(-z)$ and $\tilde{G}(z) = z^{-1}H(-z)$ [16]. The scale function φ(x) is associated with the filter H(z) through the scaling equation $\varphi(x) = \frac{2}{H(1)} \sum_{k \in \mathbb{Z}} h_k\, \varphi(2x - k)$ and can be expressed by its Fourier transform $\hat{\varphi}(\omega) = \prod_{k=1}^{\infty} \frac{H(e^{j\omega/2^k})}{H(1)}$. The wavelet functions are computed as a linear combination of scale functions, $\psi(x) = \frac{2}{H(1)} \sum_{k} g_k\, \varphi(2x - k)$, where $g_k = (-1)^{k+1} h^{*}_{-k-1}$ and $\{h_k\}$ are the coefficients of the LP filter in its Fourier series:

$$H(\omega) = \sqrt{2}\, H_0(\omega) = \sum_{k} h_k e^{jk\omega} \quad \text{for } H_0(\omega): \quad h_k = \frac{\sqrt{2}}{2\pi} \int_{-\pi}^{\pi} H_0(\omega)\, e^{jk\omega}\, d\omega, \tag{6}$$

and the wavelet $\tilde{\psi}(x) = \frac{2}{\tilde{H}(1)} \sum_{k} \tilde{g}_k\, \tilde{\varphi}(2x - k)$. The HP filter is represented by a Fourier series expressed through the coefficients $\{h_k\}$:

$$G(\omega) = e^{j\omega} H^{*}(\omega + \pi) = \sum_{k} (-1)^{k+1} h^{*}_{-k-1}\, e^{-jk\omega}. \tag{7}$$

The coefficients $\{h_k\}$ should satisfy the normalization condition $\frac{1}{\sqrt{2}} \sum_{k} h_k = H_0(0) = 1$. Finally, the decomposition and reconstruction wavelets are employed in the forms $\tilde{\psi}_{i,k}(x) = 2^{-i/2}\, \tilde{\psi}\!\left( \frac{x}{2^i} - k \right)$ and $\psi_{i,k}(x) = 2^{-i/2}\, \psi\!\left( \frac{x}{2^i} - k \right)$, respectively, where i and k are the scale and translation indexes [16].
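The relation between the LP and HP coefficients, g_k = (-1)^(k+1) h*_(-k-1), can be checked on a small example. The sketch below uses the two-tap Haar filter as a stand-in, since the WAF coefficients of Table 1 are not reproduced here; the dictionary representation and the function name are assumptions made for illustration.

```python
import numpy as np

def highpass_from_lowpass(h):
    """h maps index k to h_k; returns g with g_k = (-1)^(k+1) * conj(h_{-k-1})."""
    g = {}
    for m, h_m in h.items():
        k = -m - 1                           # chosen so that -k-1 == m
        sign = -1.0 if (k + 1) % 2 else 1.0  # (-1)^(k+1), valid for negative k too
        g[k] = sign * np.conj(h_m)
    return g

# Haar low-pass filter, used here only as a stand-in for the WAF filters
h_haar = {0: 1.0 / np.sqrt(2.0), 1: 1.0 / np.sqrt(2.0)}
g_haar = highpass_from_lowpass(h_haar)

lp_normalization = sum(h_haar.values()) / np.sqrt(2.0)  # (1/sqrt(2)) * sum h_k
hp_dc_gain = sum(g_haar.values())                       # high-pass response at DC
```

For this stand-in filter the LP coefficients meet the normalization condition quoted above, and the derived HP coefficients sum to zero, as expected for a filter that rejects the constant component.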

To synthesize the WAF, a scale function φ(x) is constructed that generates a nested sequence of closed subspaces $V_j$ with the following properties: $V_j \subset L^2(\mathbb{R})$, $j \in \mathbb{Z}$; $\bigcup_j V_j = L^2(\mathbb{R})$; $\bigcap_j V_j = \{0\}$; and $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}$. In addition, the scale function φ(x) must (a) form a Riesz basis together with its shifts and (b) have a symmetric and compactly supported Fourier transform $\hat{\varphi}(\omega)$. Because the scale AF φ(x) and the WAF ψ(x) are not compactly supported but decrease rapidly (due to their infinite differentiability), it is possible to select an effective support from the limit conditions $\|\varphi - \varphi_{ef}\| \cdot 100\% \le 0.001\%$ and $\|\psi - \psi_{ef}\| \cdot 100\% \le 0.001\%$. The filter coefficients $h_k$ for the scale function φ(x) generated from different WAFs (up, fup_n, up_n, π_n) can be found in [17]. In Table 1, we present only the coefficients $h_k$ for the scale function φ(x) generated from the AFs up, fup_4, and π_6, which give the better perception quality in the synthesized 3D images, as the simulation results show. The effective support for the scale function φ(x) and the wavelet ψ(x) generated from these AFs is [-16, 16].

Table 1. Filter coefficients {h_k} for the scale function φ(x) generated from different WAFs based on up, fup_4, and π_6.

The wavelet technique used by the developed method is based on the DWT. In the proposed framework for DM estimation, the wavelets at each decomposition level are computed as follows [12]:

$$W_s = |W_s| \angle \Theta_s, \tag{8}$$

$$|W_s| = \frac{\sqrt{D_{h,s}^2 + D_{v,s}^2 + D_{d,s}^2}}{A_s}, \tag{9}$$

where $W_s$ is the wavelet transform for a chosen decomposition level s; $D_{h,s}$, $D_{v,s}$, and $D_{d,s}$ are the horizontal, vertical, and diagonal detail components at level s; $A_s$ is the approximation component; and $\Theta_s$ is the phase, defined as follows:

$$\Theta_s = \begin{cases} \varepsilon_s & \text{if } D_{h,s} > 0, \\ \pi - \varepsilon_s & \text{if } D_{h,s} < 0, \end{cases} \qquad \varepsilon_s = \operatorname{arctg}\!\left( \frac{D_{h,s}}{D_{v,s}} \right). \tag{10}$$
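A rough sketch of the modulus and phase computation for a single decomposition level is given below. It substitutes a one-level Haar transform for the WAF filters, which is a simplification, and assumes a non-negative grayscale image with even dimensions; the sign convention of the detail bands, the `eps` guard, and the handling of the undefined case D_h = 0 are all assumptions of this example.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar transform (a stand-in for the WAF filter bank)."""
    x = np.asarray(image, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    p00, p01 = x[0::2, 0::2], x[0::2, 1::2]
    p10, p11 = x[1::2, 0::2], x[1::2, 1::2]
    a = (p00 + p01 + p10 + p11) / 2.0    # approximation
    d_h = (p00 + p01 - p10 - p11) / 2.0  # horizontal detail (row difference)
    d_v = (p00 - p01 + p10 - p11) / 2.0  # vertical detail (column difference)
    d_d = (p00 - p01 - p10 + p11) / 2.0  # diagonal detail
    return a, d_h, d_v, d_d

def wavelet_transform_modulus(image, eps=1e-12):
    """Modulus normalized by the approximation band, and the two-branch phase."""
    a, d_h, d_v, d_d = haar_dwt2(image)
    modulus = np.sqrt(d_h ** 2 + d_v ** 2 + d_d ** 2) / (a + eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        epsilon = np.arctan(d_h / d_v)         # +/- pi/2 when d_v == 0
    epsilon = np.nan_to_num(epsilon, nan=0.0)  # 0/0 case: phase set to 0 (assumption)
    phase = np.where(d_h < 0, np.pi - epsilon, epsilon)
    return modulus, phase

flat = np.full((4, 4), 2.0)                 # no detail anywhere
edge = np.array([[0.0, 0.0], [4.0, 4.0]])   # a single horizontal edge
flat_modulus, flat_phase = wavelet_transform_modulus(flat)
edge_modulus, edge_phase = wavelet_transform_modulus(edge)
```

A flat region gives zero modulus, while the edge block gives a modulus close to one, which is the kind of contrast the correlation stage relies on.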

Once $W_s$ is computed for each image of a stereo pair, or for neighboring frames of a video, the disparity map for each level of decomposition can be formed using the cross-correlation function in wavelet transform space:

$$\mathrm{Cor}_{(L\_R),s}(x, y) = \frac{\sum_{(i,j) \in P} W_L(i, j)\, W_R(x + i, y + j)}{\sqrt{\sum_{(i,j) \in P} W_L^2(i, j)\; \sum_{(i,j) \in P} W_R^2(x + i, y + j)}}, \tag{11}$$

where $W_L$ and $W_R$ are the wavelet transforms of the left and right images at decomposition level s, and P is the sliding processing window. Finally, the disparity map for each level of decomposition is computed by applying the NNI technique. In this work, we propose using four levels of decomposition in the DWT.
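The correlation-based matching can be illustrated with a brute-force search over horizontal shifts. This is a sketch under stated assumptions: a purely horizontal disparity, a square window of half-width `half`, a search range `max_disp`, and hypothetical function names; it is not the authors' DSP implementation.

```python
import numpy as np

def ncc(window_l, window_r, eps=1e-12):
    """Normalized cross-correlation of two equally sized windows."""
    num = np.sum(window_l * window_r)
    den = np.sqrt(np.sum(window_l ** 2) * np.sum(window_r ** 2))
    return num / (den + eps)

def disparity_by_ncc(w_left, w_right, max_disp=4, half=2):
    """For each pixel, keep the horizontal shift with the highest correlation."""
    rows, cols = w_left.shape
    disparity = np.zeros((rows, cols), dtype=int)
    for y in range(half, rows - half):
        for x in range(half, cols - half):
            ref = w_left[y - half:y + half + 1, x - half:x + half + 1]
            best_score, best_d = -np.inf, 0
            for d in range(max_disp + 1):
                if x - d - half < 0:   # candidate window would leave the image
                    break
                cand = w_right[y - half:y + half + 1,
                               x - d - half:x - d + half + 1]
                score = ncc(ref, cand)
                if score > best_score:
                    best_score, best_d = score, d
            disparity[y, x] = best_d
    return disparity

# Synthetic check: the left input is the right input shifted by 3 columns
rng = np.random.default_rng(0)
w_right = rng.random((12, 20))
w_left = np.roll(w_right, 3, axis=1)
disparity = disparity_by_ncc(w_left, w_right)
```

In the synthetic check, interior pixels far enough from the left border recover the shift of 3, because only there does the candidate window fit inside the image at the true offset.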

A block diagram of the proposed M-WAF framework is presented in Figure 2.

Figure 2. The proposed M-WAF algorithm with four levels of decomposition.

2.2. Disparity map improvement and anaglyph synthesis

The classical methods used in anaglyph construction can produce ghosting effects and color loss. One way to reduce these artifacts in anaglyph synthesis is to apply dynamic range compression to the disparity map [18]. Dynamic range compression retains the depth ordering information while reducing the ghosting effects in the non-overlapping areas of the anaglyph. Therefore, reducing the dynamic range of the disparity map values can be employed to enhance the map quality. Using the P-th law transformation for dynamic range compression [18], the original disparity map D is changed as follows:

$$D_{new} = a \cdot D^{P}, \tag{12}$$

where $D_{new}$ is the new disparity map pixel value, 0 < a < 1 is a normalizing constant, and 0 < P < 1.
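The compression step is a pointwise operation and can be written directly from the formula. The sketch below uses a = P = 0.5, the values reported later in the simulation section; the function name and the rejection of negative disparities are assumptions of this example.

```python
import numpy as np

def compress_dynamic_range(disparity, a=0.5, p=0.5):
    """Apply D_new = a * D**p elementwise to a non-negative disparity map."""
    d = np.asarray(disparity, dtype=float)
    if np.any(d < 0):
        raise ValueError("disparity values must be non-negative")
    if not (0.0 < a < 1.0 and 0.0 < p < 1.0):
        raise ValueError("a and p must both lie strictly between 0 and 1")
    return a * np.power(d, p)

original = np.array([0.0, 4.0, 16.0, 64.0])
compressed = compress_dynamic_range(original)
```

Because the mapping is monotonic for non-negative inputs, the depth ordering of the pixels is preserved while the range shrinks (here from 0..64 to 0..4).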

At the final stage, the anaglyph synthesis is performed using the improved disparity map. To generate an anaglyph, the neighboring frames are re-sampled on a grid dictated by the disparity map. In numerous simulations, bilinear, sinc, and nearest neighbor interpolations were implemented to find the anaglyph with the best 3D perception. The NNI performed best in these simulations and was sufficiently fast in comparison with the other investigated interpolations, so it was chosen to create the required anaglyph in this application. The NNI is performed for each pair of neighboring frames in the video sequence. In this framework, the NNI [19] changes the value of each pixel to the value of its closest neighbor. To perform the NNI at the current decomposition level and to form the resulting disparity map, the intensity of each pixel is changed: the new intensity value is determined by comparing a pixel in the low-resolution disparity map from the i-th decomposition level with the closest pixel value in the actual disparity map from the (i - 1)-th decomposition level.
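The basic nearest neighbor operation, bringing a low-resolution disparity map from one decomposition level to the size of the next finer level, can be sketched as follows. The function name and the integer index mapping are assumptions of this example; the comparison between levels described above is not reproduced.

```python
import numpy as np

def nearest_neighbor_resize(image, out_shape):
    """Resize a 2D array by copying, for each output pixel, its nearest source pixel."""
    src = np.asarray(image)
    out_rows, out_cols = out_shape
    # Integer mapping from output indices back to source indices
    row_idx = (np.arange(out_rows) * src.shape[0]) // out_rows
    col_idx = (np.arange(out_cols) * src.shape[1]) // out_cols
    return src[row_idx[:, None], col_idx[None, :]]

coarse = np.array([[1, 2],
                   [3, 4]])
fine = nearest_neighbor_resize(coarse, (4, 4))
```

No new disparity values are created by this interpolation, which is one reason it is cheap: each output pixel is a copy of an existing one.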

2.3. DSP implementation

Our study also involved running the promising 3D visualization algorithms in real-time mode on a DSP. The core of the EVM DM642™ is a digital media processor, and the card integrates a large set of features, such as a TMS320DM642™ DSP at 720 MHz (1.39 ns instruction cycle time, 5760 million instructions per second), 32 Mb of SDRAM, 4 Mb of linear flash memory, two video decoders, one video encoder, an FPGA™ implementation for display, a dual UART with RS-232 drivers, and several input-output video formats. The communication between the code composer studio (CCS) and the EVM is achieved with an external emulator via JTAG connectors [20]. Using MATLAB's Simulink™ module, a project was created in which the DSP model and its respective task BIOS were selected. Then, a function was created to contain three sub-functions: video capture, 3D video reconstruction using WAF, and the output interface to a video display. Next, a CCS™ project was generated from Simulink™: the MATLAB™ module sends a signal to the CCS and creates the project in C, so the MATLAB™ program is first transformed into C code for CCS via Simulink™ before the video sequence is processed on the DSP. Once the CCS project had been created, the necessary changes were made to obtain the processing time values. The corresponding results for the designed and the reference frameworks are presented in the next section. A serial connection of three EVM DM642 cards is used in this application, where the first and second DSPs compute the disparity maps using the M-WAF procedure, and the third DSP generates the anaglyph. The algorithm developed in Simulink™ is shown in Figure 3.

Figure 3. Developed algorithm in Simulink™.

3. Simulation results

In the simulation experiments, various synthetic images are used to obtain the quantitative measurements. The synthetic images used were Aloe, Venus, Lampshade1, Wood1, Bowling1, and Reindeer, all in PNG format (480 × 720 pixels). We also used the following test color video sequences in CIF format (250 frames, 288 × 352 pixels): Coastguard, Flowers, and Foreman. In order to use the test color video sequences at the same size, we reformatted them to 480 × 720 pixels in AVI format. Additionally, the real-life video sequences named Video Test1 (200 frames, 480 × 720 pixels) and Video Test2 (200 frames, 480 × 720 pixels) were recorded to apply the proposed algorithm in a common scenario. Video Test1 shows a truck moving through the scenery, and Video Test2 shows three people walking toward the camera. Two objective quality criteria, the quantity of bad disparities (QBD) [12] and the structural similarity index measure (SSIM) [21], were chosen as the quantitative metrics to justify the selection of the best disparity map algorithm for the 3D video sequence reconstruction. The QBD values have been calculated for different synthetic images as follows:

$$\mathrm{QBD} = \frac{1}{N} \sum_{x, y} \big( d_E(x, y) - d_G(x, y) \big)^2, \tag{13}$$

where N is the total number of pixels in the input image, and dE and dG are the estimated and the ground truth disparities, respectively.
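As a small worked example of the QBD definition, the metric reduces to the mean squared difference between the estimated and ground truth disparity maps. The function name and the sample values are hypothetical.

```python
import numpy as np

def qbd(d_est, d_gt):
    """Mean squared difference between estimated and ground truth disparities."""
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    if d_est.shape != d_gt.shape:
        raise ValueError("disparity maps must have the same shape")
    return float(np.mean((d_est - d_gt) ** 2))

estimated = np.array([[1.0, 2.0], [3.0, 4.0]])
ground_truth = np.array([[1.0, 2.0], [3.0, 6.0]])
score = qbd(estimated, ground_truth)  # one pixel off by 2, so (2**2) / 4
```

Lower values indicate a better estimate, and a perfect estimate gives zero.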

The SSIM metric values are defined as follows:

$$\mathrm{SSIM}(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y), \tag{14}$$

where the parameters l, c, and s are calculated according to the following equations:

$$l(x, y) = \frac{2 \mu_X(x, y)\, \mu_Y(x, y) + C_1}{\mu_X^2(x, y) + \mu_Y^2(x, y) + C_1}, \tag{15}$$

$$c(x, y) = \frac{2 \sigma_X(x, y)\, \sigma_Y(x, y) + C_2}{\sigma_X^2(x, y) + \sigma_Y^2(x, y) + C_2}, \tag{16}$$

$$s(x, y) = \frac{\sigma_{XY}(x, y) + C_3}{\sigma_X(x, y)\, \sigma_Y(x, y) + C_3}. \tag{17}$$

In Equations (15) to (17), X is the estimated image, Y is the ground truth image, μ and σ are the mean value and the standard deviation of the X or Y images, σ_XY is the covariance between X and Y, and C1 = C2 = C3 = 1.
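The three SSIM terms can be sketched for a single window covering the whole image. This is a simplification, since the metric is normally evaluated in local windows and averaged. The sketch uses the standard structure term with the product σ_X σ_Y in its denominator and the constants C1 = C2 = C3 = 1 given above; the function name is hypothetical.

```python
import numpy as np

def ssim_global(x, y, c1=1.0, c2=1.0, c3=1.0):
    """Luminance * contrast * structure computed over one global window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = np.mean((x - mu_x) * (y - mu_y))  # covariance of X and Y
    l = (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    c = (2.0 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    s = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return float(l * c * s)

image = np.arange(16, dtype=float).reshape(4, 4)
same = ssim_global(image, image)               # identical inputs
inverted = ssim_global(image, 255.0 - image)   # intensity-inverted copy
```

Identical inputs give a score of one, and the inverted copy scores lower because both the luminance and structure terms drop.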

Table 2 presents the values of QBD and SSIM for the proposed framework based on M-WAFs and the other techniques applied to different synthetic images.

Table 2. QBD and SSIM for the proposed and existing algorithms for different test images.

The simulation results presented in Table 2 indicate that the best overall performance of disparity map reconstruction is produced by the M-WAF framework. The minimum value of QBD and the maximum value of SSIM are obtained when the M-WAF π6 is used, followed by the WAF π6. At the final stage, when the anaglyphs were synthesized, the NCC was calculated in a sliding window of 5 × 5 pixels. The SSD algorithm was implemented in a window of 9 × 9 pixels. The L&K algorithm was performed according to [9]. For all tested algorithms, the dynamic range compression was applied with the parameters a = P = 0.5. Figure 4 shows the disparity maps obtained for all tested images and all implemented algorithms; the M-WAF π6 implementation produces the best overall visual results.

Figure 4. Disparity maps obtained using different algorithms for the following test images. (a) Aloe, (b) Wood1, and (c) Bowling1.

Based on the objective quantitative metrics and the subjective results presented in Figure 4, M-WAF π6 has been selected as the technique to estimate the disparity map for video sequence visualization.

The anaglyphs synthesized with the M-WAF algorithm showed sufficiently good 3D visual perception with reduced ghosting and color loss. Spectacles with blue and red filters are required to observe Figures 5 and 6.

Figure 5. Synthesized anaglyphs using M-WAF π6 for the following test images. (a) Venus, (b) Aloe, (c) Bowling1, (d) Lampshade, (e) Reindeer, and (f) Wood1.

Figure 6. Synthesized anaglyphs using M-WAF π6 for frames of the following video sequences. (a) Flowers, (b) Coastguard, (c) Video Test1, and (d) Video Test2.

Processing time values were computed during the DSP implementation, and Table 3 shows the processing times for the video sequences using Matlab and the serial DSP implementation. The tested video sequences were Flowers, Coastguard, Video Test1, and Video Test2 (each at 480 × 720 pixels and at 240 × 360 pixels, in RGB format).

Table 3. Processing times for different algorithms.

The processing time values were measured from the moment the sequence was acquired by the DSP until the anaglyph was displayed on a regular monitor.

The processing times in Table 3 indicate that the DSP algorithm can process up to 20 frames per second for frames of 240 × 360 pixels in RGB format and up to 12 frames per second for frames of 480 × 720 pixels in RGB format. The processing times for the L&K and SSD algorithms implemented in Matlab were 22.59 and 16.26 s, respectively, because they require extremely computationally intensive operations.

4. Conclusion

This study analyzed the performance of various 3D reconstruction methods. Among the analyzed methods, the proposed framework based on M-WAFs was the most effective in reconstructing the disparity map for 3D video sequences with different types of movement. The framework produces the best depth and spatial perception in the synthesized 3D video sequences compared with the other analyzed algorithms, as confirmed by numerous simulations on different initial 2D color video sequences. The M-WAF algorithm can be applied to any type of color video sequence without additional information. The performance of the DSP implementation shows that the proposed algorithm can visualize the final 3D color video sequence practically in real-time mode. In future work, we intend to optimize the proposed algorithm in order to increase the processing speed up to the film frame rate.



Abbreviations

CCS: code composer studio; LP: low pass; M-W: multiple decomposition levels; NCC: normalized cross-correlation; QBD: quantity of bad disparities; RBSM: region-based stereo matching; SSD: sum of squared difference; WAF: wavelet atomic functions; WTM: wavelet transform modulus.


References

  1. Smolic A, Kauff P, Hnorr S, Hournung A, Kunter M, Muller M, Lang M: Three dimensional video postproduction and processing. Proc IEEE 2011, 99(4):607-625.

  2. Ideses I, Yaroslavsky L: New methods to produce high quality color anaglyphs for 3D visualization. In ICIAR, Lecture Notes in Computer Science. Volume 3212. Springer Verlag, Germany; 2004:273-280. 10.1007/978-3-540-30126-4_34

  3. Sanders W, McAllister D: Producing anaglyphs from synthetic images. Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems X 2003, 5006:348-358.

  4. Dubois E: A projection method to generate anaglyph stereo images. In Proceedings of IEEE International Conference on Acoustic Speech Signal Processing. Volume 3. Salt Lake City, USA; 2001:1661-1664.

  5. Woods A, Rouke T: Ghosting in anaglyphic stereoscopic images. Stereoscopic Displays and Applications XV, Proceedings of SPIE-IS&T Electronic Imaging, SPIE 2004, 5291:354-365.

  6. Ideses I, Yaroslavsky L, Fishbain B: 3D from compressed video, in Stereoscopic displays and virtual reality systems. Proc SPIE 2007, 6490:64901C.

  7. Caviedes J, Villegas J: Real time 2D to 3D conversion: Technical and visual quality requirements. International Conference on Consumer Electronics, ICCE-IEEE 2011, 897-898.

  8. Fleet DJ: Measurement of Image Velocity. Kluwer Academic Publishers, Massachusetts; 1992.

  9. Beauchemin SS, Barron JL: The computation of optical flow. ACM Comput Surv 1995, 27(3):433-465. 10.1145/212094.212141

  10. Bovik A: Handbook of Image and Video Processing. Academic Press, USA; 2000.

  11. Alagoz BB: Obtaining depth maps from color images by region based stereo matching algorithms. OncuBilim Algor Syst Labs 2008, 08(4):1-12.

  12. Bhatti A, Nahavandi S: Stereo Vision. Chap 6. I-Tech, Vienna; 2008:27-48.

  13. Gulyaev YuV, Kravchenko VF, Pustovoit VI: A new class of WA-systems of Kravchenko-Rvachev functions. Doklady Mathematics 2007, 75(2):325-332.

  14. Juarez C, Ponomaryov V, Sanchez J, Kravchenko V: Wavelets based on atomic function used in detection and classification of masses in mammography. Lecture Notes in Artificial Intelligence 2008, 5317:295-304.

  15. Kravchenko V, Meana H, Ponomaryov V: Adaptive Digital Processing of Multidimensional Signals with Applications. FizMatLit Edit, Moscow; 2009.

  16. Meyer Y: Ondelettes. Hermann, Paris; 1991.

  17. Kravchenko VF, Yurin AV: New class of wavelet functions in digital processing of signals and images. J Success Mod Radio Electron, Moscow, Edit Radioteknika 2008, 5:3-123.

  18. Ideses I, Yaroslavsky L: Three methods that improve the visual quality of colour anaglyphs. J Opt A Pure Appl Opt 2005, 7:755-762. 10.1088/1464-4258/7/12/008

  19. Goshtasby A: 2D and 3D Image Registration. Wiley Publishers, USA; 2005.

  20. Texas Instruments: TMS320DM642 Evaluation Module with TVP Video Encoders. Technical Reference 507345-0001 Rev B 2004.

  21. Malpica WS, Bovik AC: Range image quality assessment by structural similarity. ICASSP 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE 2009, 1149-1152.

Author information

Corresponding author

Correspondence to Eduardo Ramos-Diaz.

Additional information

Acknowledgements

The authors thank the National Polytechnic Institute of Mexico and CONACY (Project 81599) for their support of this work.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Ramos-Diaz, E., Kravchenko, V. & Ponomaryov, V. Efficient 2D to 3D video conversion implemented on DSP. EURASIP J. Adv. Signal Process. 2011, 106 (2011).
