  • Research Article
  • Open Access

3D-Audio Matting, Postediting, and Rerendering from Field Recordings

EURASIP Journal on Advances in Signal Processing 2007, 2007:047970

https://doi.org/10.1155/2007/47970

  • Received: 1 May 2006
  • Accepted: 24 November 2006
  • Published:

Abstract

We present a novel approach to real-time spatial rendering of realistic auditory environments and sound sources recorded live in the field. Using a set of standard microphones distributed throughout a real-world environment, we record the sound field simultaneously from several locations. After spatial calibration, we segment from this set of recordings a number of auditory components, together with their locations. We compare existing time-delay-of-arrival estimation techniques between pairs of widely spaced microphones and introduce a novel, efficient hierarchical localization algorithm. Using the high-level representation thus obtained, we can edit and rerender the acquired auditory scene over a variety of listening setups. In particular, we can move or alter the different sound sources and arbitrarily choose the listening position. We can also composite elements of different scenes together in a spatially consistent way. Our approach provides efficient rendering of complex soundscapes that would be challenging to model using discrete point sources and traditional virtual acoustics techniques. We demonstrate a wide range of possible applications for games, virtual and augmented reality, and audiovisual post-production.
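As an illustration of what time-delay-of-arrival estimation between a pair of microphones involves, the sketch below uses generalized cross-correlation with phase transform (GCC-PHAT), a widely used technique of this kind. The abstract does not say which estimators the article compares or adopts, so this should be read as an assumption-laden example, not as the authors' method. The function name `gcc_phat_delay` and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat_delay(sig_a, sig_b, sample_rate, max_delay_s=None):
    """Estimate by how many seconds sig_b lags sig_a (GCC-PHAT sketch).

    Hypothetical helper for illustration; a positive result means the
    sound reached microphone B later than microphone A.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig_a) + len(sig_b)
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)

    # Cross-power spectrum, whitened so that only phase remains (PHAT).
    cross = spec_b * np.conj(spec_a)
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(sample_rate * max_delay_s), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))
    lag_samples = int(np.argmax(corr)) - max_shift
    return lag_samples / sample_rate
```

For widely spaced microphones, `max_delay_s` can be bounded by the microphone distance divided by the speed of sound (roughly 343 m/s in air at room temperature), which prunes spurious correlation peaks outside the physically possible range.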

Keywords

  • Augmented Reality
  • Sound Source
  • Localization Algorithm
  • Sound Field
  • Field Recording

Authors’ Affiliations

(1) Rendu & Environnements Virtuel Sonorisés, Institut National de Recherche en Informatique et en Automatique, Sophia-Antipolis, Cedex, 06902, France
(2) Centre Scientifique et Technique du Bâtiment, Sophia-Antipolis, Cedex, 06904, France

Copyright

© Emmanuel Gallo et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
