
3D-Audio Matting, Postediting, and Rerendering from Field Recordings

Abstract

We present a novel approach to real-time spatial rendering of realistic auditory environments and sound sources recorded live, in the field. Using a set of standard microphones distributed throughout a real-world environment, we record the sound field simultaneously from several locations. After spatial calibration, we segment from this set of recordings a number of auditory components, together with their locations. We compare existing time-delay-of-arrival estimation techniques between pairs of widely spaced microphones and introduce a novel, efficient hierarchical localization algorithm. Using the high-level representation thus obtained, we can edit and rerender the acquired auditory scene over a variety of listening setups. In particular, we can move or alter the different sound sources and arbitrarily choose the listening position. We can also composite elements of different scenes together in a spatially consistent way. Our approach provides efficient rendering of complex soundscapes that would be challenging to model using discrete point sources and traditional virtual-acoustics techniques. We demonstrate a wide range of possible applications for games, virtual and augmented reality, and audiovisual postproduction.


Author information


Corresponding author

Correspondence to Emmanuel Gallo.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Gallo, E., Tsingos, N. & Lemaitre, G. 3D-Audio Matting, Postediting, and Rerendering from Field Recordings. EURASIP J. Adv. Signal Process. 2007, 047970 (2007). https://doi.org/10.1155/2007/47970


Keywords

  • Augmented Reality
  • Sound Source
  • Localization Algorithm
  • Sound Field
  • Field Recording