
3D-Audio Matting, Postediting, and Rerendering from Field Recordings

Abstract

We present a novel approach to real-time spatial rendering of realistic auditory environments and sound sources recorded live, in the field. Using a set of standard microphones distributed throughout a real-world environment, we record the sound field simultaneously from several locations. After spatial calibration, we segment from this set of recordings a number of auditory components, together with their locations. We compare existing time-delay-of-arrival estimation techniques between pairs of widely spaced microphones and introduce a novel, efficient hierarchical localization algorithm. Using the high-level representation thus obtained, we can edit and rerender the acquired auditory scene over a variety of listening setups. In particular, we can move or alter the different sound sources and arbitrarily choose the listening position. We can also composite elements of different scenes together in a spatially consistent way. Our approach provides efficient rendering of complex soundscapes that would be challenging to model using discrete point sources and traditional virtual-acoustics techniques. We demonstrate a wide range of possible applications for games, virtual and augmented reality, and audiovisual postproduction.


Author information


Corresponding author

Correspondence to Emmanuel Gallo.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Gallo, E., Tsingos, N. & Lemaitre, G. 3D-Audio Matting, Postediting, and Rerendering from Field Recordings. EURASIP J. Adv. Signal Process. 2007, 047970 (2007). https://doi.org/10.1155/2007/47970


Keywords

  • Augmented Reality
  • Sound Source
  • Localization Algorithm
  • Sound Field
  • Field Recording