A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding

Abstract

This paper introduces a new broadband audio and speech coding technique that combines a pulse excitation coder with a standardized parametric coder, namely the MPEG-4 high-quality parametric coder. After presenting a series of enhancements that make regular pulse excitation (RPE) suitable for modeling broadband signals, the paper shows how pulse and parametric coding complement each other and how they can be merged into a layered, bit stream scalable coder able to operate at different points in the quality versus bit rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of bit stream scalability does not come at the price of reduced performance: the coder is competitive with standardized coders (MP3, AAC, SSC).
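For readers unfamiliar with regular pulse excitation, the basic idea can be sketched as follows. This is a minimal illustration of classic RPE grid selection only, assuming a prediction residual is already available. It does not include the enhancements proposed in the paper, perceptual weighting, or pulse quantization, and the function name `rpe_select_grid` and the sample values are hypothetical.

```python
def rpe_select_grid(residual, decimation):
    """Pick the regular pulse grid (offset) that captures the most residual energy.

    Illustrative only: classic RPE keeps every `decimation`-th sample of the
    residual, starting at the offset whose samples carry the most energy.
    """
    if decimation < 1:
        raise ValueError("decimation must be >= 1")

    best_offset, best_energy = 0, -1.0
    for offset in range(decimation):
        # Energy of the candidate grid: samples offset, offset + D, offset + 2D, ...
        energy = sum(x * x for x in residual[offset::decimation])
        if energy > best_energy:
            best_offset, best_energy = offset, energy

    # Build the excitation: zeros everywhere except on the chosen grid.
    excitation = [0.0] * len(residual)
    for i in range(best_offset, len(residual), decimation):
        excitation[i] = residual[i]
    return best_offset, excitation


# Hypothetical residual frame, decimated by a factor of 3.
residual = [0.1, -0.9, 0.2, 0.05, 0.8, -0.1, 0.0, -0.7, 0.3]
offset, excitation = rpe_select_grid(residual, 3)
print(offset, excitation)
```

Only the grid offset and the retained pulse values need to be transmitted, which is what makes the excitation cheap to code; a practical coder would also quantize the pulses.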

Author information

Corresponding author

Correspondence to Felip Riera-Palou.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Riera-Palou, F., den Brinker, A.C. A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding. EURASIP J. Adv. Signal Process. 2007, 074064 (2007). https://doi.org/10.1155/2007/74064

Keywords

  • Information Technology
  • Quantum Information
  • Major Result
  • Pulse Excitation
  • Standardize Coder