- Research Article
- Open Access
Bandwidth Extension of Telephone Speech Aided by Data Embedding
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 064921 (2006)
A system for bandwidth extension of telephone speech, aided by data embedding, is presented. The proposed system uses the transmitted analog narrowband speech signal as a carrier of the side information needed to carry out the bandwidth extension. The upper band of the wideband speech is reconstructed at the receiving end from two components: a synthetic wideband excitation signal, generated from the narrowband telephone speech, and a wideband spectral envelope, parametrically represented and transmitted as embedded data in the telephone speech. We propose a novel data embedding scheme in which the scalar Costa scheme is combined with an auditory masking model, allowing high-rate transparent embedding while maintaining a low bit error rate. The signal is transformed to the frequency domain via the discrete Hartley transform (DHT) and partitioned into subbands. Data is embedded in an adaptively chosen subset of subbands by modifying the DHT coefficients. In our simulations, high-quality wideband speech was obtained from speech transmitted over a telephone line (characterized by spectral magnitude distortion, dispersion, and noise), in which side information data is transparently embedded at the rate of 600 information bits/second and with a bit error rate of approximately. In a listening test, the reconstructed wideband speech was preferred (at different degrees) over conventional telephone speech in of the test utterances.
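The embedding step described in the abstract can be illustrated with a minimal sketch: a frame is transformed with the DHT, one bit is embedded per selected coefficient using the scalar Costa scheme, and the bits are read back after the inverse transform. This is an illustration under simplifying assumptions, not the authors' implementation: the coefficient subset is fixed rather than chosen adaptively, the auditory masking model and the telephone channel are omitted, no dither key is used, and the function names and the values of `delta` and `alpha` are hypothetical.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of a real vector, computed via the FFT."""
    f = np.fft.fft(x)
    return f.real - f.imag  # sum of x[n] * (cos + sin)

def idht(h):
    """Inverse DHT: the DHT is its own inverse up to a factor 1/N."""
    return dht(h) / len(h)

def scs_embed(x, bits, delta, alpha):
    """Scalar Costa scheme, binary alphabet, no dither (illustrative)."""
    shifted = x - delta * bits / 2.0
    # Quantization error of the shifted host value, step size delta.
    q = delta * np.round(shifted / delta) - shifted
    return x + alpha * q

def scs_decode(r, delta):
    """Decide 0 if the received value lies near the delta grid, else 1."""
    y = delta * np.round(r / delta) - r
    return (np.abs(y) >= delta / 4.0).astype(int)

rng = np.random.default_rng(0)
frame = rng.standard_normal(64)      # stand-in for one speech frame
bits = rng.integers(0, 2, size=8)    # payload bits for this frame
idx = np.arange(8, 16)               # assumed fixed subset of coefficients
delta, alpha = 0.5, 0.7              # illustrative values only

coeffs = dht(frame)
marked = coeffs.copy()
marked[idx] = scs_embed(coeffs[idx], bits, delta, alpha)
marked_frame = idht(marked)          # real-valued signal carrying the data

# Receiver side (noise-free here): transform again and decode.
decoded = scs_decode(dht(marked_frame)[idx], delta)
```

Because DHT coefficients are real, any modification of them still yields a real time-domain frame. In this noise-free sketch the bits are recovered whenever `alpha` exceeds 0.5; over a real channel the choice of `delta` and `alpha` trades embedding distortion against robustness.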
About this article
Cite this article
Sagi, A., Malah, D. Bandwidth Extension of Telephone Speech Aided by Data Embedding. EURASIP J. Adv. Signal Process. 2007, 064921 (2006). https://doi.org/10.1155/2007/64921
- Side Information
- Listening Test
- Spectral Envelope
- Transmitted Analog
- Test Utterance