Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units

Nagarajan, T.; Murthy, H.A.

doi:10.1155/S1110865704406210

Research Article
Published: 27 December 2004

EURASIP Journal on Advances in Signal Processing volume 2004, Article number: 410910 (2004)

Abstract

In the development of a syllable-centric automatic speech recognition (ASR) system, segmentation of the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it has to be processed before segment boundaries can be extracted. This paper presents a subband-based group delay approach to segment spontaneous speech into syllable-like units. This technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as a magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representative of the STE function for syllable boundary detection. Although the group delay function derived from the STE function of the speech signal contains segment boundaries, the boundaries are difficult to determine in the context of long silences, semivowels, and fricatives. In this paper, these issues are specifically addressed and algorithms are developed to improve the segmentation performance. The speech signal is first passed through a bank of three filters, corresponding to three different spectral bands. The STE functions of these signals are computed. Using these three STE functions, three minimum-phase group delay functions are derived. By combining the evidence derived from these group delay functions, the syllable boundaries are detected. Further, a multiresolution-based technique is presented to overcome the problem of shift in segment boundaries during smoothing. Experiments carried out on the Switchboard and OGI-MLTS corpora show that the error in segmentation is at most 25 milliseconds for 67% and 76.6% of the syllable segments, respectively.
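The central step described in the abstract, treating the STE function as if it were a magnitude spectrum and deriving a minimum-phase group delay function from it, can be illustrated with a minimal single-band sketch. This is not the authors' implementation: the function names, the frame length, the hop, and the lifter length `n_lifter` are hypothetical, and the abstract does not specify the subband filters, how the evidence from the three bands is combined, or the boundary-picking rule, so those are left out. The sketch relies only on the standard cepstral identity for a minimum-phase signal, tau(w) = sum over n >= 1 of n * c_hat[n] * cos(n w), where c_hat is the causal (folded) cepstrum.

```python
import numpy as np


def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples per frame (frames overlap when hop < frame_len)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop: i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def min_phase_group_delay(magnitude, n_lifter):
    """Group delay of the minimum-phase signal whose half-spectrum magnitude
    (bins from 0 to pi) is `magnitude`.

    Only cepstral coefficients 1 .. n_lifter-1 are kept, which smooths the
    result. `n_lifter` is an assumed tuning parameter, not a value taken
    from the paper.
    """
    eps = 1e-12  # guards against log(0) in silent regions
    half = np.asarray(magnitude, dtype=float)
    # Mirror the sequence so it looks like the even-symmetric magnitude
    # spectrum of a real signal; its cepstrum is then real.
    full = np.concatenate([half, half[-2:0:-1]])
    n_fft = len(full)
    cep = np.fft.ifft(np.log(full + eps)).real
    # Fold to the causal (minimum-phase) cepstrum: double positive quefrencies.
    n = np.arange(1, n_lifter)
    c_hat = 2.0 * cep[1:n_lifter]
    # tau(w_k) = sum_n n * c_hat[n] * cos(n * w_k), evaluated at each input bin.
    w = 2.0 * np.pi * np.arange(len(half)) / n_fft
    return np.cos(np.outer(w, n)) @ (n * c_hat)
```

In use, the STE sequence of an utterance would be passed as `magnitude`, for example `min_phase_group_delay(short_term_energy(x, 320, 160), 40)` with assumed frame settings; a smaller `n_lifter` gives a smoother function and fewer candidate boundaries. The subband method summarized in the abstract would repeat this for each of the three band-filtered signals before combining the evidence.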

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, 600036, India
T. Nagarajan & H.A. Murthy

Authors

T. Nagarajan
H.A. Murthy

Corresponding author

Correspondence to T. Nagarajan.

About this article

Cite this article

Nagarajan, T., Murthy, H. Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units. EURASIP J. Adv. Signal Process. 2004, 410910 (2004). https://doi.org/10.1155/S1110865704406210

Received: 16 January 2004
Revised: 17 June 2004
Published: 27 December 2004
DOI: https://doi.org/10.1155/S1110865704406210
