Research Article | Open | Published:

Efficient Algorithm and Architecture of Critical-Band Transform for Low-Power Speech Applications

Abstract

An efficient algorithm and its corresponding VLSI architecture for the critical-band transform (CBT) are developed to approximate the critical-band filtering of the human ear. The CBT consists of a constant-bandwidth transform in the lower frequency range and a Brown constant-Q transform (CQT) in the higher frequency range. The VLSI architecture achieves significant power efficiency by reducing the computational complexity, using pipelining and parallel processing, and applying supply voltage scaling. A 21-band Bark-scale CBT processor with a sampling rate of 16 kHz is designed and simulated. Simulation results verify its suitability for short-time spectral analysis of speech. Compared with other methods, it fits the critical-band analysis of the human ear more closely, requires significantly fewer computations, and is therefore more energy-efficient. In a 0.35 μm CMOS technology, it processes a 160-point speech frame in 4.99 milliseconds at 234 kHz. The power dissipation is 15.6 μW at 1.1 V, an 82.1% power reduction compared with a benchmark 256-point FFT processor.
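The two-regime structure described in the abstract (fixed analysis length at low frequencies, window length shrinking in proportion to 1/f at high frequencies) can be illustrated with a minimal software sketch. This is not the paper's efficient algorithm or its hardware datapath, which the abstract does not detail. It is a direct, unoptimized evaluation under several assumptions: the band edges are the standard published Bark edges up to 7.7 kHz (22 edges, 21 bands), the quality factor `q = 5.0` is a hypothetical value, a Hamming window is used, and the names `band_centers`, `window_lengths` and `critical_band_transform` are illustrative.

```python
import cmath
import math

FS = 16_000  # sampling rate stated in the abstract (Hz)

# Assumption: standard Bark band edges up to 7.7 kHz (22 edges -> 21 bands).
# The band edges actually used in the paper are not given in the abstract.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700]


def band_centers(edges=BARK_EDGES_HZ):
    """Arithmetic center frequency (Hz) of each band."""
    return [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]


def window_lengths(n_frame, centers, fs=FS, q=5.0):
    """Per-band analysis length in samples.

    A constant-Q analysis wants N_k = q * fs / f_k samples. Capping N_k at
    the frame length gives a constant length (hence constant bandwidth)
    for low bands and a constant-Q length for high bands.
    """
    lengths = []
    for fc in centers:
        ideal = int(q * fs / fc + 0.5)  # round to nearest sample count
        lengths.append(max(1, min(n_frame, ideal)))
    return lengths


def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


def critical_band_transform(frame, fs=FS, q=5.0, edges=BARK_EDGES_HZ):
    """One complex coefficient per band, by direct summation."""
    centers = band_centers(edges)
    lengths = window_lengths(len(frame), centers, fs, q)
    coeffs = []
    for fc, n_k in zip(centers, lengths):
        w = hamming(n_k)
        acc = 0j
        for n in range(n_k):
            acc += w[n] * frame[n] * cmath.exp(-2j * math.pi * fc * n / fs)
        coeffs.append(acc / n_k)  # normalize by the window length
    return coeffs


if __name__ == "__main__":
    # 160-sample frame (10 ms at 16 kHz) containing a 1 kHz tone.
    tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(160)]
    mags = [abs(c) for c in critical_band_transform(tone)]
    peak = mags.index(max(mags))
    print("strongest band:", BARK_EDGES_HZ[peak], "to", BARK_EDGES_HZ[peak + 1], "Hz")
```

With these assumed values, bands centered below 500 Hz use the full 160-sample frame, and higher bands use progressively shorter windows. A 1 kHz test tone should produce its largest magnitude in the 920 to 1080 Hz band.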

Author information

Correspondence to Chao Wang.

About this article

Keywords

  • Bark
  • Supply Voltage
  • Parallel Processing
  • Efficient Algorithm
  • Power Dissipation