An FPGA-Based Electronic Cochlea

A module generator which can produce an FPGA-based implementation of an electronic cochlea ﬁlter with arbitrary precision is presented. Although hardware implementations of electronic cochlea models have traditionally used analog VLSI as the implementation medium due to their small area, high speed, and low power consumption, FPGA-based implementations o ﬀ er shorter design times, improved dynamic range, higher accuracy, and a simpler computer interface. The tool presented takes ﬁlter coe ﬃ - cients as input and produces a synthesizable VHDL description of an application-optimized design as output. Furthermore, the tool can use simulation test vectors in order to determine the appropriate scaling of the ﬁxed-point precision parameters for each ﬁlter. The resulting model can be used as an accelerator for research in audition or as the front-end for embedded auditory signal processing systems. The application of this module generator to a real-time cochleagram display is also presented.


INTRODUCTION
The field of neuromorphic engineering has the long-term objective of taking architectures from our understanding of biological systems to develop novel signal processing systems.This field of research, pioneered by Mead [1], has concentrated on using analog VLSI to model biological systems.Research in this field has led to many biologically inspired signal processing systems which have improved performance compared to traditional systems.
The human cochlea is a transducer which converts mechanical vibrations from the middle ear into neural electrical discharges, and additionally provides spatial separation of frequency information in a manner similar to that of a spectrum analyzer [2].It serves as the front-end signal processing for all functions of the auditory nervous system such as auditory localization, pitch detection, and speech recognition.
Although it is possible to simulate cochlea models in software, hardware implementations may have orders of magnitude of improvement in performance.Hardware implementations are also attractive when the target applications are on embedded devices in which power efficiency and smallfootprint are design considerations.The electronic cochlea, first proposed by Lyon and Mead [2], is a cascade of biquadratic filter sections (as shown in Figure 1) which mimics the qualitative behavior of the human cochlea.Electronic cochlea have been successfully used in auditory signal processing systems such as spatial localization [3], pitch detection [4], a computer peripheral [5], amplitude modulation detection [6], correlation [7], and speech recognition [8].
There have been several previous implementations of electronic cochlea in analog VLSI technology.The original implementation by Lyon and Mead was published in 1988 and used continuous time subthreshold transconductance circuits to implement a cascade of 480 stages [2,9].In 1992, Watts et al. reported a 50-stage version with improved dynamic range, stability, matching, and compactness [10].A problem with analog implementations is that transistor matching issues affect the stability, accuracy, and size of the filters.This issue was addressed by van Schaik et al. in 1997 using compatible lateral bipolar transistors instead of MOSFETs in parts of the circuit [11].Their 104-stage test chip showed greatly improved characteristics.In addition, a switched capacitor cochlea filter was proposed by Bor et al. in 1996 [12].
There have also been several previously reported digital VLSI cochlea implementations.In 1992, Summerfield and Lyon reported an application-specific integrated circuit (ASIC) implementation which employed bit-serial secondorder filters [13].In 1997, Lim et al. reported a VHDLbased pitch detection system which used first-order Butterworth band pass filters for cochlea filtering [14].Later in 1998, Brucke et al. designed a VLSI implementation of a speech preprocessor which used gammatone filter banks to mimic the cochlea [15].The implementation by Brucke et al. used fixed-point arithmetic and they also explored trade-offs between wordlength and precision.In 2000, Watts built a 240-tap high-resolution implementation of a cochlea model using FPGA technology (http://www.lloydwatts.com/neuroscience.shtml)and in 2002 a tenth-order recursive cochlea filter was implemented using FPGA technology [16].
A field programmable gate array (FPGA) is an array of logic gates in which the connections can be configured by downloading a bitstream into its memory.Traditional ASIC design requires weeks or months for the fabrication process, whereas an FPGA can be configured in milliseconds.An additional advantage of FPGA technology is that the same devices can be reconfigured to perform different functions.At the time of writing this paper in 2002, FPGAs had equivalent densities of ten million system gates.
Since most systems which employ an electronic cochlea are experimental in nature, the long design and fabrication times associated with both analog and digital VLSI technology are a major shortcoming.Recently, FPGA technology has improved in density to the point where it is possible to develop large scale neuromorphic systems on a single FPGA.Although these are admittedly larger in area, have higher power consumption, and may have lower throughput than the more customized analog VLSI implementations, many interesting neuromorphic signal processing systems can be implemented using FPGA technology, enjoying the following advantages over analog and digital VLSI: (i) shorter design and fabrication time; (ii) more robust to power supply, temperature, and transistor mismatch variations than analog systems; (iii) arbitrarily high dynamic range and signal-to-noise ratios can be achieved over analog systems; (iv) whereas a VLSI design is usually tailored for a single application, the reconfigurability and reuseability of an FPGA enables the same system to be used for many applications; (v) designs can be optimized for each specific instance of a problem whereas ASICs need to be more general purpose; (vi) they can be interfaced more easily with a host computer.
The main difficulty that one faces in implementing an electronic cochlea on an FPGA is the choice of arithmetic system to be used in the imple mentation of the underlying filters.In the module generator which will be presented, a fixed-point implementation strategy was chosen over floating point since we believed it would result in an implementation with smaller area.Distributed arithmetic (DA) was used to implement the multipliers associated with the filters in an efficient manner.Finally, a module generator which can generate synthesizable VHDL descriptions of arbitrary wordlength fixed-point cochlea filters was developed.The module generator can also be used, together with our fp simulation tool [17,18], to determine the minimum and maximum ranges of all variables.This range information is then used to determine the maximal number of fractional bits which can be used in the variable's two's complement fraction representation, hence minimizing quantization error.
The FPGA implementation of the electronic cochlea described here can serve as a computational accelerator in its own right, or be used as a front-end preprocessing stage for embedded auditory applications.As a sample application, a real-time cochleagram display is presented.
The rest of the paper is organized as follows.In Section 2, Lyon and Mead's cochlea model is described.Section 3 describes the implementation of the filter stages using DA.Our design methodology is presented in Section 4, followed by results in Section 5. Conclusions are drawn in Section 6.

LYON AND MEAD'S COCHLEA MODEL
Lyon and Mead proposed the first electronic cochlea in 1988 [2,19].This model captured the qualitative behavior of the human cochlea using a simple cascade of second order filter stages which they implemented in analog VLSI.In this section, a very superficial summary of the Lyon and Mead cochlea model is given.More detailed descriptions of the cochlea can be found in [2,20].
The human cochlea, or inner ear, is a three-dimensional fluid dynamic system which converts mechanical vibrations from the middle ear into neural electrical discharges [2].It is composed of the basilar membrane, inner hair cells, and outer hair cells.The cochlea connects to higher levels in the auditory pathway for further processing.
The basilar membrane is a longitudinal membrane within the cochlea.The oval window provides the input to the cochlea.Vibrations of the eardrum are coupled via bones in the middle ear to the oval window causing a traveling wave from base to apex along the basilar membrane.The basilar membrane has a filtering action and can be thought of as ). a cascade of lowpass filters with exponentially decreasing cutoff frequency from base to apex.The result of the filtering of the basilar membrane at any point along its length is a bandpass filtered version of the input signal, with center frequency decreasing along its length.Different distances along the basilar membrane are tuned to specific frequencies in a manner similar to that of a spectrum analyzer.A simplified box model showing a sinusoidal wave traveling along an uncoiled cochlea is shown in Figure 2.
Several thousand inner hair cells are distributed along the basilar membrane and convert the displacement of the basilar membrane to a neural signal.The hair cells also perform a half-wave rectifying function since only displacements in one direction will cause neurons to fire.
The outer hair cells perform automatic gain control by changing the damping of the basilar membrane.It is interesting to note that there are approximately three times more outer hair cells than inner hair cells.
In order to simulate the properties of the basilar membrane, Lyon and Mead's cochlea model used a cascade of scaled second-order lowpass filters with the transfer function where Q represents the damping characteristic (or quality) of the filter and τ the time constant.In the cochlea filter, the τ of each filter is varied exponentially along the cascade, causing filters to have exponentially decreasing cutoff frequencies.
The Q of all the filters is held constant.The output of each filter corresponds to the displacement at different positions along the basilar membrane.

Distributed arithmetic
DA offers an efficient method to implement a sum of products (SOP) provided that one of the variables does not change during execution.Instead of requiring a multiplier, DA utilizes a precomputed lookup table [21,22].Consider the SOP, S of N terms where k i is the (fixed) weighting factor and x i is the input.For two's complement fractions, the numerical value of Substituting ( 3) into (2) yields The organization of the input variables is in a bit-serial least significant bit (LSB) first format.Since x i j ∈ {0, 1} (i = 0, 1, . . ., N−1, j = 0, 1, . . ., n−1), each term within the brackets of ( 4) is the sum of weighting factors k 0 , k 1 , . . ., k N−1 .On every clock cycle, one of the bracketed terms of S can thus be computed by applying x 0 , x 1 , . . ., x N−1 as the address inputs of a 2 (N−1) entry read-only memory (ROM).The contents of the ROM are precomputed from the constant k i 's and are shown in Table 1.The output of the ROM is multiplied by a power of two (a shift operation) and then accumulated.After n cycles, the accumulator contains the value of S.

Digital IIR filters
A general IIR second-order filter has a transfer function of the form The corresponding time domain IIR filter can be implemented by the function where x(n − k) is the kth previous input, y(n − k) is the kth previous output, and y(n) is the output.The operation is essentially the SOP of five terms, and can be directly mapped to a biquadratic section as shown in Figure 3. Figure 4 illustrates our actual implementation using DA (described in Section 3.1) on an Xilinx Virtex FPGA.The previous values x(n − 1), x(n − 2), and y(n − 2) are implemented using shift registers with the number of stages equal to the wordlength of the variables used.The shift registers are implemented by cascades of Virtex SRL16E primitives for minimum area.The DA ROM takes x(n), x(n − 1), x(n − 2), y(n − 1), and y(n − 2) as inputs to generate partial sums (bracketed terms in (4)).As there are 5 inputs, the required number of entries in the ROM is 2 5 = 32, leading to an efficient implementation using Xilinx ROM32X1 primitives.The scaling accumulator shifts and adds the output from the ROM (unscaled partial sum in bitparallel organization) at every cycle to produce y(n).In the last cycle of scaling and accumulation, the parallel-to-serial converter latches the value at the scaling accumulator.Since the scaling accumulator has a latency equal to the wordlength of the variables, the value latched by the converter is y(n − 1).

DESIGN METHODOLOGY
Given the filter coefficients, the designer selects appropriate values of filter wordlength and the number of bits (width) of the DA ROM's output.Note that all filter sections have the same wordlength although the allocation of integer and fractional parts used within each filter section can vary.
The cochlea filter model is written in a subset of C which supports only expressions and assignments [17,18].A compiler uses standard parsing techniques to translate expressions into directed acyclic graphs (DAG).Each operator is mapped to a module which is a software object consisting of a set of parameters, a simulator, and a component generator.The simulator can perform the operation at a requested precision to determine range information.It can also compare fixed-point output with a floating-point computation to derive error statistics.
As an input, the fp cochlea generator takes the coefficients obtained from an auditory toolbox, the wordlength of variables, and the width of the DA ROM.Although inputs and outputs of all filter sections are of the same wordlength, their fractional wordlength can be different (two's complement fractions are used).The dynamic ranges of inputs and outputs are determined by fp through simulation of a set of user-supplied test vectors.The generator performs simulation using the test vectors as inputs and the range of each variable can be determined.From this information, the minimum number of bits needed for the integer part of each variable is known and, since the wordlength is fixed, the maximum number of bits can be assigned to the fractional part of the variable.
After deducing the best representation for each variable, the generator outputs a synthesizable VHDL code that describes an implementation of the corresponding cochlea model.The fractional wordlengths of the scaling accumulator and the output variable can be different, so the operator    must also include a mechanism to convert the former to the latter.Since the output of the scaling accumulator is bitparallel while the output variable is bit-serial, the parallel-toserial converter can perform format scaling by selecting the appropriate bits to serialize.The resulting VHDL description can then be used as a core in other designs.
The high level cochlea model description is approximately 60 lines of C code.From that, it generates approximately 50000 lines of VHDL code for the case of a cochlea filter with 88 biquadratic sections.

RESULTS
The cochlea implementation was tested on an Annapolis "Wildstar" Reconfigurable Computing Engine [23] which is a PCI-based reconfigurable computing platform containing three Xilinx Virtex XCV1000-BG560-6 FPGAs.The cochlea implementations were verified by comparing Synopsys VHDL Simulator simulations with the results produced by a floating-point software model.Synthesis and implemen-tation were performed using Synopsys FPGA Express 3.4 and Xilinx Foundation 3.2i, respectively.

Trade-offs among wordlength, width of DA ROM, and precision
The coefficients for the biquadratic filters in our implementation of Lyon and Mead's cochlea model were obtained using Slaney's Auditory Toolbox [24].This Matlab toolbox has several different cochlea models, test inputs, and visualization tools.The same toolbox was used to verify our designs and produce cochleagram plots.The coefficients of these implementations were obtained from the Auditory Toolbox using the Matlab command DesignLyonFilters(16000, 8, 0.25), which specifies a 16 kHz sampling rate, Q = 8, and a spacing which gives 88 biquadratic filters.A series of cochlea implementations, with wordlengths from 10 to 32 bits and DA ROM width from 10 to 24 bits, was generated in order to present the trade-offs among wordlengths, widths of DA ROMs, and precisions.
In order to present the improvement in precision with increasing wordlengths and ROM width, the frequency responses of several different fixed-point implementations are plotted in Figure 5. Figure 6 shows impulse and frequency responses obtained from a software floating-point implementation, a hardware 16-bit wordlength, and 16-bit ROM width implementation.
It can be observed that the filter accuracy gradually improves with increasing wordlength or ROM width.When    7 shows the trend of improved quantization error with increasing wordlength and ROM width.Area requirements, maximum clock rates, and maximum sampling rates of these implementations on a Xilinx Virtex XCV1000-6 FPGA, as reported by the Xilinx implementation tools, are shown in Tables 2 and 3. A Xilinx XCV1000 has 12288 slices and the largest currently available parts, XCV3200, have 32448 slices.As a bit-serial architecture was employed, the effective sampling rates of the implementations are their maximum clock rates divided by their wordlengths.

Application to a cochleagram display
A 24-bit wordlength, 16-bit DA ROM implementation was used to construct a cochleagram display application.Due to limited hardware resources on a Xilinx XCV1000-6 FPGA, only the first 60 out of the 88 cochlea sections were used.These cochlea sections correspond to a frequency range of 1006 to 7630 Hz.
The design of the cochleagram display is shown in Figure 8.The host PC writes input data into a dual-port block RAM (256 × 32-bit synchronous RAM) which passes through a parallel-to-serial converter and enters the cochlea core.Each of the outputs of the cochlea core undergoes serial-to-parallel conversion followed by half-wave rectification (to model the functionality of the inner hair cells).The outputs were integrated over 256 samples and sent to the PC for display.
The cochleagram display was tested with several different inputs.Figure 9 shows the cochleagrams produced from swept-sine wave and the auditory toolbox's "tapestry" inputs; the former is a 25-second linear chirp and the latter is the speech file of a woman saying "a huge tapestry hung in her hallway."The cochleagram display requires 10344 slices and can be clocked at 44.15 MHz, yielding a sampling rate of 1.84 MHz (or 115 times faster than real-time performance).Including software and interfacing overheads, the measured throughput on the "Wildstar" platform was 238 kHz.As a comparison, the auditory toolbox achieves a 64 kHz throughput on a Sun Ultra-5 360 MHz machine.The performance could be further improved by using large and/or faster speed grade FPGAs, or via improved floorplanning of the design which would allow a higher clock frequency.
It is interesting to compare the FPGA-based cochleagram system with a similar system developed in analog VLSI by Lazzaro et al. in 1994 [5].Using a 2 µm CMOS process, they integrated a 119 stage silicon cochlea (with a slightly more sophisticated hair cell model), nonvolatile analog storage, and a sophisticated event-based communications protocol on a single 3.6 × 6.8 mm 2 chip with a power consumption of 5 mW.The analog VLSI version has improved density and power consumption compared with the FPGA approach.However, the FPGA version is vastly simpler, easier to modify, has a shorter design time, and is much more tolerant of supply voltage, temperature, and transistor matching variations.Although qualitative results are not available, it is expected that the FPGA version also has better filter accuracy, has a wider dynamic range and can operate at higher Q without instability.
We believe that there are many applications of the FPGA cochlea including the development of more refined cochlea or cochlea-like models.An FPGA cochlea is particularly suited as a testbed for algorithms that involve concurrent processing across cochlea channels such as (i) more realistic hair cell models, (ii) auditory streaming and the separation of foreground stimuli from background noise, (iii) auditory processing in reverberant environments, (iv) human sound localization, and (v) bat echolocation.In addition, the FPGA platform provides an avenue for developing, simulating, and studying auditory processing in more complicated, but realistic acoustic environments, that involve multiple sound sources, multiple reflection paths, and external ear acoustic filtering that varies with sound direction.The signal processing required to simulate such realistic environments is computationally intensive and some of this preprocessing can be incorporated into the FPGA platform enabling real-time studies of auditory processing under realistic acoustic conditions.We are also interested in finding ways in which FPGA cochlea models can assist in adapting or translating cochlea processing principles into engineering implementations.Future projects include using the FPGA cochlea in comparisons of a cochlea model with an analysis-synthesis filter bank as used in perceptual audio coding, audio visualization displays, and a neuromorphic isolated word spotting system with cochlea preprocessing.Although modern digital signal processors (DSP) are capable of achieving similar or even higher performance, the FPGA may have advantages in terms of power consumption and smaller footprint.The availability of more than 100 dedicated high-speed multipliers in newer Virtex II [25] devices would enable implementations with much higher throughput than the implementation presented in this paper and would also free up more FPGA logic resources for implementing hair cell and higher-level processing models.

CONCLUSION
A parameterized FPGA implementation of an electronic cochlea that can be used as a building block for many systems which model the human auditory pathway was developed.This electronic cochlea demonstrates the feasibility of incorporating large neuromorphic systems on FPGA devices.Neuromorphic systems employ parallel distributed processing which is well suited to FPGA implementation, and may offer significant advantages over conventional architectures.
FPGAs provide a very flexible platform for the development of experimental neuromorphic circuits and offer advantages in terms of faster design time, faster fabrication time, wider dynamic range, better stability, and simpler computer interface over analog VLSI implementations.

Figure 1 :
Figure 1: Cascaded IIR biquadratic section used in the Lyon and Mead cochlea model.

Figure 2 :
Figure 2: Illustration of a sine wave travelling through a simplified box model of an uncoiled cochlea (adapted from [2]).

Figure 3 :
Figure 3: The architecture of an IIR biquadratic section.

Figure 4 :
Figure 4: Implementation of an IIR biquadratic section on an Xilinx Virtex FPGA.

Figure 5 :
Figure 5: Frequency responses of cochlea implementations with different wordlength and width of ROMs (wordlength, ROM width).

Figure 6 :Figure 7 :
Figure 6: Impulse response of (a) the software floating-point implementation and (b) hardware 16-bit wordlength, 16-bit ROM width implementation.Frequency response of (c) the software floating-point implementation and (d) hardware 16-bit wordlength, 16-bit ROM width implementation.

Figure 8 :
Figure 8: System architecture of the cochleagram display.

Figure 9 :
Figure 9: Cochleagrams of (a) swept-sine wave and (b) "tapestry" inputs.The former has 400000 samples while the latter has 50380 samples.Only the first 60 out of the 88 cochlea sections were used because of limited hardware resources on a Xilinx XCV1000-6 FPGA.These cochlea sections correspond to a frequency range of 1006-7630 Hz.

Table 1 :
Contents of a DA ROM.For each address, the terms k i for which

Table 2 :
Area requirements of an 88-section cochlea implementation of different wordlengths and ROM width (number of slices).

Table 3 :
Maximum clock rates and corresponding sampling rates of 88-section cochlea implementations of different wordlengths and ROM width (maximum clock rate (MHz) and maximum sampling rate (MHz)).