The Personal Hearing System—A Software Hearing Aid for a Personal Communication System

A concept and architecture of a personal communication system (PCS) is introduced that integrates audio communication and hearing support for the elderly and hearing-impaired through a personal hearing system (PHS). The concept envisions a central processor connected to audio headsets via a wireless body area network (WBAN). To demonstrate the concept, a prototype PCS is presented that is implemented on a netbook computer with a dedicated audio interface in combination with a mobile phone. The prototype can be used for field-testing possible applications and to reveal possibilities and limitations of the concept of integrating hearing support in consumer audio communication devices. It is shown that the prototype PCS can integrate hearing aid functionality, telephony, public announcement systems, and home entertainment. An exemplary binaural speech enhancement scheme that represents a large class of possible PHS processing schemes is shown to be compatible with the general concept. However, an analysis of hardware and software architectures shows that the implementation of a PCS on future advanced cell phone-like devices is challenging. Because of limitations in processing power, recoding of prototype implementations into fixed point arithmetic will be required and WBAN performance is still a limiting factor in terms of data rate and delay.


EURASIP Journal on Advances in Signal Processing

… yet, and the final user benefit remains to be investigated. Dedicated prototype systems as investigated in this study might facilitate this type of research.
In contrast to hearing-impaired persons with moderate or severe hearing loss, a person with mild hearing loss, or, more generally, any person with light-to-moderate problems hearing under adverse circumstances, will typically wear neither a hearing aid nor any other hearing support system. Hearing support systems that are add-ons to existing communication devices might be beneficial for those users, and their acceptance is expected to be higher than that of conventional hearing aids.
Another factor that influences and might facilitate the further development of hearing support systems is the availability of standard hardware and open software for mobile devices, for example, the iPhone [8] or the GooglePhone [9]. These devices can act as a central processor for hearing aids with access to binaural audio information and the advantage of increased processing performance [10]. Based on such scalable systems, the integration of hearing support systems for slight-to-moderate hearing losses with communication applications seems to be feasible in principle, but is yet to be assessed in more detail. One step in that direction is low-delay real-time signal processing systems based on standard hardware and software, such as the Master Hearing Aid (MHA) [11], a development framework for hearing aid algorithms. Another hardware-related factor is the development of Wireless Body Area Networks (WBANs), which can be seen as an enabling technology for mobile health care [12] and which could mediate the communication between a central processor and audio headsets attached to the ear like hearing aids.
In summary, recent developments open up the possibility of merging the functionality of traditional hearing aids and other hearing support systems for slight-to-moderate hearing losses on scalable hardware. This combination will be defined as a Personal Hearing System (PHS). Furthermore, the integration of this PHS with general and new communication applications of mobile phones and PDAs to define a Personal Communication System (PCS) may lead to new applications and improved hearing support. User inquiries regarding the acceptance of such a PCS have been carried out within the EU project HearCom [13], and its general acceptance was demonstrated, provided that the device is not larger than a mobile phone and includes its functionality. Some specific solutions to this already exist, but audio applications with scalable listening support for different types of hearing losses and having a connection to personal communication and multimedia devices are not yet available. The aim of this study is, therefore, to establish a basis for further research and development along these lines. In Section 2, the proposed architecture of a PCS is outlined. Section 3 describes the implementation of a prototype PCS which runs on a netbook computer and hosts four representative signal enhancement algorithms. A first evaluation of the hardware requirements (e.g., processing power, wireless link requirements), of the software requirements (scalable signal processing), and of the expected benefit for the end users is performed using this PCS prototype.

PCS Architecture
The PCS is a hand-held concentrator of information to facilitate personal communication. Figure 1 shows a block diagram of the projected PCS and its applications. The PCS is a development based on new advanced mobile telephones and Personal Digital Assistants (PDAs). The reason for selecting a mobile phone as a PCS platform is the availability of audio and data networking channels, like GSM, UMTS, BlueTooth, and WiFi. A global positioning system, if available, can be utilized by public announcement services.
Audio is played to the user via a pair of audio headsets. These audio headsets house loudspeakers (receivers) for audio playback. Each audio headset also has two or three microphones, which can be configured as a directional microphone for picking up environmental sounds and the user's own voice for telephone applications. As an option, the audio headsets provide audio processing capabilities similar to hearing aids.
A short-range wireless link (Wireless Body Area Network, WBAN) provides the connection between the PCS and the audio headsets, and optionally between the two audio headsets at the left and right ears. Mid-range and wide-range links are used to establish connections to telecommunication network providers and to local information services. All links are part of the wireless Personal Communication Link (PCL), which supplies information to the PCS and carries it between the PCS and the audio headsets, as a successor to the inductive link (telecoil) of current hearing aids.
A key application on the PCS is the PHS: the audio communication channels of the PCS, for example, telephony, public announcement, and home entertainment, are processed in the PHS with personalized signal enhancement schemes and played back through the audio headsets. In addition to the PCS audio communication channels, the PHS can process environmental sounds picked up by the headset microphones near the user's ears. Processing methods may differ depending on the input, that is, acoustic input or input through the PCS communication channels. The functionality of the PHS covers that of a conventional hearing aid and adds several features. (i) Increased connectivity: the PCS provides services which can connect external sources with the PHS. (ii) Advanced audio signal processing schemes: the computational power and battery size of the central processing device allow for algorithms which otherwise would not run on conventional hearing aids. (iii) Potential for production cost reduction: the usage of standard hardware may reduce production, marketing, distribution, and service costs if consumer headsets with slight modifications (for example, the addition of microphones for processing of environmental sounds) can be used, which limits this option to subjects with mild to moderate hearing loss.

Distributed Processing.
For processing the PCS audio communication channels, a unidirectional link from the central processor to the headsets is sufficient, and the link delay is not critical as long as it remains below 50-100 ms. Processing environmental sounds in the central processor, however, requires a bidirectional link which needs further consideration. In general, all processing blocks can run either on the audio headsets or on the central processor. The optimal choice for each processing block depends on several issues: (i) The computational performance and battery capacity of the audio headsets are typically low and do not allow complex algorithms. (ii) The central processor or the PCL might not be available continuously because of wireless link breakdowns. Therefore, at least basic processing, such as amplification for hearing loss correction, must run on the audio headsets. (iii) Depending on the properties of the PCL, the delay might exceed the tolerable delay for processing of environmental sounds [14] and will constrain the algorithms on the central processor. Link delays smaller than 10 ms would allow routing the signal through the central processor. In typical hearing aid applications, signal enhancement schemes precede the processing blocks for hearing loss correction (e.g., amplification and compression). To avoid the transmission of several signal streams, only one set of successive processing blocks can run on the central processor. Whether emerging WBAN technology will be powerful enough to meet this delay limit is still unclear. If the total link delay is longer than about 10 ms, the signal path needs to remain completely on the audio headsets. Then, processing on the central processor is restricted to signal analysis schemes that control processing parameters of the signal path, for example, classification of the acoustical environment, direction-of-arrival estimation, and parameter extraction for blind source separation.
In general, it seems feasible that these complex signal analysis schemes and upcoming computationally demanding algorithms for Auditory Scene Analysis [15] need not be part of the signal path. The projected architecture might, therefore, be suited for these algorithms, which could benefit from the high signal processing and battery power of the central processor. Other requirements for the link are bandwidth and low power consumption: to allow for multichannel audio processing, several (typically two or three) microphone signals from each ear are required, which demands sufficient link bandwidth. Additionally, if signals are transmitted in compressed form, the link's signal codec should not alter the signal in ways that cause artifacts and performance degradation in multichannel processing. To ensure long battery life, the link should use little power. To reduce the link power consumption, the PHS could provide advanced processing only on demand. Switching on advanced processing and the link might be controlled either manually or by an automatic audio analysis in the headsets.
The architecture of the PHS with a central processor makes it possible to process binaural information in the central processor and unilateral information either in the central processor or in the audio headsets. Considering typical processing schemes in hearing aids, unilateral processing comprises dynamic compression, single-channel noise reduction, and feedback cancellation. Typical applications of the central processor are binaural and multimicrophone methods, for example, binaural ambient noise reduction, beamforming, and blind source separation [3]. If the link delay is not sufficiently small to route the signal path through the central processor, binaural processing can still be achieved, provided that a signal analysis algorithm running on the central processor processes the signals from the left and right sides and controls the signal path on both sides.
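The distributed-processing constraints discussed above can be summarized as a simple routing policy. The following Python sketch is illustrative only; the processing block names are hypothetical, and the ~10 ms threshold is taken from the text:

```python
def plan_processing(link_delay_ms, link_available):
    """Illustrative routing policy: basic hearing loss correction always
    stays on the headsets; the full signal path may be routed through the
    central processor only if the link delay stays below ~10 ms."""
    plan = {"headsets": ["amplification", "compression"], "central": []}
    if not link_available:
        return plan  # link breakdown: fall back to headset-only processing
    if link_delay_ms <= 10:
        # low delay: signal enhancement can run on the central processor,
        # preceding the hearing loss correction on the headsets
        plan["central"].append("signal_enhancement")
    else:
        # higher delay: only signal analysis (e.g., environment
        # classification) that controls the headset signal path
        plan["central"].append("signal_analysis")
    return plan
```

The policy mirrors requirement (ii) above: amplification and compression remain on the headsets under all link conditions.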

Implementation of a Prototype System
To assess the PCS architecture and applications experimentally, a prototype PCS has been implemented on a small notebook computer ("netbook") in combination with a Smartphone. To demonstrate PHS applications, several signal enhancement algorithms have been realized on the PCS prototype using the MHA algorithm development environment; see Section 3.2.1. A dedicated audio interface that was developed within the EU HearCom project to connect audio headsets to the netbook is described in Section 3.2.2. A phone service as a prototype application of the PCS, implemented on a Smartphone, is described in Section 3.3.2. One signal enhancement algorithm (coherence-based dereverberation [7]) was taken as an example and has been tested for its conformity with the concept of the PHS. See Figure 2 for a schematic signal flow of the prototype system. The PHS is implemented on a separate notebook computer. Notebook computers deliver sufficient performance for audio signal processing. Selecting signal processing algorithms carefully and using performance optimization techniques allows for stripping down the PC platform. However, a floating point processor is required for the prototype algorithms, which prevents using fixed-point processor-based PDAs or Smartphones. Using floating point algorithms enables fast prototyping and very early field testing. In a later step, it is necessary to recode positively evaluated algorithms to a fixed point representation and install these on PDAs or Smartphones. PCS services are implemented on a Smartphone with networking capabilities. The PCS-PHS link is realized as a WiFi network connection. The audio headsets are hearing aid shells with microphones and a receiver, without signal processing capabilities. The audio headsets are connected to the PHS via cables and a dedicated audio interface. The audio headset signal processing capabilities are simulated on the central processor. Figure 3 shows a netbook-based PHS prototype.

Hardware Components.
In the following sections, the hardware components of the prototype implementation are described.

Netbook: Asus
Eee PC. For the prototype system, a miniature notebook has been used as a hardware-accelerated floating point processor for the PHS: the Asus Eee PC is a small and lightweight notebook PC; its size is about 15 × 22 cm, and it weighs 990 grams. It provides an Intel Celeron M processor running at a clock rate of 630 MHz. To achieve low-delay signal processing in a standard operating system environment, a Linux operating system (UbuntuStudio 8.04) with a low-delay real-time patched kernel (2.6.24-22-rt) has been installed. For comparison, the system was also installed on an Acer Aspire One netbook PC and a standard desktop PC.

Dedicated Audio Interface.
A detailed market survey showed that commercially available audio interfaces cannot satisfy all requirements of the mobile PHS prototype. High-quality devices as used in recording studios offer the required signal quality and low latency but are not portable because of size, weight, and external power supply. Portable consumer products do not offer the required quality, low latency, and number of capture and playback channels. Therefore, a dedicated USB audio interface has been developed which fulfills the requirements of the PHS prototype. The audio interface has been developed in two variants: a device with four inputs and two outputs to drive two audio headsets with two microphones in each headset (USBSC4/2), and a device with six inputs and two outputs for two audio headsets with three microphones each (USBSC6/2). The basis for both devices is a printed circuit board (PCB). The USBSC4/2 contains one PCB, shown as PCB1 in Figure 4. Assembled are two stereo AD converters (four channels) and one stereo DA converter (two channels). A microcontroller (μC) implements the USB 2.0 interface to the PC hardware. A complex programmable logic device (CPLD) with exchangeable firmware enables the configuration of other device variants in a short time, for example, a device with four inputs and outputs, or eight inputs and no outputs. The hardware is also applicable outside the scope of hearing aid research: with minor modifications, it can be used as a mobile recording device or as a consumer sound card for multimedia PCs. For future usage of the developed hardware, device variations in the number and type of channels, depending on user requirements, can be derived quickly. The architecture is extendable by a hardware signal processing unit for user-defined audio preprocessing by exchanging the CPLD for more complex components like field programmable gate arrays (FPGAs). This extension would decrease the CPU load of the host PC or would allow for a higher computational complexity of the algorithms.
It should be noted, though, that the implementation of algorithms in FPGAs using a fixed point hardware description language (HDL) like VHDL or Verilog is even more elaborate than transferring floating point software to fixed point. Thus, because of the high nonrecurring engineering costs, this approach is only adequate for well-evaluated and frequently used algorithms such as the FFT.
The developed audio interface is a generic USB2.0 audio device that does not require dedicated software drivers for PCs/Notebooks running under the Linux operating system. The device utilizes the USB2.0 isochronous data transfer connection for low latency, and, therefore, does not work with USB1.
The audio interface is equipped with connectors to directly connect two hearing aid shells housing a receiver and up to three microphones. The device provides a microphone power supply. The USB audio interface is powered via the USB connection. RC filters and ferrite beads are used to suppress noise introduced by the USB power supply. One RC filter is placed directly at the supply input. Furthermore, at each AD-and DA-converter, one filter is placed close to the analog and the digital supply, respectively. Additionally, noise is suppressed by the use of ferrite beads in each supply line of each converter.

Software Components.
In the following sections, the major software components used in the PCS prototype implementation are described.

PHS Algorithms.
In the PHS prototype, four representative signal enhancement schemes have been implemented: single-channel noise suppression based on perceptually optimized spectral subtraction (SC1), Wiener-filter-based single-channel noise suppression (SC2), spatially preprocessed speech-distortion-weighted multichannel Wiener filtering (MWF), and a binaural coherence dereverberation filter (COH) [7]. Individually fitted dynamic compression and frequency-dependent amplification were placed after the signal enhancement algorithm to compensate for the user's hearing loss. Hearing loss compensation without any specific signal enhancement algorithm is labelled REF. The MHA was used as the basis of the implementation [11].
The prototype algorithms are processed at a sampling rate of 16 kHz in blocks of 32 samples, that is, 2 ms. Audio samples are processed as 32-bit floating point values, that is, four bytes per sample.
As an example, we look at the coherence-based dereverberation filter in more detail: the microphone signal is transformed into the frequency domain by a short-time fast Fourier transform (FFT) with overlapping windows [16]. At both ears, the algorithm splits the microphone signals X_l and X_r into nine overlapping frequency bands. In each frequency band k, the average phase ϕ across the FFT bins ν belonging to the frequency band k is calculated as ϕ(k) = ∠ Σ_ν W(k, ν) X(ν). The weighting function W(k, ν) defines the filter shape of the frequency band; see [11] for details. Also, ϕ is implicitly averaged across time over the length of one analysis window. Comparing the phase with the phase of the contralateral side yields the interaural phase difference (IPD) within a frequency band. The phase difference ϕ_l − ϕ_r is represented as a complex number on the unit circle, z = e^{j(ϕ_l − ϕ_r)}. The estimated coherence is the gliding vector strength c of z, c = |⟨z⟩_τ|, with the averaging time constant τ. The estimated coherence is directly transformed into a gain by applying an exponent α, G = c^α. This gain is applied to the corresponding frequency band prior to its transformation back to the time domain. A detailed description of the algorithm and its relation to a cross-correlation-based dereverberation algorithm can be found in [17].
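The per-band gain computation described above can be sketched in pure Python. The first-order IIR recursion standing in for the gliding average ⟨z⟩_τ, and the parameter values below, are assumptions for illustration, not the MHA implementation:

```python
import cmath
import math

def coherence_gains(ipd_per_block, tau_blocks, alpha):
    """For one frequency band: turn a sequence of interaural phase
    differences (one per signal block) into gains G = c**alpha, where
    c is the gliding vector strength of z = exp(j * IPD)."""
    a = math.exp(-1.0 / tau_blocks)  # IIR coefficient for time constant tau
    z_avg = 0j
    gains = []
    for ipd in ipd_per_block:
        z = cmath.exp(1j * ipd)            # IPD as a point on the unit circle
        z_avg = a * z_avg + (1.0 - a) * z  # gliding average of z
        c = abs(z_avg)                     # estimated coherence, 0 <= c <= 1
        gains.append(c ** alpha)           # G = c**alpha
    return gains
```

A coherent (direct-sound) band with a stable IPD drives the gain toward one, while a diffuse (reverberant) band with fluctuating IPDs is attenuated.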

PCS-PHS Link for Testing Multimedia Applications.
To provide ways to connect PCS audio communication streams to the PHS, a specific network audio interface has been implemented in the PHS. Together with a sender application, this interface forms the PCS-PHS link, which uses the Internet Protocol Suite (TCP/IP). The link can be established on demand and contains a protocol to select appropriate mixing strategies for different signal sources: the level of the source signal can be matched with the environmental sound level, and environmental sounds can be suppressed for better speech intelligibility or alarm signal recognition. The mixing configuration is transmitted first, followed by the audio stream.
Whenever a phone connection is established, the sender application in the PCS connects to the PCS-PHS link and re-encodes the phone's receiver output for transmission to the PHS. To avoid drop-outs in the audio stream, the signal from the phone has to be buffered, introducing a delay between input and output. To reduce the delay caused by the WiFi connection, the packet size was reduced to a minimum. The total delay varies between 360 and 500 ms. The long delay is specific to the prototype implementation with a WiFi link; the final application will not include the WiFi link between the PCS phone service and the PHS, since both services will then be hosted on the same machine. Via a cable-bound network connection, delays on the order of 5 ms can be reached, for example, by using the "NetJack" system [18] or with "soundjack" [19]. An alternative approach is an analog connection. However, this would not allow for sending control parameters to the PHS.
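As a sketch of how such a link could frame its data, the snippet below packs a small mixing-configuration header that precedes the audio stream. The byte layout, field names, and mode values are hypothetical, not the actual HearCom protocol:

```python
import struct

# hypothetical mixing strategies from the text: match the source level to
# the environmental sound level, or suppress environmental sounds
MIX_MATCH_ENVIRONMENT = 0
MIX_SUPPRESS_ENVIRONMENT = 1

HEADER_FMT = "!BIH"  # mode (1 byte), sample rate (4 bytes), channels (2 bytes)

def encode_link_header(mixing_mode, sample_rate=16000, channels=1):
    """Pack the mixing configuration sent before the audio stream."""
    return struct.pack(HEADER_FMT, mixing_mode, sample_rate, channels)

def decode_link_header(buf):
    """Unpack the header from the start of a received byte buffer."""
    mode, rate, ch = struct.unpack(HEADER_FMT, buf[:struct.calcsize(HEADER_FMT)])
    return {"mixing_mode": mode, "sample_rate": rate, "channels": ch}
```

Sending the configuration ahead of the stream lets the PHS set up its mixer before the first audio packet arrives.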

Computational Complexity and Power Consumption.
The computational complexity of the PHS prototype system is estimated by measuring the CPU time needed to process one block of audio data, divided by the duration of one block. For real-time systems, this relative CPU time needs to be below one. For most operating systems, the maximum relative CPU time depends also on the maximum system latency and the absolute block length. A detailed discussion of relative CPU time and real-time performance can be found in [11]. The relative CPU time of the PHS running the four respective signal enhancement algorithms is shown in Table 1.
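The measure can be reproduced with a few lines of Python. The per-block gain used here as a stand-in workload is arbitrary; only the ratio of CPU time per block to block duration matters:

```python
import time

def relative_cpu_time(process_block, block, block_duration_s, n_blocks=1000):
    """Average CPU time needed per block, divided by the block duration.
    Values below one are required for real-time operation."""
    t0 = time.process_time()
    for _ in range(n_blocks):
        process_block(block)
    cpu_per_block = (time.process_time() - t0) / n_blocks
    return cpu_per_block / block_duration_s

# stand-in workload: apply a broadband gain to one 32-sample block
# (2 ms at 16 kHz, matching the prototype's block format)
block = [0.0] * 32
rel = relative_cpu_time(lambda b: [0.5 * x for x in b], block, 32 / 16000)
```

In practice the measurement would wrap the actual enhancement algorithm instead of the trivial gain.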
For a portable PHS, a long battery runtime is desirable. The battery runtime of the PHS prototype has been measured by continuously monitoring the battery voltage while running the PHS with the respective signal enhancement algorithms. The battery was fully charged before each measurement. The time until automatic power-down is given in Table 1. During the test, the netbook lid was closed and the display illumination was turned off. Each measurement was performed once. To check that battery aging did not significantly influence the results, the first measurement was repeated; no significant differences were observed. However, slight variations might have been caused by additional CPU load from background processes of the operating system and by differences in access to other hardware, for example, memory. The correlation between total CPU load and battery time is very high for the Asus Eee PC and low for the Acer Aspire One. The low correlation between CPU usage and battery runtime for the Acer Aspire One might indicate less efficient hardware.

Benefit for the End User.
The algorithm performance in terms of speech recognition thresholds (SRTs) and preference ratings has been assessed in the HearCom project in a large multicenter study [7]. As an example, speech recognition threshold improvement data from [7] is given in Table 2.

Figure 5: Preference histogram. COH is preferred over the reference condition "REF" by 80.6% of the hearing-impaired and by 61.1% of the normal-hearing subjects. The categories 1-5 are "very slightly better," "slightly better," "better," "much better," and "very much better."

The standard deviation of the results across the four different test sites is marginal, which demonstrates the reliability of the PHS prototype as a research and field-testing hearing system. While the speech intelligibility could not be improved by the algorithm COH, it was preferred by most subjects over "REF" processing (i.e., hearing loss correction only); see Figure 5. Hearing-impaired subjects show a clearer preference for COH than normal-hearing subjects do. The listening effort can be reduced by COH if the SNR is near 0 dB [20]. Even if the SRT cannot be improved by the algorithm, the reduction of listening effort is a major benefit for the user. Furthermore, a combination with the MWF algorithm is possible and indicated, since both methods exploit different signal properties (directional versus coherence properties). An improvement of the beamformer performance is likely if the coherence filter precedes the beamformer [21].

Requirements towards the PCL.
The requirements toward the wireless link between the headsets and the central processor vary with the algorithm. Estimated data rates for 4-byte sample formats without any further data compression are given in this section as a worst-case scenario. The link bandwidth required to transmit all six microphone channels is 768 bytes per block in the direction from the headsets to the central processor and 256 bytes per block in the other direction (two receiver signals). With two headsets and 500 blocks per second, this leads to a required (uncompressed) bandwidth of 3 MBit/s from the headsets to the central processor and 1 MBit/s back to the headsets. The requirements of the coherence filter "COH" toward the link bandwidth are presented for three scenarios. The trivial scenario is the condition where the algorithm runs on the central processor and the full audio signal of both sides is required, that is, 1 MBit/s in each direction. Lower bandwidth is required if only signal analysis is performed in the central processor: the phase information in nine frequency channels of each side and for each signal block is required, leading to a headset-to-processor bandwidth requirement of 281.25 kBit/s. For the other direction, nine gains are transmitted, with identical gains for both sides. This results in a bandwidth requirement of 140.625 kBit/s. The third scenario is a situation where signal analysis and filtering are processed in the audio headsets, and the link is only used for data exchange between the audio headsets. Then only the phase information is exchanged, that is, 140.625 kBit/s are required in each direction. These bandwidth requirements do not include data compression. With special signal coding strategies, the bandwidth requirements can be further reduced. The bandwidth of current hearing aid wireless systems is in the range of 0.1-100 kBit/s, with a range of approximately one meter. The power consumption is below 2 mW.
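These figures follow directly from the block format of Section 3 (32 samples per block at 16 kHz, that is, 500 blocks per second, with 4 bytes per value); note that the kBit values match the convention 1 kBit = 1024 bits:

```python
SAMPLES_PER_BLOCK = 32
BLOCKS_PER_S = 16000 // 32   # 500 blocks per second at 16 kHz
BYTES_PER_VALUE = 4          # 32-bit float samples, phases, and gains

def stream_bits_per_s(values_per_block):
    """Uncompressed rate of a stream of 4-byte values at 500 blocks/s."""
    return values_per_block * BYTES_PER_VALUE * 8 * BLOCKS_PER_S

def kbit(bits_per_s):
    return bits_per_s / 1024  # matches the paper's kBit figures

# full audio: six microphone channels up, two receiver signals down
mic_up     = stream_bits_per_s(6 * SAMPLES_PER_BLOCK)  # about 3 MBit/s
rx_down    = stream_bits_per_s(2 * SAMPLES_PER_BLOCK)  # about 1 MBit/s
# analysis-only COH: nine phases per side up, nine shared gains down
phase_up   = stream_bits_per_s(2 * 9)                  # 281.25 kBit/s
gains_down = stream_bits_per_s(9)                      # 140.625 kBit/s
```

Reproducing the numbers this way makes the trade-off explicit: restricting the link to analysis parameters cuts the required bandwidth by roughly an order of magnitude.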
BlueTooth technology provides data rates between 10 kBit/s and 1 MBit/s, at a power consumption of 25-150 mW, and a range between 3 and 10 m. Low-delay codecs can achieve transmission delays below 1 ms at a bandwidth of 32 kBit/s [22]. For signal routing from the headsets via the central processor back to the headsets, a maximum delay of approximately 10 ms is acceptable. For larger delays, a signal analysis on the central processor is possible. However, when the sequence of filter coefficients is delayed relative to the signal to which it is applied, signal distortion will arise.

Table 3: Evaluation results of the dedicated audio interface. As a reference device for the dynamic range measurements, an RME ADI8 Pro converter has been used. The minimal delay depends not only on the sampling rate (and thus on the length of the antialiasing filters) but also on the achievable minimal block lengths.

Informal listening tests revealed that a link delay of 25 ms is acceptable for the COH algorithm. Because this algorithm represents the class of speech envelope filters, this margin might apply to the more general case, too. The total delay of the prototype system with the "COH" algorithm is 11.4 ms using the dedicated USB audio interface, and 10.1 ms using an RME HDSP9632 audio interface with RME ADI8 Pro converters.

Technical Performance of the Dedicated Audio Interface.
The technical performance of the dedicated audio interface is given in Table 3. During the evaluation of the dedicated audio interface, the following factors affecting the audio quality of the device have been identified. (i) A notebook should be disconnected from the power supply and used in battery mode, to avoid a 50 Hz distortion caused by the mains power supply. (ii) Other USB devices should not be connected to the same USB controller/hub, since the data transmissions of these devices could interfere with the USB power supply (crosstalk effects between data and power wires) and thereby degrade signal quality. (iii) Front PC USB ports are often attached to the CPU's mainboard by long ribbon cables; running alongside a gigahertz processor, this configuration introduces considerable interference and noise.

Discussion
New technological developments make the development of a communication and hearing device with advanced and personalized signal processing of audio communication channels feasible. User inquiries indicate that such a development would be accepted by the end users. Such a device has the potential of being accepted as an assistive listening device and "beginner" hearing aid. However, its introduction depends on the availability of audio headsets with microphones and a low-power link, either bidirectional and low-delay or only unidirectional. If the link to the audio headsets is unidirectional or not low-delay, then environmental sounds cannot be processed on the central processor, and the benefit of the PCS would be reduced to personalized postprocessing of audio streams from telephony, multimedia applications, and public announcement systems. This processing is usually not as computationally demanding as algorithms for environmental audio processing, for example, auditory scene analysis. Whether the central processor can be used for computationally demanding algorithms depends on whether the data to be exchanged between the central processor and the headsets can be restricted to preprocessed signal parameters and time-dependent gain values. The prospect of transmitting the full audio signal at very low delays remains unclear to date. The implementation of a prototype system revealed barriers and solutions in the development of a PCS as a concentrator of communication channels. The advantage of using high-level programming languages in algorithm development is partly offset by the need for a floating point processor. Current Smartphones and PDAs offer only fixed point processing. The continuing convergence of miniature notebook PCs and mobile phones might lead to a new generation of mobile phones providing floating point processing, but this is unclear at present.
The solution for the prototype was to choose separate small notebook-PC-based hardware for the PHS with a network connection to the PCS. The processor performance of such a netbook is sufficient to host most recent advanced signal enhancement algorithms. The battery of a netbook computer provides a runtime which is sufficient for field testing. However, final realizations of the PHS for everyday use must provide a significantly longer runtime before recharging. The audio quality of the dedicated audio interface is sufficiently high for field testing.

Conclusions
An architecture of a personal communication system with a central processor and wireless audio headsets seems feasible with the expected WBAN developments. However, algorithms have to be tailored to match WBAN limitations, and the audio headsets need microphones and their own processing capabilities. The presented binaural noise reduction scheme "COH" is one example of an algorithm that might match these constraints.
Usage of scalable hardware and software is feasible, but direct usage of the software from the prototype system in products cannot be expected: because floating point processing capabilities are not available in mobile hardware, recoding floating point implementations to a fixed point representation is necessary. This is not expected to change in the near future.
The prototype system is helpful for algorithm evaluation and for testing possible PCS applications, but the gap towards real systems is still large.
Future work should investigate the concept further by implementing and field-testing further algorithms for hearing support and communication options using the prototype system.