Low-frequency ambient noise generator with application to automatic speaker classification

A novel low-frequency 1/f ambient noise generator using fractional statistics is proposed in this article. The noise samples are obtained by transformation functions performed on pseudo-random uniform sequences. The 1/f spectrum achieved for the generated noise samples shows that this proposition is very promising for investigating the effect of low-frequency noise on signal processing techniques, devices and systems. It is also demonstrated that the generated noise can serve as background ambient noise in speaker classification applications.


Introduction
In the last decades, the presence of low-frequency or 1/f noise has been widely observed in a variety of systems [1,2]. In particular, acoustic noise with a 1/f spectrum has been measured in ocean noise [3], music [4] and speech [5]. Noisy environments can severely degrade the performance of speech and speaker classification applications [6][7][8]. These background noise sources can have different temporal and spectral statistics. Therefore, 1/f acoustic noise should be considered in order to achieve robust signal processing techniques.
Noises are random processes described by the shape of their power spectral density (PSD). The PSD of noises [1,9] is defined by $S(f) \approx 1/f^{\beta}$ with 0 ≤ β ≤ 2. Generally, this family of PSD shapes is obtained by filtering Gaussian white noise (fgwn) sequences using digital finite impulse response (DFIR) filters and signal processing techniques [10][11][12]. However, wide-sense stationarity can only be measured for very long sample sequences. Mandelbrot and Van Ness [9] showed that the 1/f noise statistics can be accurately represented by the fractional Brownian motion (fBm). The fBm is defined as a non-stationary stochastic process. Nevertheless, the shape of the PSD and the β exponent can be quasi-stationary if the observation time is short compared to the process lifetime [1,13]. This enables the application of estimation theory to 1/f processes [14,15].
*Correspondence: coelho@ime.eb.br. Electrical Engineering Department, Acoustic Signal Processing Laboratory of the Military Institute of Engineering (IME), Rio de Janeiro, RJ 22290-270, Brazil
1/f fractional noise has $S(f) \propto f^{1-2H}$, where 1/2 < H < 1 is the Hurst parameter [16]. The H parameter characterizes the slow decay rate of the autocorrelation function (ACF) of the noise samples. It represents the degree of low-frequency or scaling invariance of the fractional noises and is frequently close to 1.
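Matching this exponent with the PSD form $1/f^{\beta}$ given earlier yields a direct link between β and H. The short derivation below only restates the two expressions already given and is not an additional result; the numerical value of H in the comment is an arbitrary illustration.

```latex
\[
  \frac{1}{f^{\beta}} \;\propto\; f^{\,1-2H}
  \quad\Longrightarrow\quad
  \beta = 2H - 1 ,
  \qquad
  \tfrac{1}{2} < H < 1 \;\Longleftrightarrow\; 0 < \beta < 1 .
\]
% Illustration: H = 0.9 gives beta = 0.8.
```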
This article proposes the generation of 1/f ambient noise samples based on the fBm statistics. In the present approach, the 1/f spectral behavior is obtained from the ACFs of the noise samples generated by the fBm process. The 1/f ambient noise sample generation is based on transformation functions performed on uniform random sequences. These functions are defined by the successive random addition algorithm using the midpoint displacement (SRMD) technique [17]. In a previous study, these transformation functions were successfully evaluated for the generation of low-frequency optical noise samples [18].
The SRMD-based solution for generating the 1/f acoustic noise samples is also implemented on a high-speed field-programmable gate array (FPGA) development kit. Each noise output value is then pulse-code modulation (PCM) encoded/quantized and sampled at 8 kHz to produce the ambient noise levels.
For the experiments, the real or natural 1/f Airport [19] and Airplane [20] ambient noises and an artificial Pink [20] noise are considered. The validation results include the estimation of the main parameters or statistics (β exponent, H, mean ($\mu$), variance ($\sigma^2$) and kurtosis (K)), the PSD and heavy-tail distribution (HTD) curves, and the Bhattacharyya distance ($B_d$). These results are obtained from the real and the generated noise samples. For the experiments, 1/f sample sequences are also generated by filtering a Gaussian white noise using the Al-Alaoui transfer function [21]. Furthermore, the performance of the proposed 1/f acoustic ambient noise generation is evaluated in a speaker identification task considering different signal-to-noise ratio (SNR) values.
http://asp.eurasipjournals.com/content/2012/1/175
The rest of the article is organized as follows. Section "1/f fractional Brownian noise: an overview" gives an overview of the 1/f fractional Brownian noise and describes the SRMD technique. Section "Implementation setup" introduces the implementation setup of the proposed 1/f ambient noise generator. The main validation results are reported and discussed in Section "Validation results and discussion". The speaker classification task and the related results are shown in Section "Speaker classification experiments". Finally, Section "Conclusion" presents the main conclusions of this work.

1/f fractional Brownian noise: an overview
For any instant t > 0, $X_H(t)$ is a fractional random function with Gaussian independent increments [9]. The fBm is known as the unique Gaussian H-self-similar random process with stationary increments (sssi), where H is the self-similarity parameter. The variance of the independent increments is proportional to the time interval according to

$$\mathrm{Var}\left[X_H(t_2) - X_H(t_1)\right] \propto |t_2 - t_1|^{2H} \qquad (1)$$

for all instants $t_1$ and $t_2$, and $X_H(t)$ presents continuous sample paths.
In other words, its statistical characteristics hold for any time scale. Thus, for any τ and r > 0,

$$X_H(t+\tau) - X_H(t) \overset{d}{\approx} r^{-H}\left[X_H(t + r\tau) - X_H(t)\right] \qquad (2)$$

where $\overset{d}{\approx}$ means similar in distribution and r is the random process scaling factor. Note that $X_H(t)$ is a Gaussian process completely specified by its mean, variance and H parameter. The ACF of the 1/f $X_H$, i.e., with 1/2 < H < 1, is

$$\rho_X(k) = \tfrac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + |k-1|^{2H}\right] \qquad (3)$$

for k ≥ 0 and $\rho_X(k) = \rho_X(-k)$ for k < 0. In the present proposition, the spectral density is derived from the ACF of the 1/f fBm noise samples defined in (3). This is ensured by the PSD and ACF exponents, which are both related to the H parameter.
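To make the slow decay of the ACF concrete, the following minimal Python sketch evaluates the standard fractional Gaussian noise autocorrelation, $\rho(k) = \tfrac{1}{2}[(k+1)^{2H} - 2k^{2H} + |k-1|^{2H}]$, for a few values of H. The function name `fgn_acf` and the chosen lags are illustrative assumptions and are not taken from the original implementation.

```python
def fgn_acf(k: int, hurst: float) -> float:
    """Autocorrelation of unit-variance fractional Gaussian noise at lag k.

    Assumed (standard) form: 0.5 * ((k+1)^2H - 2 k^2H + |k-1|^2H),
    extended to negative lags by symmetry, rho(k) = rho(-k).
    """
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


if __name__ == "__main__":
    # For H = 0.5 the samples are uncorrelated; as H approaches 1 the
    # correlation decays more and more slowly with the lag.
    for h in (0.5, 0.7, 0.9):
        print(h, [round(fgn_acf(k, h), 4) for k in (0, 1, 10, 100)])
```

For 1/2 < H < 1 the values remain positive at every lag, which is the long-range dependence behind the 1/f spectral shape.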

SRMD
Considering a time index t defined on the interval [0, 1], the SRMD algorithm sets X(0) = 0 and X(1) as a Gaussian random variable (RV) with zero mean and variance $\sigma^2$, so that the increments must satisfy $\mathrm{Var}[X(t_2) - X(t_1)] = |t_2 - t_1|^{2H}\sigma^2$ for $0 \le t_1 \le t_2 \le 1$. To achieve this property, a random offset displacement ($D_i$) with zero mean and a variance that depends on the iteration level is added to the interpolated values. For example, the X(1/2) value is obtained by interpolating X(0) and X(1) and adding an offset with variance $\delta^2/2^{2H+1}$. Several iterations are then performed to compose a 1/f fBm noise sample sequence. In order to obtain stationary increments, after the midpoint interpolation, a $D_i$ with variance $\propto (r^n)^{2H}$ (r is the scaling factor) is applied to all points (time increments) and not just the midpoints. The maximum number of iterations is defined by $N = 2^{maxlevel}$, where maxlevel is generally chosen in the interval [0, 16] [9]. The other SRMD inputs are the standard deviation and the H parameter.

Implementation setup
Besides the real ambient noises and the artificial Pink noise, samples obtained by filtering a Gaussian white noise are considered for the validation of the proposed method. The applied method uses the Al-Alaoui digital integrator transfer function [21] with β/2 as the fractional order exponent to compose the transfer function H(z),

$$H(z) = \left[\frac{7T}{8}\,\frac{1 + z^{-1}/7}{1 - z^{-1}}\right]^{\beta/2} \qquad (6)$$

where T is the sampling period. The filter coefficients are obtained by the convolution h(k) = a(k) * b(k), where a(k) and b(k) are the first N/2 terms obtained by expanding, respectively, the numerator and denominator of (6) in power series [12].
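As a software illustration of the SRMD procedure described in the SRMD section, the sketch below follows the commonly cited successive-random-additions recursion with scaling factor r = 1/2. Several points are assumptions rather than details confirmed by the text: the per-level offset standard deviation $\sigma\,(1/2)^{nH}\sqrt{(1 - 2^{2H-2})/2}$, the use of Python's Gaussian generator in place of the transformed uniform sequences, and the names `srmd_fbm` and `increments`. It is not the FPGA implementation.

```python
import math
import random
from typing import List


def srmd_fbm(maxlevel: int, sigma: float, hurst: float,
             rng: random.Random) -> List[float]:
    """Sketch of midpoint displacement with successive random additions.

    Returns 2**maxlevel + 1 samples of an approximate fBm path on [0, 1].
    """
    n = 2 ** maxlevel
    x = [0.0] * (n + 1)
    x[n] = sigma * rng.gauss(0.0, 1.0)  # X(0) = 0, X(1) ~ N(0, sigma^2)
    step = n
    for level in range(1, maxlevel + 1):
        half = step // 2
        # 1) Interpolate the new midpoints from their two neighbours.
        for i in range(half, n, step):
            x[i] = 0.5 * (x[i - half] + x[i + half])
        # 2) Offset standard deviation for this level (assumed form, r = 1/2).
        delta = (sigma * 0.5 ** (level * hurst)
                 * math.sqrt(0.5 * (1.0 - 2.0 ** (2.0 * hurst - 2.0))))
        # 3) Add a random offset to ALL points of the current grid,
        #    not only to the freshly interpolated midpoints.
        for i in range(0, n + 1, half):
            x[i] += delta * rng.gauss(0.0, 1.0)
        step = half
    return x


def increments(path: List[float]) -> List[float]:
    """First differences of the path, used here as the noise samples."""
    return [b - a for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    path = srmd_fbm(maxlevel=10, sigma=1.0, hurst=0.9, rng=random.Random(0))
    noise = increments(path)
    print(len(path), len(noise))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible and makes it easy to swap in another pseudo-random source.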

Validation results and discussion
β and H estimation results
The β exponent is estimated by linear regression applied to the PSD curves. Table 1 shows the β exponent, the mean square error (MSE) of the β estimation, and the H results obtained from the real and artificial noises, and from the noise samples generated by the proposed and the fgwn methods. The results are presented for 320,000 samples, since this is the size of the real ambient noise sequences. The wavelet-based method [23] is used for the H estimation.
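The regression step can be sketched as an ordinary least-squares fit of log S(f) against log f, whose negated slope is taken as β. This is a minimal illustration, assuming the PSD is already available as paired frequency and power values and that the reported MSE is the mean squared residual of the fit in the log domain; the function name `estimate_beta` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def estimate_beta(freqs: Sequence[float],
                  psd: Sequence[float]) -> Tuple[float, float]:
    """Fit log10 S(f) = c - beta * log10 f and return (beta, mse).

    The MSE is the mean squared residual of the straight-line fit
    in the log-log domain (an assumed definition).
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in psd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    mse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(xs, ys)) / n
    return -slope, mse


if __name__ == "__main__":
    # Synthetic, noise-free 1/f^0.9 spectrum used only as a sanity check.
    f = [float(k) for k in range(1, 101)]
    s = [5.0 / fk ** 0.9 for fk in f]
    print(estimate_beta(f, s))
```

Frequencies must be strictly positive, so the DC bin of a measured spectrum has to be excluded before the fit.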

Kurtosis, mean, variance statistics
Kurtosis measures how far the tails of a sample distribution depart from those of a Gaussian distribution, for which K = 3. The K, mean and variance estimation results of the noise sequences are presented in Table 2. As expected, the K values are close to 3, which indicates that the noise samples are Gaussian distributed.
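For reference, the kurtosis of a sample sequence can be computed as the fourth central moment divided by the squared variance. The sketch below assumes this (non-excess) definition, under which a Gaussian sequence gives a value near 3; the function name and the test sequences are illustrative only.

```python
import random
from typing import Sequence


def kurtosis(samples: Sequence[float]) -> float:
    """Fourth central moment over squared variance (Gaussian -> about 3)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)


if __name__ == "__main__":
    rng = random.Random(0)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
    uniform = [rng.random() for _ in range(50000)]
    # A Gaussian sequence should give a value near 3, a uniform one near 1.8.
    print(kurtosis(gaussian), kurtosis(uniform))
```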

Bhattacharyya distance
The Bhattacharyya distance measures the separability between two sample sequences with Gaussian distribution and is defined by

$$B_d = \frac{1}{8}(\mu_1 - \mu_2)^T\left[\frac{C_1 + C_2}{2}\right]^{-1}(\mu_1 - \mu_2) + \frac{1}{2}\ln\frac{\left|\frac{C_1 + C_2}{2}\right|}{\sqrt{|C_1|\,|C_2|}}$$

where $\mu_i$ is the mean vector and $C_i$ is the covariance matrix of class i = 1, 2. The $B_d$ values are measured between the generated sequences and the corresponding Airplane, Airport and Pink noises. It can be seen from Table 3 that the distribution of the Pink noise samples produced by both methods is very similar to the distribution of the artificial Pink noise. However, the distributions of the samples obtained from the proposed method are much more similar to the distributions of the real ambient noises.
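For scalar noise sequences, the Gaussian Bhattacharyya distance reduces to $\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2} + \frac{1}{2}\ln\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1\sigma_2}$. The sketch below implements this one-dimensional special case with sample estimates of the mean and variance; restricting the computation to scalar samples is an assumption made for brevity, and the function name is hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def bhattacharyya_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Bhattacharyya distance between two scalar sequences, each modelled
    as a univariate Gaussian with its own sample mean and variance."""
    m1, m2 = fmean(a), fmean(b)
    v1, v2 = pvariance(a, m1), pvariance(b, m2)
    v = 0.5 * (v1 + v2)  # average of the two variances
    mean_term = 0.125 * (m1 - m2) ** 2 / v
    var_term = 0.5 * math.log(v / math.sqrt(v1 * v2))
    return mean_term + var_term


if __name__ == "__main__":
    # Equal variances (1.0) and means 1.0 and 3.0: only the mean term remains.
    print(bhattacharyya_1d([0.0, 2.0], [2.0, 4.0]))
```

A value near zero indicates that the two sequences are hard to separate under the Gaussian model, which is the sense in which small distances are read as similar distributions.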

PSD results
The power spectral densities obtained from the real and the generated 1/f acoustic noise samples are presented in Figure 2. The PSDs were measured using a high-performance 300 MHz bandwidth spectrum analyzer. These results demonstrate the slow-decaying (3 dB/octave) behavior of the PSD shape of the 1/f noises. It can also be seen that the proposed method better represents the PSD behavior of the real acoustic noises.
In the HTD curves, the real and generated noise samples exhibit very close tails. This also confirms the H results (see Table 1) obtained from the proposed solution.

Speaker classification experiments
In a speaker identification process, a speech utterance has to be assigned to one of the registered speakers. For the experiments, the speech utterances were corrupted with the real and generated noise samples. The speaker identification uses mel-frequency cepstral coefficients (MFCC) and the Gaussian mixture model (GMM) [24], which are, respectively, the most commonly used speech features and classifier in speaker recognition tasks. A mixture of Gaussian probability densities is a weighted sum of M densities, and is given by

$$p(x \mid \lambda) = \sum_{i=1}^{M} p_i\, b_i(x), \qquad b_i(x) = \frac{1}{(2\pi)^{D/2}|K_i|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_i)^T K_i^{-1}(x - \mu_i)\right\}$$

where $p_i$ are the mixture weights and each $b_i(x)$ is a D-variate Gaussian density with mean vector $\mu_i$ and covariance matrix $K_i$, T denotes the transpose operation and |.| is the determinant. The GMM (λ) is parametrized by the mean vectors, covariance matrices, and mixture weights. The model parameters are estimated from a set of training data as the ones that maximize the likelihood of the GMM. The expectation-maximization (EM) algorithm [24] is used to estimate the model parameters. Considering a sequence of T independent training vectors $X = \{x_1, \ldots, x_T\}$, the normalized log-likelihood of the GMM is

$$L(X \mid \lambda) = \frac{1}{T}\sum_{t=1}^{T} \log p(x_t \mid \lambda)$$

The decision rule of the speaker identification system chooses the speaker model for which this value is maximum.
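The scoring and decision rule can be sketched as follows. To keep the example free of external dependencies, it assumes diagonal covariance matrices and already-trained models (EM training is not shown); the data layout, a list of (weight, mean, variances) tuples per speaker, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, mean vector, diagonal variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance (assumption)."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(v) for v in var)
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + quad)


def gmm_log_likelihood(frames: Sequence[Sequence[float]],
                       model: List[Component]) -> float:
    """Normalized log-likelihood: average of log p(x_t | model) over frames."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in model]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify(frames: Sequence[Sequence[float]],
             models: Dict[str, List[Component]]) -> str:
    """Return the speaker whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


if __name__ == "__main__":
    toy_models = {
        "speaker_a": [(1.0, [0.0], [1.0])],
        "speaker_b": [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])],
    }
    print(identify([[0.1], [-0.2], [0.3]], toy_models))
```

In practice the frames would be MFCC vectors and each model would hold M components, but the decision rule is the same maximum over per-speaker scores.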

Speaker identification accuracy results
The speaker identification task is evaluated on the KING speech corpus, which is composed of conversational speech sessions recorded from 49 male speakers.
For the experiments, five sessions are used, resulting in an average of 100 s of speech per speaker after silence removal. Three of these sessions (60 s) are used for the speaker model training. The remaining two sessions (40 s) are used to evaluate the identification accuracies.
The speaker classification results are presented for test durations of 5 and 1 s. The real and generated 1/f noise samples are added to the speech utterances to serve as background ambient noise. SNR values of 0, 5, 10, 15 and 20 dB are considered to evaluate the system under different noisy conditions. For the identification task, the speech feature vectors consist of 25 MFCCs extracted from 20 ms speech frames, and M = 32 GMM components are used. The speaker identification accuracies are shown in Figure 4. The results show that the generated 1/f noise produces an effect similar to that of the real ambient noise. This means that it could be applied as artificial background noise.
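Corrupting an utterance at a prescribed SNR amounts to scaling the noise so that the ratio of average speech power to average noise power matches the target. The sketch below illustrates one common way of doing this; measuring power over the whole utterance, rather than over active speech only, and the function names are assumptions, since the text does not specify these details.

```python
import math
import random
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average power (mean of squared samples)."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add noise to speech after scaling it to the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Choose gain so that P_speech / (gain^2 * P_noise) = 10^(snr_db / 10).
    gain = math.sqrt(mean_power(speech)
                     / (mean_power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in signals: a tone for the speech and white noise for the
    # background (the generated 1/f samples would be used in practice).
    speech = [math.sin(0.1 * n) for n in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    for snr in (0, 5, 10, 15, 20):
        mixed = mix_at_snr(speech, noise, snr)
        print(snr, round(mean_power(mixed), 4))
```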

Conclusion
A new low-frequency 1/f ambient noise generator using fractional statistics is described in this article. The PSD shape of the generated 1/f noise samples is obtained from the ACFs of the noise samples generated with the fBm process. The implementation of the 1/f ambient noise generator enables the validation of the pattern and the PSD representation. It is shown that this proposition is very promising for investigating the effect of this noise on signal processing techniques. Furthermore, the speaker identification experiments demonstrate that the generated ambient noise samples can serve as background or additive noise.