Low-frequency ambient noise generator with application to automatic speaker classification
- Ricardo Santana^{1} and
- Rosângela Coelho^{1}
https://doi.org/10.1186/1687-6180-2012-175
© Santana and Coelho; licensee Springer. 2012
Received: 20 May 2011
Accepted: 22 June 2012
Published: 17 August 2012
Abstract
A novel low-frequency 1/f ambient noise generator using fractional statistics is proposed in this article. The noise samples are obtained by transformation functions performed on pseudo-random uniform sequences. The 1/f spectrum representation achieved for the generated noise samples shows that this proposition is very promising for the investigation of low-frequency noise effects in signal processing techniques, devices and systems. It is also demonstrated that the generator can serve as a source of background ambient noise in speaker classification applications.
Introduction
In recent decades, the presence of low-frequency or 1/f noise has been widely observed in a wide variety of systems [1, 2]. In particular, 1/f-spectra acoustic noise has been measured in ocean noise [3], music [4] and speech [5]. Noisy environments can severely degrade the performance of speech and speaker classification applications [6–8]. These background noise sources can have different temporal and spectral statistics. Therefore, 1/f acoustic noise must be considered to achieve robust signal processing techniques.
Noises are random processes described by the shape of their power spectral density (PSD). The PSD of noises [1, 9] is defined by $S\left(f\right)\approx \frac{1}{{f}^{\beta}}$ with 0 ≤ β ≤ 2. Generally, this PSD shape family is achieved by filtering Gaussian white noise (fgwn) sequences using digital finite impulse response (DFIR) filters and signal processing techniques [10–12]. However, wide-sense stationarity can only be measured for very long sample sequences. Mandelbrot and Van Ness [9] showed that 1/f noise statistics can be accurately represented by fractional Brownian motion (fBm). fBm is defined as a non-stationary stochastic process. Nevertheless, the shape of the PSD and the β exponent can be quasi-stationary if the observation time is short compared to the process lifetime [1, 13]. This enables the application of estimation theory to 1/f processes [14, 15].
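As an illustration of the $S\left(f\right)\approx \frac{1}{{f}^{\beta}}$ family, the sketch below (our own example, not the article's code; the function names are ours) synthesizes a 1/f^β sequence by spectral shaping of Gaussian white noise and recovers β as minus the slope of the log-log periodogram:

```python
import numpy as np

def one_over_f_noise(n, beta, seed=None):
    """Synthesize n samples with S(f) ~ 1/f**beta by spectral shaping:
    white Gaussian noise is scaled by f**(-beta/2) in the frequency
    domain and transformed back to the time domain."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)
    scale = np.where(freqs > 0, freqs, 1.0) ** (-beta / 2.0)
    spectrum = np.fft.rfft(rng.standard_normal(n)) * scale
    spectrum[0] = 0.0                      # zero-mean output
    x = np.fft.irfft(spectrum, n)
    return x / x.std()                     # unit variance

def estimate_beta(x):
    """Estimate beta as minus the slope of the log-log periodogram."""
    freqs = np.fft.rfftfreq(len(x))
    psd = np.abs(np.fft.rfft(x)) ** 2
    mask = freqs > 0
    slope = np.polyfit(np.log(freqs[mask]), np.log(psd[mask]), 1)[0]
    return -slope

x = one_over_f_noise(2 ** 16, beta=1.0, seed=0)
print(round(estimate_beta(x), 1))          # close to the target beta = 1.0
```

The same log-log slope regression is the usual way to verify the β exponents reported for generated noise samples.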
A 1/f fractional noise has $S\left(f\right)\propto {f}^{1-2H}$, where 1/2 < H < 1 is the Hurst parameter [16]. The H parameter is described by the slow-decaying rate of the auto-correlation function (ACF) of the noise samples. It represents the low-frequency or scaling-invariance degree of fractional noises and is frequently close to 1.
This article proposes the generation of 1/f ambient noise samples based on the fBm statistics. In the present approach, the 1/f spectral behavior is obtained from the ACFs of the noise samples generated by the fBm process. The 1/f ambient noise sample generation is based on transformation functions performed on uniform random sequences. These functions are defined by the successive random addition algorithm using the midpoint displacement (SRMD) technique [17]. In a previous study, these transformation functions were successfully evaluated for the generation of low-frequency optical noise samples [18].
The solution presented for the SRMD algorithm to generate the 1/f acoustic noise samples is also implemented in a high-speed field-programmable gate array (FPGA) development kit. Each noise output value is then pulse code modulation (PCM) encoded/quantized and sampled at 8 kHz to produce the ambient noise levels. For the experiments, the real or natural 1/f Airport [19] and Airplane [20] ambient noises are considered, as well as an artificial Pink [20] noise. The validation results include the estimation of the main parameters or statistics (β exponent, H, mean (μ), variance (σ^{2}) and kurtosis (K)), the PSD and heavy-tail distribution (HTD) curves, and the Bhattacharyya distance (B_{ d }). These results are obtained from the real and the generated noise samples. For the experiments, 1/f sample sequences are also generated by filtering Gaussian white noise using the Al-Alaoui transfer function [21]. Furthermore, the performance of the proposed 1/f acoustic ambient noise generation is evaluated for a speaker identification task considering different signal-to-noise ratio (SNR) values.
The rest of the article is organized as follows. Section “1/f fractional Brownian noise: an overview” gives an overview of the 1/f fractional Brownian noise and describes the SRMD technique. Section “Implementation setup” introduces the implementation setup of the proposed 1/f ambient noise generator. The main validation results are reported and discussed in Section “Validation results and discussion”. The speaker classification task and the related results are shown in Section “Speaker classification experiments”. Finally, Section “Conclusion” presents the main conclusions of this work.
1/f fractional Brownian noise: an overview
The fBm, X_{ H }(t), is a Gaussian stochastic process that satisfies the following properties [9]:
- 1.
X _{ H }(t) has stationary increments.
- 2.
X _{ H }(0)=0 and E[X _{ H }(t)]=0 for any instant t.
- 3.
X _{ H }(t) presents continuous sample paths.
The normalized ACF of the fBm increment process (fractional Gaussian noise) is given by ${\rho}_{X}\left(k\right)=\frac{1}{2}\left({\left|k+1\right|}^{2H}-2{\left|k\right|}^{2H}+{\left|k-1\right|}^{2H}\right)$ for k ≥ 0, and ρ_{ X }(k) = ρ_{ X }(− k) for k < 0. In the present proposition, the spectral density is derived from this ACF of the 1/f fBm noise samples. This is ensured by the PSD and ACF exponents, which are both related to the H parameter.
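The slow hyperbolic decay of the fGn ACF for H close to 1 can be checked numerically. The snippet below (an illustration, not part of the article) compares the exact ACF with its well-known asymptote ρ_{ X }(k) ≈ H(2H − 1)k^{2H−2}:

```python
import numpy as np

def fgn_acf(k, H):
    """Normalized ACF of the fBm increment process (fractional Gaussian
    noise): rho(k) = 0.5*(|k+1|^2H - 2|k|^2H + |k-1|^2H)."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H)
                  + np.abs(k - 1) ** (2 * H))

# Hyperbolic (slow) decay for H close to 1: rho(k) ~ H*(2H-1)*k**(2H-2)
H = 0.9
for k in (1, 10, 100):
    exact = float(fgn_acf(k, H))
    asymptotic = H * (2 * H - 1) * k ** (2 * H - 2)
    print(k, round(exact, 4), round(asymptotic, 4))
```

The positive, slowly vanishing correlations at large lags are exactly the low-frequency (long-memory) behavior exploited by the proposed generator.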
SRMD
The SRMD technique builds sample paths that reproduce the fBm variance of increments, $E\left[{\left(X\left({t}_{2}\right)-X\left({t}_{1}\right)\right)}^{2}\right]={\sigma}^{2}{\left|{t}_{2}-{t}_{1}\right|}^{2H}$, for 0 ≤ t_{1} ≤ t_{2} ≤ 1. To achieve this property, a random offset displacement (D_{ i }) with zero mean and variance ${\delta}_{i}^{2}={\left(\frac{1}{2}\right)}^{2\mathit{iH}+1}\left(1-{2}^{2H-2}\right){\sigma}^{2}$ must be added to the noise samples. For example, the X(1/2) value is obtained by the interpolation of X(0) and X(1), with variance δ^{2}/2^{2H + 1}. Several iterations are then performed to compose a 1/f fBm noise sample sequence. In order to obtain stationary increments, after the midpoint interpolation, a D_{ i } of a certain variance, $\propto {\left({r}^{n}\right)}^{2H}$ (r is the scaling factor), is applied to all points (time increments) and not just the midpoints. The maximum number of iterations is defined by N=2^{maxlevel}, where maxlevel is generally chosen in the interval [0,16] [9]. The other SRMD inputs are the standard deviation and the H parameter.
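A minimal software sketch of the SRMD recursion described above (our own illustrative implementation; the per-level displacement scaling follows the data_ROM definition used later in the article):

```python
import numpy as np

def srmd(maxlevel, H, sigma=1.0, seed=None):
    """Successive random addition with midpoint displacement (SRMD).

    Returns an fBm sample path with 2**maxlevel + 1 points on [0, 1].
    At each level the midpoints are linearly interpolated, then a
    zero-mean Gaussian offset is added to ALL current points (not only
    the midpoints), its standard deviation shrinking by (1/2)**H per
    level, which yields (approximately) stationary increments.
    """
    rng = np.random.default_rng(seed)
    N = 2 ** maxlevel
    X = np.zeros(N + 1)
    X[N] = sigma * rng.standard_normal()       # X(0) = 0, X(1) ~ N(0, sigma^2)
    delta = sigma * np.sqrt(0.5) * np.sqrt(1.0 - 2.0 ** (2 * H - 2))
    step = N
    for _ in range(maxlevel):
        delta *= 0.5 ** H                      # shrink displacement per level
        half = step // 2
        # midpoint interpolation between left and right neighbors
        X[half::step] = 0.5 * (X[0:N:step] + X[step::step])
        # random addition applied to every point at the current resolution
        X[::half] += delta * rng.standard_normal(N // half + 1)
        step = half
    return X

X = srmd(maxlevel=12, H=0.8, seed=1)
print(len(X))   # 2**12 + 1 = 4097 noise levels
```

The variance of the increments X(t + m) − X(t) should grow roughly as m^{2H}, which gives a quick self-check of the generated path.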
Implementation setup
The X[i] vector has i=2^{ maxlevel } increments or noise levels. The uniform random number block was coded to produce 32-bit uniformly distributed samples with periodicity 10^{10}. The linear feedback shift registers (LFSRs) are started by seed values to produce the pseudo-random sequences. The data_ROM block performs the computation of the SRMD algorithm. It is indexed by i and H and is defined by $\text{data}\text{\_}\mathrm{ROM}[i,H]:=\frac{\text{delta}\left[i\right]}{\text{sigma}}={\left(\frac{1}{2}\right)}^{\mathrm{iH}}\sqrt{\frac{1}{2}}\sqrt{1-{2}^{2H-2}}$. The delta[i] values are stored and addressed by the i and H indexes. Storing a wide range of delta[i] values would be prohibitive due to the large amount of memory resources required. However, H values can be represented with only three digits after the decimal point. Thus, 1,000 H values are sufficient for each iteration of the SRMD algorithm. Since 0 ≤ maxlevel ≤ 16, the delta[i] vector can have a maximum of 16,000 elements. Hence, only 1.6% of the ROM memory resource was needed to store the delta[i] vector. In fact, 1/f noises have 1/2 < H < 1 (close to 1), and hence the memory needs can be further reduced. A second memory block is used to store the output sample vector (X[i]). The binary representation of each X[i] noise output was truncated to 16 bits. The main functions of the control block are:
- Read the GRNG block output (Gaussian samples);
- Read the data_ROM values according to the selected values indexed by i and H;
- Evaluate delta by multiplying the data_ROM value by the standard deviation (sigma);
- Fill the initial values of the X[i] vector with the computed sigma∗Gauss values;
- Perform the loop iterations (one while and two for loops);
- Read the fBm output noise sample levels from the X[i] vector.

The 1/f noise SRMD block implementation required ten digital signal processor (DSP) blocks and six phase-locked loops (PLLs) used for clock generation to achieve the target noise sample output rate.
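The data_ROM lookup table can be reproduced in a few lines. The sketch below is illustrative (the quantization step `h_step` and the function name are our assumptions); it precomputes delta[i]/sigma over a quantized H grid, one row per SRMD iteration:

```python
import numpy as np

def build_delta_rom(maxlevel=16, h_step=0.001):
    """Precompute data_ROM[i, H] = delta[i]/sigma
    = (1/2)**(i*H) * sqrt(1/2) * sqrt(1 - 2**(2*H - 2))
    for i = 1..maxlevel over an H grid quantized with step h_step."""
    H = np.arange(h_step, 1.0, h_step)         # grid over 0 < H < 1
    i = np.arange(1, maxlevel + 1)[:, None]    # iteration index as a column
    return 0.5 ** (i * H) * np.sqrt(0.5) * np.sqrt(1.0 - 2.0 ** (2.0 * H - 2.0))

rom = build_delta_rom()
print(rom.shape)   # (iterations, H grid points)
```

Restricting the grid to 1/2 < H < 1, as the article notes, would shrink this table roughly by half.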
Next, each noise output value is PCM encoded/quantized and sampled at 8 kHz to produce the ambient noise levels. This sampling rate is required for the speaker classification experiments.
The fgwn reference sequences are obtained by filtering Gaussian white noise with a fractional-order version of the Al-Alaoui operator [21], which can be written as $H\left(z\right)={\left(\frac{7T}{8}\cdot \frac{1+{z}^{-1}/7}{1-{z}^{-1}}\right)}^{\beta /2}$, where T is the sampling period. The filter coefficients are obtained by the convolution h(k)=a(k)∗b(k), where a(k) and b(k) are the first N/2 terms of the power-series expansions of, respectively, the numerator ${\left(1+{z}^{-1}/7\right)}^{\beta /2}$ and the inverted denominator ${\left(1-{z}^{-1}\right)}^{-\beta /2}$ [12].
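The power-series construction h(k) = a(k)∗b(k) can be sketched as follows. This is our own illustration based on [12, 21]; the overall gain and tap count are assumptions, not values from the article:

```python
import numpy as np

def binomial_series(alpha, c, n):
    """First n coefficients of (1 + c*z**-1)**alpha expanded in
    powers of z**-1 (generalized binomial series)."""
    out = np.empty(n)
    out[0] = 1.0
    for k in range(1, n):
        out[k] = out[k - 1] * (alpha - k + 1) / k * c
    return out

def alalaoui_fractional_fir(beta, n_taps, T=1.0 / 8000):
    """FIR approximation of the fractional-order Al-Alaoui integrator
    [(7T/8) * (1 + z**-1/7) / (1 - z**-1)]**(beta/2): convolve the first
    n_taps/2 series terms of the numerator with those of the inverted
    denominator."""
    half = n_taps // 2
    a = binomial_series(beta / 2.0, 1.0 / 7.0, half)   # (1 + z^-1/7)^(beta/2)
    b = binomial_series(-beta / 2.0, -1.0, half)       # (1 - z^-1)^(-beta/2)
    return (7.0 * T / 8.0) ** (beta / 2.0) * np.convolve(a, b)

h = alalaoui_fractional_fir(beta=1.0, n_taps=512)
rng = np.random.default_rng(0)
fgwn_noise = np.convolve(rng.standard_normal(2 ** 14), h, mode="same")
```

Filtering white Gaussian noise with `h` yields an fgwn-style 1/f^β sequence of the kind used as the comparison baseline in the experiments.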
Validation results and discussion
β and H estimation results
β and H estimation results and the β estimation error
Noise | β | ε(β) | H
---|---|---|---
Airplane (real) | 1.13 | 0.124371 | 0.889 |
Airplane (proposed) | 1.14 | 0.076390 | 0.890 |
Airplane (fgwn) | 1.10 | 0.061164 | 0.862 |
Airport (real) | 0.89 | 0.411874 | 0.882 |
Airport (proposed) | 0.90 | 0.079132 | 0.891 |
Airport (fgwn) | 0.81 | 0.076929 | 0.813 |
Pink (artificial) | 1.02 | 0.041085 | 0.919 |
Pink (proposed) | 1.01 | 0.065109 | 0.915 |
Pink (fgwn) | 0.89 | 0.088994 | 0.847 |
It can be noted that, for the artificial Pink noise samples, the proposed and fgwn methods achieve quite similar H target values, i.e., similar low-frequency statistics. However, the H values estimated from the noise samples generated by the proposed method are much closer to those of the real ambient noises.
Kurtosis, mean, variance statistics
Kurtosis, mean and variance statistics results
Noise | K | μ | σ^{2}
---|---|---|---
Airplane (real) | 2.94 | 0.001694 | 0.004665 |
Airplane (proposed) | 3.04 | 0.001679 | 0.004665 |
Airplane (fgwn) | 2.98 | 0.001929 | 0.004664 |
Airport (real) | 3.11 | 0.000021 | 0.003434 |
Airport (proposed) | 3.04 | 0.000031 | 0.003381 |
Airport (fgwn) | 2.98 | 0.000087 | 0.003428 |
Pink (artificial) | 3.02 | 0.001402 | 0.000945 |
Pink (proposed) | 3.03 | 0.001445 | 0.000818 |
Pink (fgwn) | 2.99 | 0.001455 | 0.000948 |
Bhattacharyya distance
B_{ d } results
Noise | Proposed | fgwn
---|---|---
Airplane | 0.0169 | 0.0663 |
Airport | 0.0209 | 0.0969 |
Pink | 0.0382 | 0.0372 |
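The article does not state in this extract which form of the Bhattacharyya distance was computed. For reference, a common closed form for two univariate Gaussian densities is sketched below (an illustration; it is not claimed to reproduce the table values, which may have been computed from empirical distributions):

```python
import numpy as np

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between the univariate
    Gaussian densities N(mu1, var1) and N(mu2, var2)."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2))))

print(bhattacharyya_gauss(0.0, 1.0, 0.0, 1.0))            # 0.0 (identical)
print(round(bhattacharyya_gauss(0.0, 1.0, 1.0, 1.0), 3))  # 0.125
```

Smaller B_{ d } values indicate greater similarity between the real and generated noise distributions, which is the sense in which the table favors the proposed generator.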
PSD results
HTD results
Speaker classification experiments
In a speaker identification task, a speech utterance has to be assigned to one of the registered speakers. For the experiments, the speech utterances were corrupted with the real and the generated noise samples. For the speaker identification, the mel-frequency cepstral coefficients (MFCC) and the Gaussian mixture model (GMM) [24] were considered, which are, respectively, the most commonly used speech features and classifier in speaker recognition tasks.
The GMM density is a weighted sum of Gaussian components, $p\left(\overrightarrow{x}|\lambda \right)=\sum _{i=1}^{M}{p}_{i}{b}_{i}\left(\overrightarrow{x}\right)$, where $\overrightarrow{x}$ is a random vector of dimension D, ${b}_{i}\left(\overrightarrow{x}\right)$, i=1,…,M, are the Gaussian density components, and p_{ i }, i=1,…,M, are the mixture weights.
The decision rule of the speaker identification system chooses the speaker model for which the log-likelihood accumulated over the test utterance frames is maximum.
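A minimal sketch of this decision rule is given below (our own illustration with diagonal-covariance GMMs; the parameter layout is an assumption and MFCC extraction is omitted):

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of a frame matrix X (T x D) under a
    diagonal-covariance GMM p(x) = sum_i p_i * N(x; mu_i, diag(var_i))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    log_comp = []
    for p, mu, var in zip(weights, means, variances):
        mu, var = np.asarray(mu, float), np.asarray(var, float)
        ll = -0.5 * (np.sum(np.log(2.0 * np.pi * var))
                     + np.sum((X - mu) ** 2 / var, axis=1))
        log_comp.append(np.log(p) + ll)
    log_comp = np.stack(log_comp)          # shape (M, T)
    m = log_comp.max(axis=0)               # log-sum-exp for stability
    return float(np.sum(m + np.log(np.exp(log_comp - m).sum(axis=0))))

def identify(X, models):
    """Decision rule: pick the speaker model with maximum log-likelihood."""
    return int(np.argmax([gmm_log_likelihood(X, *lam) for lam in models]))

# Toy example: two hypothetical single-component 1-D "speaker models"
models = [([1.0], [[0.0]], [[1.0]]),
          ([1.0], [[5.0]], [[1.0]])]
frames = np.zeros((10, 1))                 # test frames near 0
print(identify(frames, models))            # 0
```

In the actual experiments, each λ_{ k } is trained on a speaker's MFCC frames, and the same argmax rule selects the identified speaker.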
Speaker identification accuracy results
The speaker identification task is evaluated on the KING speech corpus, which is composed of conversational speech sessions recorded by 49 male speakers. For the experiments, five sessions are used, resulting in an average of 100 s of speech per speaker after silence removal. Three of these sessions (60 s) are used for speaker model training. The remaining two sessions (40 s) are used to evaluate the identification accuracies.
Conclusion
A new low-frequency 1/f ambient noise generator using fractional statistics was described in this article. The PSD shape of the generated 1/f noise samples is obtained from the ACFs of the noise samples generated with the fBm process. The implementation of the 1/f ambient noise generator enables the validation of the pattern and the PSD representation. It was shown that this proposition is very promising for the investigation of low-frequency noise effects in signal processing techniques. Furthermore, the speaker identification experiments demonstrated that the generated ambient noise samples can be useful as background or additive noise.
Declarations
Acknowledgements
This work is partially supported by the National Council for Scientific and Technological Development (CNPq) under the research grant 472461/2009-5.
References
- Keshner M: 1/f noise. Proc. IEEE 1982, 70: 212-218.
- Hooge F: 1/f noise. Physica B+C 1976, 83: 14-23.
- Derzjavin A, Semenov A: Ocean ambient low frequency acoustic noise structure in shallow and deep water regions. Journal de Physique IV 1994, 4: 1269-1272.
- Voss R, Clarke J: 1/f noise in music: music from 1/f noise. J. Acoust. Soc. Am. 1978, 63(1): 258-263. 10.1121/1.381721
- Voss R, Clarke J: 1/f noise in music and speech. Nature 1975, 258: 317-318. 10.1038/258317a0
- Gong Y: Speech recognition in noisy environments: a survey. Speech Commun. 1995, 16: 261-291. 10.1016/0167-6393(94)00059-J
- Ming J, Hazen T, Glass J, Reynolds D: Robust speaker recognition in noisy conditions. IEEE Trans. Audio Speech Lang. Process. 2007, 15: 1711-1723.
- Zão L, Coelho R: Colored noise based multicondition training technique for robust speaker identification. IEEE Signal Process. Lett. 2011, 18: 675-678.
- Mandelbrot B, Van Ness J: Fractional Brownian motions, fractional noises and applications. SIAM Rev. 1968, 10: 422-437. 10.1137/1010093
- Deriche M, Tewfik A: Signal modeling with filtered discrete fractional noise processes. IEEE Trans. Signal Process. 1993, 41(9): 2839-2849. 10.1109/78.236506
- Tseng C, Pei S, Hsia S: Computation of fractional derivatives using Fourier transform and digital FIR differentiator. Signal Process. 2000, 80(1): 151-159. 10.1016/S0165-1684(99)00118-8
- Ferdi Y, Taleb-Ahmed A, Lakehal M: Efficient generation of 1/f^β noise using signal modeling techniques. IEEE Trans. Circ. Syst. 2008, 55: 1704-1710.
- Hooge F: Discussion of recent experiments on 1/f noise. Physica 1972, 60: 130-144.
- Yousefi S, Jaldén J, Eriksson T: Linear prediction of discrete-time 1/f processes. IEEE Signal Process. Lett. 2010, 17(11): 901-904.
- Ninness B: Estimation of 1/f noise. IEEE Trans. Inf. Theory 1998, 44: 32-46. 10.1109/18.650986
- Hurst E: Methods of using long-term storage in reservoirs. Proc. Inst. Civil Eng. 1956, 5: 519-543. 10.1680/iicep.1956.11503
- Barnsley M, Devaney R, Mandelbrot B, Peitgen H, Saupe D, Voss R: The Science of Fractal Images. Springer-Verlag, New York; 1988.
- Zão L, Coelho R: Low-frequency optical noise generator using fractional statistics. Electron. Lett. 2010, 46: 1072-1074. 10.1049/el.2010.0667
- FreeSFX: Airport ext busy tarmac, Ambiences/Background Sound Effects. [http://www.freesfx.co.uk/soundeffects/airports/]; 2009
- Varga A, Steeneken H, Tomlinson M, Jones M: The NOISEX-92 study on the effect of additive noise on automatic speech recognition. Technical Report, Defence Evaluation and Research Agency. [http://spib.rice.edu/spib/1992]
- Al-Alaoui M: Novel digital integrator and differentiator. Electron. Lett. 1993, 29: 376-378. 10.1049/el:19930253
- Box G, Muller M: A note on the generation of random normal deviates. Ann. Math. Stat. 1958, 29: 610-611. 10.1214/aoms/1177706645
- Flandrin P: Wavelet analysis and synthesis of fractional Brownian motion. IEEE Trans. Inf. Theory 1992, 38: 910-917. 10.1109/18.119751
- Reynolds D, Rose R: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 1995, 3: 72-83. 10.1109/89.365379
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.