
Enhancement of acoustic tomography using spatial and frequency diversities


This article introduces several contributions that enhance acoustic tomography (AT), mainly by exploiting the spatial and spectral diversities of underwater acoustic signals. Because of their inherent properties (sparseness, non-stationarity or cyclostationarity, wide frequency band, wide range of power, etc.), processing underwater acoustic signals is a real challenge for the scientists and engineers involved in ocean studies. For various applications, these studies require large amounts of data on a daily basis, and AT techniques remain a fast and cheap way to obtain such data. Nowadays, active acoustic tomography (AAT), which relies on powerful and repetitive acoustic sources, is commonly used. Recently, researchers have been attracted by an alternative, called passive acoustic tomography (PAT), which uses the acoustic signals of opportunity available in the environment. PAT techniques are mainly used for ecological, economic, and other reasons such as military applications. With PAT, no signal is emitted; therefore, the problems become more challenging: the number and positions of the existing sources are unknown, and the sensors measure mixtures of the available sources. Algorithms based on the time or frequency domain are widely deployed to classify, identify, and study received signals in AAT applications. For PAT, researchers employ multiple sensors in order to add an extra dimension (space). This article focuses on approaches that use space along with time or frequency to extract information, improve performance, and simplify the overall architecture. It explains the use of signal processing and statistical approaches to solve the problems raised by PAT and discusses the experimental results. The literature already offers a large variety of algorithms for classic AAT problems; therefore, only problems related to PAT are considered herein.


Oceans cover more than 70% of the Earth's surface, contain roughly 97% of our water supply, and play a major role in global climate regulation and economic systems.

Acoustic tomography (AT) is used in many civil and military applications, such as mapping underwater surfaces, oceanographic and meteorological measurements (temperature, salinity, motion, and depth of the water), and improving sonar technology. Many algorithms[1] have been developed to deal with active acoustic tomography (AAT).

Interest in passive acoustic tomography (PAT) has been increasing, mainly for reasons of acoustic discretion (in military applications) and for ecological, economic, or logistic reasons. On the other hand, PAT techniques increase the complexity of the algorithms. In real-world applications, PAT can mainly be achieved using the following methods:

  1. Emitting sources similar to natural sounds or noises: a set of artificial signals imitating natural sounds (whales, dolphins, etc.) or noises (waves, ships, etc.) is generated. The main advantage of this approach is the control of the sources and their positions (as in active methods). To achieve this, researchers can imitate the time-frequency signature of natural signals. However, this method is not totally discreet, as the generated signals may differ from the original ones in their high order statistics (HOS), instantaneous power, or frequency. Besides, artificial signals can generally be characterized by specific patterns (periodicity, time or statistical coherence, fixed positions, deterministic motions, etc.). These patterns can be used to unmask the hidden emitted signals.

  2. Using natural signals: by relying completely on existing natural signals, a PAT system with a high level of discretion can be achieved. However, the main drawback of such a system is the lack of information (number, positions, or nature of the sources, etc.).

  3. Applying hybrid systems: by mixing the previous two strategies, better performance and a good level of discretion could be achieved. However, this results in more complex emitter-receiver systems.

On the one hand, the second strategy seems the most attractive one (completely discreet systems and no emitters). On the other hand, the problems raised in this case are more challenging because of the total lack of information about the sources. In order to reduce the complexity of this problem, we investigate several advanced signal processing techniques and statistical approaches. Indeed, let us assume that we are able to estimate the number of sources, separate the sources from the observed mixtures, and evaluate their statistical properties. In this case, the identification of the channel can also be investigated, and the remaining PAT problems become very similar to those of AAT.

The primary purpose of this article is to discuss how to preprocess the observed mixed signals in order to extract maximum information about the sources; classic algorithms can then be applied to deal with the residual problems. This article is organized as follows: Section “Acoustic oceanic tomography” briefly describes AAT and PAT; Section “Assumption and background” contains the assumptions and mathematical models; Section “Pre-processing systems” presents the preliminary studies; Section “Adaptive HOS estimators” proposes new HOS estimators in order to enhance the spatial diversity of the original sources; Section “Spatial diversity and independence discrimination criteria” discusses several criteria that exploit the spatial or spectral diversity of our signals; Section “Blind separation of observed acoustic signals” presents independent component analysis (ICA) algorithms to separate the mixed observed signals; Section “Experimental results” shows the experimental results; and Section “Conclusion” presents the conclusions.

Acoustic oceanic tomography

The goal of acoustic tomography is fast and cheap monitoring of water-mass and sub-bottom characteristics. This monitoring requires a two-step inversion procedure[2]. First, the acoustic properties (such as the sound speed profile of the water column, the 3D structure of internal tides in water masses, and the geo-acoustic parameters of the seafloor) are estimated from the measurement of a known acoustic waveform propagated between fixed sources and receivers. Second, some physical parameters of the ocean are inferred from these estimated acoustic characteristics.

Active acoustic tomography

To perform oceanic tomography, an active acoustic emission is propagated between an emitter and a set of receivers along a horizontal track about 10 km long. The frequencies involved in tomography range from 30 Hz to a few kHz, whereas the source power ranges from 180 to 220 dB.

Early works in tomography considered only deep-water channels (deeper than 1 km). In this case, acoustic refraction is the main physical phenomenon to be considered when estimating the parameters of the underwater acoustic transmission channel.

In the mid 1990s, scientists extended their interest to shallow water (i.e. depths of less than 300 m)[3].

In shallow water, a propagating acoustic wave interacts many times with the sea surface and the sea floor. Therefore, new techniques had to be developed, such as ‘matched field processing’[4] and ‘matched impulse response processing’[5]. In these applications, a single input multiple output (SIMO) configuration is used to extract channel information.

To get efficient results in a SIMO configuration, a large number of sensors should be used, which makes the experimental setup heavier. To tackle this problem, researchers proposed “matched impulse response processing” methods, which exploit frequency diversity. In this case, a wide-band signal should be emitted, but a single distant hydrophone can be enough as a receiver. The main idea consists of estimating the channel impulse response by applying maximum likelihood or matched filter estimation to the known emitted and received signals[6]. Once the channel impulse response has been estimated, other features such as the time delays or magnitudes of the arrivals can be extracted. These features can then be used to estimate water column and sub-bottom properties.
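As a rough illustration of the matched-filter idea described above, the following sketch (not taken from[6]) correlates a known emitted waveform with the received signal and reads the arrival delays off the correlation peaks. The function name `matched_filter_delays`, the random ±1 probe, the `guard` parameter, and the toy two-path channel are all assumptions made for this example.

```python
import numpy as np

def matched_filter_delays(emitted, received, n_paths=2, guard=5):
    """Return the n_paths strongest arrival delays (in samples) and the correlation."""
    # Correlate the received signal with the known waveform; keep lags >= 0.
    corr = np.correlate(received, emitted, mode="full")[len(emitted) - 1:]
    mag = np.abs(corr)
    delays = []
    for _ in range(n_paths):
        k = int(np.argmax(mag))                      # strongest remaining arrival
        delays.append(k)
        mag[max(0, k - guard):k + guard + 1] = 0.0   # mask its neighborhood
    return sorted(delays), corr

# Toy two-path channel (hypothetical values): arrivals at 100 and 300 samples.
rng = np.random.default_rng(0)
emitted = rng.choice([-1.0, 1.0], size=256)          # wide-band probe signal
received = np.zeros(1000)
received[100:356] += emitted                         # direct path
received[300:556] += 0.7 * emitted                   # weaker, later arrival
received += 0.1 * rng.standard_normal(received.size) # background noise
delays, corr = matched_filter_delays(emitted, received)
```

The magnitudes `corr[delays]` give a crude estimate of the relative strength of each arrival; a real system would need a finer peak-picking rule than the fixed guard interval used here.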

Passive acoustic tomography

Active acoustic tomography relies strongly on the possibility of emitting powerful acoustic signals in the ocean, which raises major problems. First, powerful emissions need a heavy power supply, which can drastically limit the efficiency of autonomous monitoring systems. Second, such emissions can harm marine mammals and disturb their behavior. Finally, in a warfare context, covertness constraints may exist on the acoustic process. To overcome these problems, the concept of PAT has recently emerged in the community.

Passive acoustic tomography consists in estimating acoustic properties by using natural opportunity sources present in the channel at the time of interest without using active emission. Surface noise created by breaking waves, ship noise, and marine mammal calls are three kinds of opportunistic sources which are under the scope of passive tomography[7].

The main drawback of PAT is the lack of information about the number, positions, and nature of the emitted signals. With more than two sources, many current tomography algorithms cannot give satisfactory results. Many others work poorly, or not at all, when the emitted signals are wide-band signals[8]. Some algorithms take into consideration the positions of the acoustic emitters[9]. Typically, in real-world PAT applications, underwater acoustic signals are generated by various moving sources whose number and positions are hard (or impossible) to identify, as in the case of shoals of fish or wave noise. PAT is therefore a difficult technique that requires substantial signal processing effort to tackle the unknown source positions and emitted waveforms, and to separate the sources simultaneously present in the channel before passing them to a dedicated blind inversion processor.

Assumption and background

In PAT applications, the sources are signals of opportunity with various properties, such as spatial diversity, different probability density functions (PDF), different temporal or spectral structures, different time-frequency signatures, etc. These properties can be used at different levels of the separation stage. However, simple and cheap systems are often used in PAT applications, which means that linear multi-sensor antennas are not recommended. Mainly for this reason, ICA algorithms are of great importance to reach our goal. ICA algorithms can successfully handle multi-input multi-output (MIMO) channels.

In a previous work[10], an extensive experimental study was conducted in order to classify and characterize many recorded anthropogenic signals (made by human activities, such as boat, ship, or submarine noises) and natural signals (mainly animal sounds or natural noises such as waves). According to that study, the following features can be added to those mentioned above:

  • Recorded signals are affected by a background ocean noise which can be considered as an additive white Gaussian noise (AWGN).

  • Some signals have a very weak kurtosis[11].

  • Almost all of the signals are non-stationary, with a more or less cyclic behavior (e.g. boat noises).

  • Natural signals are very sparse, whereas artificial ones are very noisy.

The above mentioned properties have been considered to select appropriate ICA algorithms.

Underwater acoustic channel

Underwater sounds are produced by natural or artificial phenomena through forced mass injection, leading to inhomogeneous wave equations that can be converted to the frequency domain[12]. The frequency-domain wave equation, called the Helmholtz equation, gives us an underwater sound propagation model. A general solution of the Helmholtz equation is very difficult to obtain. Therefore, researchers use simplified propagation models (such as ray theory, mode theory, the parabolic model, hybrid models, etc.) according to their applications[13]. The choice of a propagation model depends on many parameters, such as the wave frequency and the depth of the sea. In our case (shallow water, i.e. a channel depth of a few hundred meters, and a frequency range from 300 Hz to 10 kHz), ray theory was the most appropriate propagation model.

The sound speed C (in m/s) in the ocean is an increasing function of the temperature T (°C), the salinity S (parts per thousand, ppt), and the pressure, which is itself a function of the depth D (in meters)[14]:

$$
C = 1449 + 4.6\,T - 0.05\,T^{2} + 23\times10^{-5}\,T^{3} + 16\times10^{-8}\,D^{2} + 0.02\,D + (1.34 - 0.01\,T)(S - 35) - 7\times10^{-13}\,T D^{3} \tag{1}
$$

The above equation is an empirical relationship valid for 0 ≤ T ≤ 30, 30 ≤ S ≤ 40, and D ≤ 8000. In shallow underwater channels[15] (depth less than 300 m), where the emitters and receivers are close neither to the water surface nor to the bottom and the distances between emitters and receivers are less than 3 km, the sound speed can be approximated by a constant.
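The empirical relation above is easy to evaluate numerically. The sketch below is only an illustration: the function name `sound_speed` is hypothetical, and the minus signs on the T², T(S − 35) and TD³ terms are assumed (they are not legible in every rendering of the formula).

```python
def sound_speed(T, S, D):
    """Empirical sound speed C in m/s.

    T: temperature in deg C, S: salinity in ppt, D: depth in m.
    Intended for 0 <= T <= 30, 30 <= S <= 40 and D <= 8000.
    """
    return (1449.0 + 4.6 * T - 0.05 * T**2 + 23e-5 * T**3
            + 16e-8 * D**2 + 0.02 * D
            + (1.34 - 0.01 * T) * (S - 35.0)
            - 7e-13 * T * D**3)

# Variation of C over a 300 m deep channel at T = 10 deg C and S = 35 ppt.
spread = sound_speed(10.0, 35.0, 300.0) - sound_speed(10.0, 35.0, 0.0)
```

With these coefficients, the depth-induced variation over 300 m at fixed temperature and salinity is about 6 m/s, i.e. well under 1% of C, which is in line with the constant sound speed approximation used for shallow channels.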

The reflected acoustic waves on the bottom of the propagation channel depend on many parameters such as the composition and the geometrical properties of the bottom[16].

The acoustic waves reflected at the top of the propagation channel, i.e. the water surface, also depend on many parameters, such as the wind, the wave frequency, and the swell properties[16]. For this reason, the water surface cannot be considered as a flat surface, and the direction of a reflected acoustic wave is dispersed in space. However, on average, the reflected acoustic waves can be considered as produced by a flat surface with absorption coefficients[15]. In our model, a flat surface is considered and random coefficients are added to characterize the other unknown parameters.

Finally to consider acoustic propagation effects, an acoustic model proposed by Schulkin and Marsh[17] was considered. According to that model, a received signal should be multiplied by a corrective coefficient p given by:

$$
p = \frac{1}{r}\exp\left(-\frac{\alpha r}{20}\right) \tag{2}
$$

Here r is the propagation distance and α stands for Rayleigh’s absorption coefficient, which can be approximated by:

$$
\alpha = \left(1 - 6.54\times10^{-4}\,P_w\right)\left(\frac{S A f_T f^{2}}{f^{2} + f_T^{2}} + \frac{B f^{2}}{f_T}\right) \tag{3}
$$

where $f_T = 21.9\times10^{\,6 - \frac{1520}{T + 273}}$ (in kHz), f is the signal frequency, T is the water temperature (°C), S = 3.5% is the water salinity (in the ocean S ≈ 35 g/l), $P_w$ is the water pressure (in kg/m²), $A = 2.34\times10^{-6}$ and $B = 3.38\times10^{-6}$.
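The two expressions above can be evaluated with a few lines of code. This is a minimal sketch under stated assumptions: the function names are hypothetical, the frequency is taken in kHz like $f_T$, the salinity is passed in ppt (35 for S = 3.5%), and the units of the resulting α follow whatever convention the original model[17] uses, which is not restated here.

```python
import math

A = 2.34e-6
B = 3.38e-6

def relaxation_frequency_khz(T):
    """f_T = 21.9 * 10^(6 - 1520 / (T + 273)), in kHz, with T in deg C."""
    return 21.9 * 10.0 ** (6.0 - 1520.0 / (T + 273.0))

def absorption(f_khz, T, S, P_w):
    """Absorption coefficient alpha for frequency f_khz (kHz).

    S is the salinity (assumed in ppt) and P_w the water pressure
    (in the units of the source model).
    """
    f_t = relaxation_frequency_khz(T)
    relaxation = S * A * f_t * f_khz**2 / (f_khz**2 + f_t**2)
    viscous = B * f_khz**2 / f_t
    return (1.0 - 6.54e-4 * P_w) * (relaxation + viscous)

def corrective_coefficient(r, alpha):
    """p = exp(-alpha * r / 20) / r: spreading loss times absorption."""
    return math.exp(-alpha * r / 20.0) / r
```

For instance, `relaxation_frequency_khz(10.0)` is roughly 93 kHz, far above the 10 kHz upper limit of the band considered in this article.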

From a physical point of view, an acoustic ray represents the propagation trajectory of an emitted signal between the source (emitter) and the receiver. In many cases, the channel depth is limited, which means that the propagation is multi-ray. Each ray may be bent by refraction if the sound speed is a function of depth and range. Ray trajectories and the sound speed profile allow us to compute propagation times. In addition, ray trajectories, water attenuation, boundary roughness, and sub-bottom properties allow us to compute the signal magnitude.

From a computational point of view, the ray trajectory is computed by solving the ‘Eikonal equation’, whereas the signal magnitude is obtained from the ‘Transport equation’[12]. As general analytical solutions of the Eikonal and transport equations do not exist, researchers use approximate and simulated results[18].

Mathematical models

Under some mild assumptions (i.e. MIMO configuration and ray propagation model), the underwater acoustic channel can be considered as a set of multiple paths, each of which can be defined in the frequency domain by a complex constant gain[1]. Let S(n) denote a vector of p unknown sources that are statistically independent of each other, and let X(n) be the q × 1 observed vector (Figure 1).

Figure 1. Channel model.

The relationship between S(n) and X(n) is given by:

$$
X(n) = H(z)\,S(n) + N(n) \tag{4}
$$

where H(z) stands for the channel effect and N(n) for the additive noise. In the case of a convolutive mixture, $H(z) = (h_{ij}(z))$ becomes a q × p complex polynomial matrix. In the following, we consider that the channel is linear and causal and that the $h_{ij}(z)$ are FIR filters (whose coefficients are evaluated according to the previous section). Let M denote the degree of the channel, i.e. the highest degree of the $h_{ij}(z)$. Equation (4) can be rewritten as:

$$
X(n) = \sum_{i=0}^{M} H(i)\,S(n-i) + N(n) \tag{5}
$$

Here H(i) denotes the q × p real constant matrix corresponding to the impulse response of the channel at time i and S(n − i) is the source vector at time (n − i).
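The time-domain model above can be simulated directly. The following sketch is only illustrative: the function name `convolutive_mix`, the array layout (taps stacked along the first axis), and the convention S(n) = 0 for n < 0 are assumptions made for the example.

```python
import numpy as np

def convolutive_mix(H, S, noise_std=0.0, rng=None):
    """Compute X(n) = sum_{i=0}^{M} H(i) S(n - i) + N(n).

    H: array of shape (M + 1, q, p) holding the matrices H(0), ..., H(M).
    S: array of shape (p, T) holding the p sources over T samples.
    Returns X with shape (q, T); samples S(n) with n < 0 are taken as zero.
    """
    H = np.asarray(H, dtype=float)
    S = np.asarray(S, dtype=float)
    n_taps, q, _ = H.shape
    T = S.shape[1]
    X = np.zeros((q, T))
    for i in range(min(n_taps, T)):
        # Contribution of the sources delayed by i samples.
        X[:, i:] += H[i] @ S[:, :T - i]
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        X += noise_std * rng.standard_normal(X.shape)   # white Gaussian noise
    return X

# Two sources, three sensors, channel of degree M = 1 (hypothetical values).
H_demo = np.array([[[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]],
                   [[0.3, 0.0], [0.0, 0.4], [0.1, 0.1]]])
S_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.0, 1.0, 0.0, -1.0]])
X_demo = convolutive_mix(H_demo, S_demo)
```

With a single tap (M = 0) the function reduces to the instantaneous mixture X(n) = H S(n) + N(n).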

Pre-processing systems

As mentioned before, the processing of acoustic signals is a very challenging problem. To enhance our processing algorithms, pre- and post-processing systems have been proposed.

Pre- & post-processing

Our sources are band-limited. Therefore, a low-pass filter was extremely helpful to reduce the impact of the AWGN and thus achieve better performance. It is worth mentioning that only three of the tested algorithms gave satisfactory results. These three algorithms (for further details see[19–23]) were designed to separate non-stationary sources (audio or music signals). The last two algorithms[22, 23], called SOS[22] and Parra and Alvino[23] in the following, are implemented in the frequency domain using discrete frequency adapted filters.

Experimental studies showed that the best results are obtained by applying the ICA algorithms to signals split into three frequency bands. Once the separation in each frequency band is achieved, a reconstruction module is used to recover the original sources. Our reconstruction module is based on second order statistics and can be generalized to other statistical features. In its current version, it uses the correlation of signal slices in the time or frequency domain. The best results have been obtained when two cascaded algorithms are used and the number of sensors is strictly greater than the number of sources (q > p), as shown in Figure 2.

Figure 2. General structure. In this figure, Parra stands for the ICA algorithm proposed by Parra and SOS is the algorithm proposed by Rahbar et al., see Section “Experimental results”.

Estimation of source number

The number of sources is an input parameter of ICA algorithms. These algorithms can cope with an overestimated number of sources (the extra separated signals should be residual noise). However, an underestimation of that number can seriously affect the overall performance[24]. For this reason, a rough estimate of that number is needed. A few approaches to obtain it have been considered and are briefly discussed below. Hereinafter, the channel is assumed to be overdetermined (i.e. q > p).

To simplify our discussion, let us consider the simplest case, i.e. an instantaneous mixture, (memoryless channel, i.e. H(i) = 0, i ≠ 0 and H(0) = H is a real full column rank matrix). In this case, the number of sources can be estimated as the rank of the observation covariance matrix Σ X :

$$
\Sigma_X = E\left(X X^{T}\right) = H\,\Sigma_S\,H^{T} \tag{6}
$$

Here $\Sigma_S$ stands for the unknown, invertible, diagonal covariance matrix of the statistically independent sources. For a noise-free channel, the rank of $\Sigma_X$ is equal to the rank of $\Sigma_S$, i.e. to the number of sources[25].

With an AWGN channel, $\Sigma_X$ becomes a full-rank matrix. Without loss of generality, let us assume that the noise components have the same variance; then the q singular values $\lambda_i$ of $\Sigma_X$ take different values, except for the last q − p ones, which are equal to the noise variance. Normally, the first p singular values are linked to the signal space and the last q − p ones to the noise space. To apply this method, one should deal with two problems: how can the covariance matrix of non-stationary signals be estimated, and what is the optimal threshold between the two sets of singular values? The covariance matrices have been estimated over sliding estimation windows, see Section “Adaptive HOS estimators”. The threshold can easily be set when the signal-to-noise ratio (SNR) is relatively high. Unfortunately, the SNR in our case is not high enough (i.e. SNR > 2 dB). Therefore, different thresholds have been considered:

  • If q > p + 5, a threshold can easily be set as the limit between the two sets of singular values. This approach requires a very good SNR and q ≫ p.

  • To improve the first approach, normalized singular values have been considered (i.e. the $\lambda_i$ have been divided by the largest one). Experimental results showed that a threshold can easily be set using normalized singular values when the SNR is higher than 10 dB and the signatures of the sources are roughly the same. The signature of the i-th source on the j-th sensor is the power received by that sensor from that source; it therefore depends on the source power and on the channel parameters. These two assumptions cannot always be satisfied in our application.

  • Another method was also considered: first, the singular values $\lambda_i$ are sorted in descending order; second, the sorted $\lambda_i$ are divided by $\lambda_2$; finally, the number of sources is taken as the number of normalized $\lambda_i$ > ε, where ε depends on the SNR. Experimentally, we obtained satisfactory results for an SNR higher than 4 dB and ε > 0.1.

  • By considering that the signals are close to Gaussian ones, one can use Akaike’s information criterion (AIC) to set the threshold. Even though the Gaussianity assumption is a strong one (underwater acoustic signals are strongly non-stationary and cannot be considered Gaussian), Karhunen et al.[26] show that the obtained results are still satisfactory.
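The thresholding idea discussed in the list above can be sketched as follows for an instantaneous mixture. The code implements only the variant in which the singular values are normalized by the largest one; the function name `estimate_source_count`, the default threshold `eps = 0.1`, and the toy mixing matrix are assumptions chosen for the example, not values prescribed by the article.

```python
import numpy as np

def estimate_source_count(X, eps=0.1):
    """Count the normalized singular values of the sample covariance above eps.

    X: observations of shape (q, T). Returns (count, singular values).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]                  # sample estimate of Sigma_X
    sv = np.linalg.svd(cov, compute_uv=False)   # returned in descending order
    return int(np.sum(sv / sv[0] > eps)), sv

# Toy overdetermined case: p = 2 unit-variance sources, q = 5 sensors, high SNR.
rng = np.random.default_rng(0)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 5000))
X = H @ S + 0.05 * rng.standard_normal((5, 5000))
p_hat, sv = estimate_source_count(X)
```

In this example the two signal singular values are close to 3.5 and 3.0, while the three noise-related ones are close to the noise variance (0.0025), so any threshold between these two groups gives the correct count. At low SNR, or when the source signatures differ strongly, the gap shrinks and the choice of `eps` becomes delicate, as noted above.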

The above methods can easily be extended to the case of a convolutive mixture (channel with memory) by considering the extended covariance matrix $\Sigma_{X_N}$ instead of $\Sigma_X$[27], where $X_N(n) = \left(X^T(n), X^T(n-1), \ldots, X^T(n-N)\right)^T$ is the extended observation vector:

$$
X_N(n) = \begin{pmatrix} X(n) \\ X(n-1) \\ \vdots \\ X(n-N) \end{pmatrix} = T_N(H)\,S_{N+M}(n)
= \begin{pmatrix}
H(0) & H(1) & \cdots & H(M) & 0 & \cdots & 0 \\
0 & H(0) & \cdots & H(M-1) & H(M) & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & H(0) & H(1) & \cdots & H(M)
\end{pmatrix} S_{N+M}(n) \tag{7}
$$

where $S_{N+M}(n)$ stands for the extended signal vector and $T_N(H)$ is the Sylvester matrix, which is full rank under some mild assumptions[27].
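The block structure of the Sylvester matrix is easier to see in code. The sketch below builds $T_N(H)$ from the channel taps and stacks delayed vectors; the function names and the array layout (taps stacked along the first axis) are assumptions made for the example, and the noise is ignored.

```python
import numpy as np

def sylvester_matrix(H, N):
    """Build T_N(H), of size q(N + 1) x p(N + M + 1).

    H: array of shape (M + 1, q, p) holding H(0), ..., H(M).
    Block row k holds H(0), ..., H(M) shifted right by k block columns.
    """
    H = np.asarray(H, dtype=float)
    n_taps, q, p = H.shape
    M = n_taps - 1
    T = np.zeros((q * (N + 1), p * (N + M + 1)))
    for k in range(N + 1):
        for i in range(n_taps):
            T[k * q:(k + 1) * q, (k + i) * p:(k + i + 1) * p] = H[i]
    return T

def extended_vector(V, n, N):
    """Stack V(n), V(n-1), ..., V(n-N) into one vector; V has shape (dim, T), n >= N."""
    return np.concatenate([V[:, n - k] for k in range(N + 1)])
```

For a noise-free convolutive mixture, `extended_vector(X, n, N)` equals `sylvester_matrix(H, N) @ extended_vector(S, n, N + M)`, which is the relation stated above.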

To improve our estimation, we implemented and tested another algorithm, originally dedicated to estimating the number of transmitted telecommunication signals. Chen et al.[28] use two sets of receiver antennas $X_1(n)$ and $X_2(n)$ with $N_1 > p$ (respectively $N_2 > p$) components. The main idea of Chen’s algorithm consists in using the rank of a covariance matrix $\Sigma_Z$:

$$
\Sigma_Z = \begin{pmatrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{12}^{H} & \Sigma_2 \end{pmatrix} \tag{8}
$$

where $Z(n) = \left(X_1^T(n), X_2^T(n)\right)^T$, $\Sigma_i$ is the covariance matrix of $X_i(n)$ and $\Sigma_{12}$ is the cross-covariance matrix of $X_1(n)$ and $X_2(n)$. Using Equation (8), a normalized covariance matrix $\Sigma_N$ is defined as follows:

$$
\Sigma_N = \Sigma_1^{-1/2}\,\Sigma_{12}\,\Sigma_2^{-H/2}
$$

Here $X^H$ represents the Hermitian transpose of X. Chen et al. proved that the number of sources can be estimated using the singular values $\rho_i$ of $\Sigma_N$, i.e. the canonical correlation coefficients. Let us consider the following set of hypotheses:

$$
H_s:\quad \rho_1 \ge \rho_2 \ge \cdots \ge \rho_s > \rho_{s+1} = \cdots = \rho_r = 0
$$

where $r = \min(N_1, N_2)$ and 0 ≤ s ≤ r is the number of sources under test. The selected number is the one that satisfies the following equation:

$$
-\bigl(2N_s - (N_1 + N_2 + 1)\bigr)\sum_{i=s+1}^{p}\log\left(1 - \rho_i^{2}\right)
\begin{cases} > T_s & \text{if } H_{s+} \\ \le T_s & \text{if } H_s \end{cases}
$$

where $H_{s+}$ is the hypothesis that the number of sources is higher than s, and the threshold $T_s$ should be set so that the allowable probability of false alarm is achieved.

The main advantages of the last algorithm compared to the previous approaches are that it can be applied even when the noise is spatially correlated and that it gives a confidence level for the estimated number. The main drawback is the computational effort. In fact, with 2N + 1 receivers, one can only estimate a source number up to N. In the following, we consider that the number of sources has already been estimated.
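The canonical correlation coefficients used by this approach can be computed from sample covariances as sketched below. This is not the implementation of[28]: the function name is hypothetical, real-valued data are assumed (so the Hermitian transpose is a plain transpose), Cholesky factors are used as matrix square roots (which leaves the singular values unchanged), and the hypothesis test with its threshold $T_s$ is not reproduced.

```python
import numpy as np

def canonical_correlations(X1, X2):
    """Singular values of Sigma_1^{-1/2} Sigma_12 Sigma_2^{-T/2} from samples.

    X1: (N1, T) and X2: (N2, T) observations of the two sensor sets.
    """
    X1 = X1 - X1.mean(axis=1, keepdims=True)
    X2 = X2 - X2.mean(axis=1, keepdims=True)
    T = X1.shape[1]
    S1 = X1 @ X1.T / T
    S2 = X2 @ X2.T / T
    S12 = X1 @ X2.T / T
    L1 = np.linalg.cholesky(S1)            # S1 = L1 L1^T
    L2 = np.linalg.cholesky(S2)            # S2 = L2 L2^T
    A = np.linalg.solve(L1, S12)           # L1^{-1} S12
    SN = np.linalg.solve(L2, A.T).T        # L1^{-1} S12 L2^{-T}
    return np.linalg.svd(SN, compute_uv=False)

# Toy case: p = 2 sources seen by two sets of 4 sensors with independent noise.
rng = np.random.default_rng(0)
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 5000))
H1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
H2 = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, -1.0], [0.0, 1.0]])
X1 = H1 @ S + 0.05 * rng.standard_normal((4, 5000))
X2 = H2 @ S + 0.05 * rng.standard_normal((4, 5000))
rho = canonical_correlations(X1, X2)
```

In this toy case the first two coefficients are close to 1 and the remaining ones are close to 0, so the number of sources appears as the number of significantly non-zero $\rho_i$.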

Adaptive HOS estimators

In order to exploit spatial diversity, many blind or semi-blind separation or identification algorithms use HOS in the time or frequency domain. For this reason, the estimation of cross-cumulants and moments up to the fourth order is investigated in this section; further details are given in Appendix 1.

Using the definition of cumulants and moments[29], an estimator of fourth order cumulant can be easily derived:

$$
k_4(X) = \frac{1}{N}\sum_{i} x_i^{4} - \frac{3}{N^{2}}\Bigl(\sum_{i} x_i^{2}\Bigr)^{2} \tag{10}
$$

$k_4(X)$ is a consistent but biased estimator of $\mathrm{Cum}_4(X)$. In previous studies[30], we proposed and compared estimators for auto-cumulants of the second and fourth orders. Here, we propose new adaptive HOS estimators for fourth order cross-cumulants that can be applied to underwater acoustic signals, which are non-stationary.

An unbiased estimator of fourth order cross-cumulants can be obtained from the definition of the cross-cumulants[29]. Let us consider $K_{22}$, an estimator of $\mathrm{Cum}_{22}(X, Y)$ defined as follows:

$$
K_{22} = \frac{a}{N}\sum_{i} x_i^{2} y_i^{2} - \frac{b}{N^{2}}\sum_{i,j} x_i^{2} y_j^{2} - \frac{2c}{N^{2}}\sum_{i,j} x_i x_j y_i y_j \tag{11}
$$

where a, b and c should be set in order to make $K_{22}$ an unbiased and consistent estimator. When the samples $x_i$ and $y_i$ are independent, one can use estimators similar to those proposed in[30]. In the following, we assume that the samples are independent and identically distributed (iid) over time but spatially correlated. In this case, one can prove that $K_{22}$ becomes an unbiased and consistent estimator:

$$
E(K_{22}) = a\,E(X^{2}Y^{2}) - \frac{b}{N^{2}}\Bigl(N E(X^{2}Y^{2}) + N(N-1)\,E(X^{2})E(Y^{2})\Bigr) - \frac{2c}{N^{2}}\Bigl(N E(X^{2}Y^{2}) + N(N-1)\,E(XY)^{2}\Bigr) \tag{12}
$$

when $a = \frac{N+2}{N-1}$ and $b = c = \frac{N}{N-1}$. Similarly, one can develop two other estimators:

$$
\begin{aligned}
K_{13} = \widehat{\mathrm{Cum}}_{13}(X, Y) &= \frac{N+2}{N(N-1)}\sum_{i} x_i y_i^{3} - \frac{3}{N(N-1)}\sum_{i,j} x_i y_i y_j^{2} \\
K_{31} = \widehat{\mathrm{Cum}}_{31}(X, Y) &= \frac{N+2}{N(N-1)}\sum_{i} x_i^{3} y_i - \frac{3}{N(N-1)}\sum_{i,j} x_i y_i x_j^{2}
\end{aligned} \tag{13}
$$
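The batch estimators above translate directly into code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the signals are assumed to be zero-mean, and the double sums over (i, j) are taken over all pairs, so that they factorize into products of simple sums.

```python
import numpy as np

def k4(x):
    """Biased fourth order auto-cumulant estimator of a zero-mean signal."""
    x = np.asarray(x, dtype=float)
    N = x.size
    return np.sum(x**4) / N - 3.0 * np.sum(x**2) ** 2 / N**2

def k31(x, y):
    """Estimator of Cum31(X, Y) = E(X^3 Y) - 3 E(X^2) E(XY) for zero-mean signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = x.size
    # sum_{i,j} x_i y_i x_j^2 = (sum_i x_i y_i) * (sum_j x_j^2)
    return ((N + 2) * np.sum(x**3 * y)
            - 3.0 * np.sum(x * y) * np.sum(x**2)) / (N * (N - 1))

def k13(x, y):
    """Estimator of Cum13(X, Y), obtained from k31 by swapping the roles of x and y."""
    return k31(y, x)
```

Writing the double sums as products keeps the cost linear in N, which matters for the long records typical of underwater recordings.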

To obtain these estimators, the signals are assumed to be stationary. This assumption cannot be satisfied in our application. Therefore, some modifications should be considered. Using some algebraic operations, Equation (13) can be modified as follows, see Appendix 2:

$$
C_{31}(N) = \frac{N-2}{N}\,C_{31}(N-1) + \frac{1}{N}\,\mu_{31}(X, Y) + \frac{N+2}{N(N-1)}\Bigl[x_N^{3} y_N - 3\,x_N y_N\,\mu_{20}(X, Y) - 3\,x_N^{2}\,\mu_{11}(X, Y)\Bigr] \tag{14}
$$

Here $C_{31}(N)$ is an adaptive online version of $K_{31}$ and $\mu_{nm}(X, Y) = \frac{1}{N-1}\sum_{i=1}^{N-1} x_i^{n} y_i^{m}$ is an unbiased consistent estimator of $E(X^{n}Y^{m})$. An adaptive version of $\mu_{nm}(X, Y)$ can also be obtained as follows:

$$
\mu_{nm}(k) = \frac{1}{1-\lambda^{k}}\Bigl(\lambda\left(1-\lambda^{k-1}\right)\mu_{nm}(k-1) + (1-\lambda)\,x_k^{n} y_k^{m}\Bigr)
$$

where 0 < λ < 1 is a forgetting factor. To evaluate the performance of these estimators, simulations have been conducted using non-stationary zero-mean signals. For example, let S(n) be a non-stationary signal that consists of four parts:

  • S1 is a uniform signal in [-1, 1] with 8,000 samples.

  • S2 is Gaussian with unit variance and 5,000 samples.

  • S3 is a uniform signal in [-2, 2] with 3,000 samples.

  • S4 is Gaussian with a standard deviation σ = 2 and 4,000 samples.

Using S, two other signals have been generated: X(n) = S(n) and Y(n) = S³(n) (it is clear that the $x_i$ and the $y_i$ are i.i.d. and that $x_i$ depends on $y_i$). Using the definition of the cumulants and the properties of S, one can prove that:

  • For the uniform parts, $\mathrm{Cum}_{31}(X, Y) = -\frac{2}{35}\,a^{6}$, where a is the maximum amplitude.

  • For the Gaussian parts, $\mathrm{Cum}_{31}(X, Y) = 6\sigma^{6}$.
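The test signal and the adaptive moment recursion described above can be sketched as follows. The function names are hypothetical, Y is taken as the cube of S, and the theoretical values follow from $\mathrm{Cum}_{31}(X, Y) = E(X^3Y) - 3E(X^2)E(XY)$ for zero-mean signals (for a uniform part, $a^6/7 - 3\,(a^2/3)(a^4/5) = -2a^6/35$; for a Gaussian part, $15\sigma^6 - 9\sigma^6 = 6\sigma^6$).

```python
import numpy as np

def make_test_signal(rng):
    """Four-part non-stationary zero-mean signal S(n) with 20,000 samples."""
    s1 = rng.uniform(-1.0, 1.0, 8000)
    s2 = rng.normal(0.0, 1.0, 5000)
    s3 = rng.uniform(-2.0, 2.0, 3000)
    s4 = rng.normal(0.0, 2.0, 4000)
    return np.concatenate([s1, s2, s3, s4])

def theoretical_cum31(kind, scale):
    """Cum31(X, Y) for X = S and Y = S^3.

    scale is the maximum amplitude a (uniform) or the standard deviation (gaussian).
    """
    if kind == "uniform":
        return -2.0 * scale**6 / 35.0
    if kind == "gaussian":
        return 6.0 * scale**6
    raise ValueError(kind)

def adaptive_moment(x, y, n, m, lam=0.99):
    """Run the recursion for mu_nm(k) with forgetting factor lam; returns mu for each k."""
    z = np.asarray(x, dtype=float) ** n * np.asarray(y, dtype=float) ** m
    mu = np.empty(z.size)
    prev = 0.0
    for k in range(1, z.size + 1):
        prev = (lam * (1.0 - lam ** (k - 1)) * prev
                + (1.0 - lam) * z[k - 1]) / (1.0 - lam ** k)
        mu[k - 1] = prev
    return mu

rng = np.random.default_rng(0)
S = make_test_signal(rng)
X, Y = S, S**3
mu20 = adaptive_moment(X, Y, 2, 0)   # tracks E(X^2) across the four parts
```

The normalization by $1 - \lambda^k$ makes the recursion an exponentially weighted average whose weights sum to one at every step, so a constant input is reproduced exactly from the first sample onward.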

We conducted many simulations. According to our experimental study, the performance of estimator (14) can be improved by using another forgetting factor 0 < γ < 1, see Figure 3:

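$$
C_N = \frac{N-2}{N}\,\gamma\,C_{N-1} + \frac{1}{N}\,\gamma\,\mu_{13}(X, Y) + \frac{N+2}{N(N-1)}\Bigl[\gamma\,x_N y_N^{3} - 3\gamma\,x_N y_N\,\mu_{02}(X, Y) - 3\gamma\,y_N^{2}\,\mu_{11}(X, Y)\Bigr] + (1-\gamma)\,x_N y_N^{3} - 3(1-\gamma)\,x_N y_N\,\mu_{02}(X, Y) \tag{15}
$$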
Figure 3. Comparison among estimators (10), (14) and (15), with λ = 0.99 and γ = 0.998. The x-axis represents the discrete time in number of samples and the y-axis the fourth order cumulant.

Finally, $x_N$ and $y_N$ in Equation (15) have been replaced by their averages over a small estimation window (10 to 50 samples). The estimators proposed above can be improved by considering non-iid samples. In that case, however, a stochastic model with transition probabilities should be considered. This is beyond the scope of this manuscript and will be considered in a future study. Hereinafter, HOS are estimated at different stages using the estimators described in this section.

Spatial diversity and independence discrimination criteria

In the literature, one can find a huge number of ICA algorithms that solve the blind source separation (BSS) problem. Most of them are dedicated to the separation of instantaneous (i.e. echo-free) mixtures. In our application, the underwater acoustic propagation channel can be modeled by a convolutive mixture (i.e. a multi-path MIMO finite impulse response (FIR) channel with a huge filter order, ≥ 6000). It is well known that the blind separation of statistically independent sources from a convolutive mixture recovers the original sources only up to a permutation and a scalar filter:

$$
\hat{s}_1(n) = h_1(z)\,s_1(n) + h_2(z)\,s_2(n) \tag{16}
$$

where $s_2(n)$ represents a mixture of all sources except the first one, $s_1(n)$. The filter $h_i(z) = h_i(0) + h_i(1)\,z^{-1} + \cdots + h_i(m_i)\,z^{-m_i}$ is a residual separation filter. The separation is considered achieved when the norm of the residual error $h_2(z)s_2(n)$ becomes much smaller than that of the separated signal $h_1(z)s_1(n)$. In addition, the identification or classification of underwater acoustic signals is an extraordinarily difficult step, because these signals are non-stationary, non-intelligible, sparse signals with a low and variable kurtosis. In this context, ranking ICA algorithms according to their separation quality becomes a difficult and important task.

The following discrimination criteria can be optimized to maximize the spatial diversity or the independence among estimated signals. At the same time, they can be very useful to quantify the separation achievement. In the last case, these criteria are called performance indices.

Modified crosstalk

In this section, a new, modified performance index is proposed. The crosstalk is the inverse of the SNR and is widely used as a performance index for BSS algorithms for instantaneous mixtures. By definition, the crosstalk index of the first estimated signal is given by:

$$
D_r(\hat{s}_1, s_1) = 10\log_{10}\frac{E\left(\hat{s}_1 - s_1\right)^{2}}{E\left(s_1^{2}\right)} \tag{17}
$$

To apply the crosstalk, one should have original sources. Therefore, this performance index cannot be applied in real situation where sources are unknown. However it is very useful in simulations.
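For simulations, where the original source is available, the crosstalk index defined above can be computed in a few lines. The function name `crosstalk_db` is hypothetical, and expectations are replaced by sample means.

```python
import numpy as np

def crosstalk_db(s_hat, s):
    """Crosstalk 10 log10( E(s_hat - s)^2 / E(s^2) ) in dB, using sample means."""
    s_hat = np.asarray(s_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    return 10.0 * np.log10(np.mean((s_hat - s) ** 2) / np.mean(s ** 2))

# A 10% amplitude error on the estimate corresponds to a crosstalk of -20 dB.
s = np.array([1.0, -2.0, 3.0, -4.0])
d = crosstalk_db(1.1 * s, s)
```

The lower (more negative) the index, the better the separation. Note that the index is sensitive to the scale of the estimate, so the estimated signal must be normalized consistently with the source before comparison.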

It is well known that sources can be separated from a convolutive mixture up to a permutation and up to a scalar filter. Therefore, the last definition D r is useless for the BSS convolutive mixture, see Equation (16), since it doesn’t take into consideration the power ratio between the filtered version of the signal ξ1 = h1(z)s1(n) and the residual error h2(z)s2(n).

We developed a modified definition of the crosstalk. First, (17) should be applied as $D_r(\hat{s}_1, \xi_1)$. Second, an estimate of $h_1(z)$ should be obtained from $s_1(n)$ and the estimated signal $\hat{s}_1$. To estimate $h_1(z)$, one can minimize the least mean square (LMS) error ζ:

$$
\hat{h}_1 = \arg\min_{h} E\left(\hat{s}_1 - h * s_1\right)^{2} = \arg\min_{h}\,\zeta
$$

Let $H_i = \left(h_i(0), \ldots, h_i(m_i)\right)^T$ and $S_i = \left(s_i(n), \ldots, s_i(n - m_i)\right)^T$; the convolutive product in Equation (16) becomes a simple scalar product:

$$
h_1(z)\,s_1(n) = H_1^{T} S_1
$$

Using the independence properties of the sources, one can prove that:

$$
\begin{aligned}
\zeta &= (H_1 - H)^{T}\,E\left(S_1 S_1^{T}\right)(H_1 - H) + H_2^{T}\,E\left(S_2 S_2^{T}\right)H_2 \\
&= \varepsilon_H^{T}\,\Sigma_1\,\varepsilon_H + H_2^{T}\,\Sigma_2\,H_2 \\
&= E\left(\hat{s}_1^{2}\right) + H^{T}\Sigma_1 H - H^{T}E\left(S_1\hat{s}_1\right) - E\left(\hat{s}_1 S_1^{T}\right)H
\end{aligned} \tag{19}
$$

where $\varepsilon_H = H_1 - H$ and $\Sigma_i = E\left(S_i S_i^{T}\right)$ is an invertible positive definite matrix. The second term of (19) doesn’t depend on H. Therefore, one can prove that the optimal value of H is given by:

$$
H_{opt} = \Sigma_1^{-1}\,E\left(S_1\hat{s}_1\right)
$$

Our experimental results show that this performance index can be used efficiently for a low-order channel filter (order below 20). When the channel order is larger than 20, computing time becomes a major issue.
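The modified crosstalk can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the expectations are replaced by sample means, the source $s_1(n)$ is assumed to be known (as in a simulation), and the names `delay_matrix`, `modified_crosstalk_db` and the `order` argument are hypothetical.

```python
import numpy as np

def delay_matrix(s, order):
    """Rows are S(n) = (s(n), s(n-1), ..., s(n-order)) for n = order..len(s)-1."""
    n = len(s)
    return np.column_stack([s[order - k: n - k] for k in range(order + 1)])

def modified_crosstalk_db(s_hat, s1, order):
    """Return (D_r(s_hat, xi_1) in dB, H_opt), with xi_1 = H_opt^T S_1."""
    S1 = delay_matrix(np.asarray(s1, dtype=float), order)
    y = np.asarray(s_hat, dtype=float)[order:]   # align the estimate with S_1(n)
    sigma1 = S1.T @ S1 / len(y)                  # sample estimate of E(S_1 S_1^T)
    cross = S1.T @ y / len(y)                    # sample estimate of E(S_1 s_hat)
    h_opt = np.linalg.solve(sigma1, cross)       # H_opt = Sigma_1^{-1} E(S_1 s_hat)
    xi1 = S1 @ h_opt                             # filtered source h_1(z) s_1(n)
    ratio = np.mean((y - xi1) ** 2) / np.mean(xi1 ** 2)
    return 10 * np.log10(ratio), h_opt
```

The linear system is solved directly rather than by forming the inverse of $\Sigma_1$, which is cheaper and numerically safer; its size grows with the channel order, which is in line with the computing-time remark above.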

Mutual information

According to [31], mutual information (MI) $I(p_U)$ is one of the best independence indices:

$$I(p_U) = \int p_U(V) \log \frac{p_U(V)}{\prod_{i=1}^{n} p_{u_i}(v_i)}\, dV$$

where $U = (u_1, \ldots, u_n)^T$ is a random vector and $p_U(V)$ (resp. $p_{u_i}(v_i)$) is the joint (resp. marginal) PDF. In the context of the BSS problem, the joint and the marginal PDFs are unknown but they can be estimated.

To estimate the MI in our project, we used a method proposed by Pham [32]. In his method, the integral is replaced by a discrete sum and the PDFs are estimated using kernel methods. In [32], third-order spline functions have been used as kernel functions. By definition, a spline function of order $r$ is the PDF of the sum of $r$ independent uniform random variables $u_i \in [-0.5, 0.5]$. For example, the 3rd-order spline function is defined as:

$$K_3(u) = \begin{cases} \frac{3}{4} - u^2 & \text{if } |u| \le \frac{1}{2} \\ \frac{(1.5 - |u|)^2}{2} & \text{if } 0.5 \le |u| \le 1.5 \\ 0 & \text{elsewhere} \end{cases}$$
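As a quick check of the definition, the third-order spline kernel can be coded directly. The function name `k3` is hypothetical; the sketch only evaluates the piecewise formula above.

```python
import numpy as np

def k3(u):
    """Third-order spline kernel: PDF of the sum of three U[-0.5, 0.5] variables."""
    a = np.abs(np.asarray(u, dtype=float))
    inner = 0.75 - a ** 2              # branch for |u| <= 0.5
    outer = 0.5 * (1.5 - a) ** 2       # branch for 0.5 <= |u| <= 1.5
    return np.where(a <= 0.5, inner, np.where(a <= 1.5, outer, 0.0))
```

Both branches give the same value at $|u| = 0.5$, so the kernel is continuous, and it integrates to one as expected for a PDF.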

Finally, the MI estimator is given by:

$$\hat{I}(u_1, \ldots, u_n) = \sum_{i} \hat{\Pi}_U(i) \log \frac{\hat{\Pi}_U(i)}{\prod_{k} \hat{\Pi}_{u_k}(i_k)}$$

Here $\hat{\Pi}_U(i)$ is the joint PDF estimator and $\hat{\Pi}_{u_k}(j)$ is the marginal PDF estimator. Good results have been obtained with stationary signals, but we could not get similar results for underwater acoustic signals.
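The discrete-sum form of the estimator can be illustrated with a plain histogram in place of the spline kernel density estimate. This is a deliberately simplified stand-in, not Pham's method: the function name `mutual_information_hist` and the `bins` parameter are assumptions, and the result is in nats.

```python
import numpy as np

def mutual_information_hist(u, bins=16):
    """Plug-in MI estimate for samples u of shape (Ns, n), using histograms."""
    u = np.asarray(u, dtype=float)
    joint, _ = np.histogramdd(u, bins=bins)
    joint = joint / joint.sum()                  # joint cell probabilities
    n = u.shape[1]
    prod = np.ones_like(joint)
    for k in range(n):
        other_axes = tuple(a for a in range(n) if a != k)
        marginal = joint.sum(axis=other_axes)    # marginal of component k
        shape = [1] * n
        shape[k] = -1
        prod = prod * marginal.reshape(shape)    # product of the marginals
    mask = joint > 0                             # empty cells contribute zero
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))
```

The estimate is close to zero for independent components and grows once the components are mixed; like any plug-in estimator it carries a small positive bias that depends on the number of bins and samples.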

Quadratic dependence

To measure the independence among the components of a random vector $X = (x_1, \ldots, x_n)^T$, the authors of [33] compare the joint PDF of the vector $X$ with the product of the marginal PDFs of its components $x_i$. Using a similar approach, Kankainen [34] proposed the quadratic dependence measure $D(X)$, which compares the joint first characteristic function (FCF), i.e. $\Phi(\Omega) = E\{\exp(j\,\Omega^T X)\}$, with the product of the marginal FCFs:

$$D(X) = \int \left| \Phi(\Omega) - \prod_{i=1}^{n} \Phi(\Omega_i) \right|^2 h(\Omega)\, d\Omega$$

Here $h$ is an integrable function from $\mathbb{R}^n$ to $\mathbb{R}$ which satisfies the following two conditions [34]:

  • h is a positive function that is non-zero almost everywhere.

  • For an analytical FCF $\Phi(\Omega)$, h should be positive around zero and vanish elsewhere.

If the components of $X$ are mutually independent, then the joint FCF is equal to the product of the marginal FCFs (i.e. $\Phi(\Omega) = \prod_{i=1}^{n} \Phi(\Omega_i)$) and $D(X) = 0$. To deal with nonlinear BSS, Achard et al. [35] proposed the following $h$:

h(Ω)= i = 1 n σ X i Φ K ( σ X i Ω i ) 2 Π 2

Here $K$ is a square-integrable kernel function whose Fourier transform is non-zero almost everywhere, and $\sigma_{X_i}$ is a scale factor (i.e. a positive function that depends only on the PDF of $X_i$).

Using the energy conservation theorem of Parseval, Equation (23) can be replaced by the following functions[35]:

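$$Q(X) = \frac{1}{2} \int_{\mathbb{R}^n} D(T)^2\, dT, \qquad D(T) = E\left[\prod_{i=1}^{n} K\!\left(t_i - \frac{x_i}{\sigma_i}\right)\right] - \prod_{i=1}^{n} E\left[K\!\left(t_i - \frac{x_i}{\sigma_i}\right)\right]$$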

Finally, Achard et al. [35] proved that $Q(X) = 0$ if and only if the components $x_i$ are independent of each other, and that $Q$ can be estimated as follows:

$$\hat{Q}(X) = \frac{1}{2} \hat{E}\big(F(X)\big) + \frac{1}{2} \prod_{i=1}^{n} \hat{E}\big(f(x_i)\big) - \hat{E}\left(\prod_{i=1}^{n} f(x_i)\right)$$

Here $f(x_k) = \frac{1}{N_s} \sum_{i=1}^{N_s} K\!\left(\frac{x_k - X_k(i)}{\sigma_k}\right)$, $X_k(i)$ is the $i$th sample of the $k$th component of $X$, $\hat{E}$ is the empirical mean, and $F(X) = \frac{1}{N_s} \sum_{i=1}^{N_s} \prod_{k=1}^{n} K\!\left(\frac{x_k - X_k(i)}{\sigma_k}\right)$, where the kernel function $K$ can be:

  1. Gaussian kernel: $K_1(x) = \exp(-x^2)$

  2. Square Gaussian kernel: $K_2(x) = \frac{1}{(1 + x^2)^2}$

  3. Inverse of the second derivative of the square Gaussian kernel: $K_3(x) = \frac{4 - 20x^2}{(1 + x^2)^2}$

In our experimental studies, the best results were obtained using the Gaussian kernel. In fact, the Gaussian kernel gives the largest difference between the quadratic independence measure applied to a vector A with i.i.d. uniformly distributed components and the same measure applied to a vector B = MA, where M is a full-rank mixing matrix. The main drawback of this performance index is its long computing time.
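The estimator $\hat{Q}$ can be sketched with pairwise kernel evaluations. This is an illustrative reading of the formula, not the authors' code: the empirical mean is taken over the observed samples, the scale factor $\sigma_k$ is assumed to be the sample standard deviation, and the function name `quadratic_dependence` is hypothetical. Memory grows as $n N_s^2$, which reflects the computing cost mentioned above.

```python
import numpy as np

def quadratic_dependence(x, kernel=lambda t: np.exp(-t ** 2)):
    """Empirical quadratic dependence of samples x with shape (Ns, n)."""
    x = np.asarray(x, dtype=float)
    sigma = x.std(axis=0)                        # scale factors (an assumption)
    xt = x.T                                     # shape (n, Ns)
    # k_mat[k, j, i] = K((X_k(j) - X_k(i)) / sigma_k)
    k_mat = kernel((xt[:, :, None] - xt[:, None, :]) / sigma[:, None, None])
    f = k_mat.mean(axis=2)                       # f(x_k) at each sample, (n, Ns)
    big_f = k_mat.prod(axis=0).mean(axis=1)      # F(X) at each sample, (Ns,)
    return (0.5 * big_f.mean()
            + 0.5 * f.mean(axis=1).prod()
            - f.prod(axis=0).mean())
```

For a vector with independent components the value stays near zero, while a mixture B = MA of the same components gives a clearly larger value.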

Non-linear Kernel decorrelation

Bach and Jordan [36] proposed an independence measure based on the concept of non-linear decorrelation, the F-correlation $\rho_F$:

$$\rho_F = \max_{f, g \in F} \frac{\mathrm{Cov}\big(f(X), g(Y)\big)}{\sqrt{\mathrm{Var}\, f(X)\; \mathrm{Var}\, g(Y)}}$$

Here Cov(X,Y) and Var(X) denote the covariance of X and Y and the variance of X, respectively. It is worth mentioning that $F$ is a vector space of functions from $\mathbb{R}$ to $\mathbb{R}$ which contains the whole Fourier basis (i.e. the exponential functions $\exp(jwx)$ with $w \in \mathbb{R}$). In that case, $\rho_F = 0$ means that X and Y are independent.

According to Bach and Jordan [36], the two non-linear functions f and g are best chosen using Mercer kernel functions. A bilinear function K(X,Y) from a vector space $\mathcal{X}$ (for example $\mathbb{R}^m$) to $\mathbb{R}$ is said to be a Mercer kernel if and only if its Gram matrix is positive semi-definite. By definition, the Gram matrix of the basis vectors $(X_1, \ldots, X_m)$ of an m-dimensional vector space $\mathcal{X}$ with respect to a bilinear function K(X,Y) is the matrix given by $G_{ij} = K(X_i, X_j)$. K(X,Y) should also be translation invariant, convergent in $L^2(\mathbb{R}^m)$ and isotropic. A possible kernel is the Gaussian kernel:

$$K(X, Y) = \exp\left(-\frac{1}{2\sigma^2} \|X - Y\|^2\right)$$

Table 1 shows experimental results obtained by applying the NL-decorrelation to source and mixed signals using three different kernels: Gaussian, polynomial and Hermite functions. For acoustic signals, better results are obtained using the polynomial kernel, which gives the maximum difference between independent and correlated signals. Our experimental studies show that this performance index can be applied successfully in our project. However, computing time and memory requirements become very large when the number of samples exceeds 500,000. Finally, the difference between the NL-decorrelation of the sources and that of the mixed signals depends on the original signals, the chosen kernel, and the mixing model and its parameters.

Table 1 Using different kernels, Gaussian, Polynomial and Hermite functions, the NL-decorrelation is applied on source and mixed signals

Simplified non-linear decorrelation

Using an approach similar to [36], we proposed a simplified performance index based on the concept of a non-linear covariance matrix $\Upsilon = (\rho_{ij})$ defined by:

$$\rho_{ij} = \frac{E\big(\langle f(x_i)\rangle_c\, \langle g(x_j)\rangle_c\big)}{\sqrt{E\big(\langle f(x_i)\rangle_c^2\big)\, E\big(\langle g(x_j)\rangle_c^2\big)}}$$

where $X = (x_i)$ is a random vector, $f(x)$ and $g(x)$ are two non-linear functions, and $\langle x\rangle_c = x - E(x)$. If the components of $X$ are independent of each other, then $\Upsilon$ becomes a diagonal matrix. Using the last definition, we suggest the following performance index:

$$c = 20 \log \frac{\|\mathrm{Off}(\Upsilon)\|_2}{\|\mathrm{diag}(\Upsilon)\|_2}$$

Here diag(M) is a diagonal matrix which has the same principal diagonal of matrix M and Off(M) = M−diag(M). Functions f and g are chosen from the following functions:

  1. ‘Gauss’: Gaussian kernel.

  2. ‘poly’: 6th-order polynomial kernel whose coefficients are the components of a unit vector.

  3. ‘atan’: saturation kernel using the arc-tangent function.

  4. ‘tanh’: saturation kernel using the hyperbolic tangent function.

Our experimental studies (see Table 2) show the effectiveness of this performance index in dealing with underwater acoustic signals and channels. Its main drawback is that the obtained values depend on the kind and the number of original independent signals. Therefore, this performance index can only be used in simulations where the original sources are known.
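A minimal sketch of the simplified index is given below. Several choices are assumptions made for illustration only: the pair (f, g) = (tanh, arctan), a correlation-type normalization of $\rho_{ij}$, the entrywise Euclidean (Frobenius) norm, and a base-10 logarithm so that the index reads in dB. The name `nl_decorrelation_index` is hypothetical.

```python
import numpy as np

def nl_decorrelation_index(x, f=np.tanh, g=np.arctan):
    """c = 20 log10(||Off(Y)|| / ||diag(Y)||) for samples x of shape (Ns, n)."""
    x = np.asarray(x, dtype=float)
    fc = f(x) - f(x).mean(axis=0)                # centred f(x_i)
    gc = g(x) - g(x).mean(axis=0)                # centred g(x_j)
    cov = fc.T @ gc / len(x)                     # E(<f(x_i)>_c <g(x_j)>_c)
    scale = np.sqrt(np.outer((fc ** 2).mean(axis=0), (gc ** 2).mean(axis=0)))
    upsilon = cov / scale                        # non-linear covariance matrix
    diag = np.diag(np.diag(upsilon))
    off = upsilon - diag
    return 20 * np.log10(np.linalg.norm(off) / np.linalg.norm(diag))
```

Independent components give a strongly negative value, while mixed components push the index towards 0 dB. As noted in the text, the absolute values depend on the signals themselves.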

Table 2 Simplified NL-decorrelation applied on source and mixed signals using different kernels

Independence measure based on the FCF

The joint FCF of a random vector $X = (x_1, \ldots, x_n)^T$ is equal to the product of the marginal FCFs of its components if and only if they are independent of each other. Using that property, Feuerverger [37] proposed the following independence measure:

$$T_n = \frac{\pi^2}{n^2} \sum_{ij} g(X'_j - X'_i)\, g(Y'_j - Y'_i) - \frac{2\pi^2}{n^3} \sum_{ijk} g(X'_j - X'_i)\, g(Y'_j - Y'_k) + \frac{\pi^2}{n^4} \sum_{ijkl} g(X'_j - X'_i)\, g(Y'_k - Y'_l)$$

where $g$ is an adequately chosen function [37], $X' = \Phi^{-1}\left(\frac{8X - 3}{8n + 2}\right)$ is the approximation of the score function of $X$, and $\Phi$ is the cumulative distribution function of a zero-mean, unit-variance Gaussian signal. Our experimental studies show that the computing time is the main drawback of this performance index. We should mention that, for stationary signals, this performance index is consistent. Unfortunately, this interesting property is useless in our application since the acoustic signals are non-stationary.

Recently, Murata[38] proposed a simplified test to measure the independence between two random signals. This independence measure was, also, based on the estimation of the cross FCF:

$$\Phi_n^{XY}(t, s) = \frac{1}{n} \sum_{i} \exp(j t X_i + j s Y_i)$$

If X and Y are statistically independent of each other, then $\Phi^{XY}(t, s) = \Phi^{X}(t)\, \Phi^{Y}(s)$. Murata’s independence measure is defined by the following equation:

R 2 { Φ n XY ( t ) Φ n X ( t ) Φ n Y ( s ) } n k ( t , s ) dtds

Here k(t,s) is a bounded estimation window. Our experimental studies show that:

  • The obtained values depend on the original sources. This drawback is shared by the previous performance indices.

  • For beta random variables, good results have been obtained. On the other hand, we noticed bad results for uniform random signals.

  • For acoustic signals, we noticed good results for instantaneous mixture and bad ones for convolutive mixtures.

  • Computing time is a critical issue.


The previously described performance indices cannot be applied in real situations, where the original signals are unknown, because the performance values depend on the sources. Therefore, we developed a new performance index based on cross-cumulants:

$$\mathrm{Perfc} = \frac{\overline{\mathrm{Cum}_{(1,3)}(X,Y)^2} + \overline{\mathrm{Cum}_{(3,1)}(X,Y)^2}}{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$$

Here $\overline{\mathrm{Cum}_{(1,3)}(X,Y)^2}$ is the average of $\mathrm{Cum}_{(1,3)}(X,Y)^2$, obtained using a sliding estimation window. The index of Equation (32) is limited to two signals. To generalize this index to the case of several signals, we proposed the following index:


where $\Gamma = (\mathrm{Perfc}(X_i, X_j))$ and $\mathrm{Off}(\Gamma) = \sum_{i \neq j} \gamma_{ij}^2$. Good results have been obtained using this performance index on instantaneous or convolutive mixtures of acoustic signals. However, the computing time is relatively long.
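The two-signal index can be sketched with sample cross-cumulants computed over sliding windows. This is an illustration under assumptions, not the authors' implementation: the signals are centred globally, the standard zero-mean expressions $\mathrm{Cum}_{(1,3)} = E(XY^3) - 3E(XY)E(Y^2)$ and $\mathrm{Cum}_{(3,1)} = E(X^3Y) - 3E(XY)E(X^2)$ are used, and the names `cross_cumulants`, `perfc`, `window` and `hop` are hypothetical.

```python
import numpy as np

def cross_cumulants(x, y):
    """Sample Cum_(1,3) and Cum_(3,1) of two zero-mean signals."""
    exy = np.mean(x * y)
    c13 = np.mean(x * y ** 3) - 3 * exy * np.mean(y ** 2)
    c31 = np.mean(x ** 3 * y) - 3 * exy * np.mean(x ** 2)
    return c13, c31

def perfc(x, y, window=2000, hop=1000):
    """Average squared cross-cumulants over sliding windows, normalized by the variances."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    sq13, sq31 = [], []
    for start in range(0, len(x) - window + 1, hop):
        c13, c31 = cross_cumulants(x[start:start + window], y[start:start + window])
        sq13.append(c13 ** 2)
        sq31.append(c31 ** 2)
    return (np.mean(sq13) + np.mean(sq31)) / (np.var(x) * np.var(y))
```

The index needs no access to the original sources: it stays near zero for independent signals and increases when the two signals share a common non-Gaussian component.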

Blind separation of observed acoustic signals

In a previous study [39], we implemented and tested some instantaneous ICA algorithms. According to that study, good results, at least for instantaneous mixtures of underwater acoustic signals, can be obtained using ICA algorithms based on HOS or dedicated to non-stationary signals. The algorithms discussed in this section have been selected according to our previous study.

In real applications of PAT, hydrophones may record mixed signals. In order to apply classic AAT algorithms, one should first separate the recorded mixed signals. As mentioned, a MIMO configuration is quite possible in PAT applications. In this case, the sources could be generated and recorded at different locations. This spatial diversity could be translated into statistical independence. Since the early 1990s, ICA has been considered as a set of important signal processing tools [40–42]. By assuming that the p unknown emitted signals (i.e. sources) are statistically independent of each other, ICA consists of retrieving a set of independent signals (output signals) from the observation of unknown mixtures of the p sources. It was proved that the output signals can be the sources up to a scale factor (or filter) and up to a permutation [43].

Due to the long and sparse impulse response of underwater acoustic channels and the features of underwater acoustic signals (i.e. non-stationarity, closeness to Gaussian, sparseness, etc.), see Section “Assumption and background”, many ICA algorithms could not achieve the separation of sources in our application. Every selected and implemented algorithm has been evaluated using the following steps. First, we used the same (or similar) signals as the ones originally proposed by the authors of that algorithm. Second, the algorithm was run over some simulated scenarios using a set of non-stationary signals (normally speech signals) in memoryless or simple convolutive channels. Algorithms that gave good (or at least satisfactory) results in these first two stages have been selected in our project.

Best experimental results were obtained using two frequency domain ICA Algorithms[22, 23] based on the minimization of second order statistics criteria in frequency-domain. These two algorithms exploit the spatial and the spectral diversity of the original signals. In the following, the major tested algorithms are briefly described.

Blind estimation of time delay

In order to retrieve source signals, one can estimate the transmission channel and then separate the sources using some invertible filters. In this scenario, an algorithm to estimate the different time delays can be of great help. Emile and Comon [44] proposed an elegant blind time delay estimation algorithm for a simplified convolutive mixture:

$$\begin{aligned} x_1(t) &= s_1(t) + s_2(t) + \cdots + s_{N_{sig}}(t) + b_1(t) \\ x_2(t) &= \alpha_1 s_1(t - \tau_1) + \alpha_2 s_2(t - \tau_2) + \cdots + \alpha_{N_{sig}} s_{N_{sig}}(t - \tau_{N_{sig}}) + b_2(t) \end{aligned}$$

Here $b_i(t)$ stands for an AWGN. The proposed algorithm can estimate the different mixing parameters ($\tau_i$ and $\alpha_i$) using HOS in the frequency domain, see Equation (34).

C k P = Cum ( R 1 ( w ) , , R 1 ( w ) P 2 , R 2 ( w ) , , R 1 ( w ) P 2 k R 2 ( w ) , , R 2 ( w ) ) k
C k P , α = Cum ( R 1 ( w ) , R 2 ( w ) , , R 1 ( w ) , R 2 ( w ) P 2 + 2 k R 1 ( w ) , R 2 ( w ) , , R 1 ( w ) , R 2 ( w ) ) P 2 2 k
C α , k P = Cum ( R 1 ( w ) , R 1 ( w ) , , R 1 ( w ) , R 1 ( w ) P 2 k R 2 ( w ) , R 2 ( w ) , , R 2 ( w ) , R 2 ( w ) ) 2 k

where $R_i(w)$ is the Fourier transform of the observed signals. Using the independence assumption of the sources, algebraic operations, Vandermonde matrix properties, and an inverse Fourier transform, the authors successfully generate a signal of shifted Diracs with the required time delays, see Figure 4.

Figure 4

Blind estimation of time delays ($\tau_1 = 15$, $\tau_2 = 35$ and $\tau_3 = 55$); the source signals are three ARMA signals.

Our experimental studies show that the performance can be improved by increasing the number of samples. On the other hand, even though we used over 250,000 samples, we unfortunately could not achieve good results when the sources are underwater acoustic signals, see Figure 5.

Figure 5

Blind estimation of time delays ($\tau_1 = 25$, $\tau_2 = 56$ and $\tau_3 = 75$); the source signals are three underwater acoustic signals (military ship noises and whale songs). In this simulation, the algorithm was unable to estimate the different delays.

It is worth mentioning that the authors proposed another version of their algorithm in [44]. However, we did not implement that later version, for the simple reason that the first version did not give satisfactory results in our application. In fact, the underwater acoustic channel is more complex than the model considered by the authors.

Nguyen’s algorithms

In the early 1990s, Nguyen and Jutten [45–47] were the first to propose an ICA algorithm to separate a convolutive mixture of speech signals. The first version of the algorithm consists of minimizing a cost function defined as the mathematical expectation of an odd nonlinear function evaluated over the estimated signals. Later on, they proposed another cost function defined as the sum of fourth-order cross-cumulants. To prevent a matrix inversion problem, they proposed a recursive structure which can only deal with a mixture of two sources. The latter constraint can easily be avoided by using our recursive system proposed in [48]. In addition, the algorithms proposed by Nguyen et al. can easily be implemented and they have been used to separate speech signals. For these reasons, we decided to implement these algorithms.

In addition to the different versions originally proposed by the authors, we implemented hybrid structures (i.e. a minimization of a cost function based on a weighted sum of their different cost functions). Unfortunately, our experimental studies show that the algorithm, in all implemented versions, does not help to reach our goal. In fact, the separation performance was not satisfactory due to the particularity of our application. It is worth mentioning that the convergence of the algorithm was a critical point in many cases.

Natural gradient applied to entropy maximization

In order to characterize and localize the development of material defects, acoustic emission analysis (AEA) is used. To improve the performance of their AEA, Kosel et al. [49] processed the observed signals using an ICA algorithm proposed earlier by Amari and Cardoso [50], based on the natural gradient minimization algorithm proposed in [51] and introduced independently by Cardoso and Laheld [52] under the name of relative gradient.

Many variants of Amari’s algorithm can be found in the literature, based on the minimization of different contrast functions such as MI, Shannon entropy, etc. Douglas et al. [53, 54] addressed the stability problems of Amari’s algorithms and proposed the minimization of:

$$L(W) = -\log |\det W(0)| - \sum_i \int p(y_i; W(z)) \log f_i(y_i)\, dy_i$$

where $W(z, k) = \sum_i W_i(k) z^{-i}$ is the separation filter, $p(y_i; W(z))$ is the marginal probability density of $y_i$, and $f_i(y_i)$ is a nonlinear function. The separation filter can be adapted using:

$$\Delta W_p(k) = \eta(k) \left[ W_p(k) - f\big(Y(k - L)\big)\, U^T(k - p) \right]$$

where $L$ is the filter order, $U(k) = \sum_{i=0}^{L} W_{L-i}^T(k)\, Y(k - i)$, and $\eta(k) = \frac{\mu}{\beta + \sum_{p=0}^{L} |Y(k - p)|^q}$ is an adaptive minimization step suggested by Amari, in which $\mu$, $\beta$ and $q$ are constant parameters. According to the same authors, the components $f = (f_{ik}(Y))$ can be selected by:

f ik ( Y ) = f N ( Y ) if σ i 2 ( k ) κ iN ( k ) κ iP ( k ) γ ( k ) ρ iN ( k ) ρ iP ( k ) > 1 f P ( Y ) elsewhere

where $f_r(X)$, $r \in \{P, N\}$, is used when the signals have a positive (resp. negative) kurtosis sign. $\rho(k)$, $\kappa(k)$ and $\sigma(k)$ are signal statistics that can be adapted iteratively. Finally, Douglas et al. [53] suggested the following non-linear functions:

$$f_N(y) = |y|^3\, \mathrm{sign}(y) \quad \text{and} \quad f_P(y) = \tanh(10 y)$$

In the context of our project, many simulations have been conducted. According to our experimental studies, these algorithms can give good results for stationary signals and for relatively short channel filters (i.e. low-order filters). Unfortunately, divergence problems or unsatisfactory results were often observed when the signals were sparse and non-stationary and the channel filter was very long, as in our application.

Blind separation of non stationary signals

Kawamoto et al.[19] proposed an ICA algorithm to separate a convolutive mixture of speech signals. The proposed algorithm can be considered as an extension of Matsuoka’s algorithm which can, only, deal with instantaneous mixtures, see[55, 56]. The main idea of Kawamoto’s algorithm is that the separation of non-stationary signals can be obtained by minimizing a cost function based on Hadamard inequality[57] with the following assumptions:

  1. $H(z)$ is a full-rank stable filter matrix and it has no zero on the unit circle.

  2. The sources are zero-mean non-stationary signals.

  3. The sources have different auto-covariances $r_i(n, m) = E\big(s_i(n)\, s_i(n - m)\big)$, which should be functions of time.

In this case, Kawamoto et al. proved that the separation can be obtained by adapting the following filter:

$$\min_{W(z)} \sum_{i=1}^{N_{sig}} \log E\big(y_i^2(n - L)\big) - \log \det\big(R_Y(n - L)\big)$$

where $L \in \mathbb{Z}$ stands for a time delay and $R_Y(n) = E\big(Y(n) Y^T(n)\big)$.
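The criterion can be evaluated numerically for a single covariance estimate, as in the sketch below. By Hadamard's inequality the value is non-negative and vanishes only when $R_Y$ is diagonal. The sketch does not adapt $W(z)$ and does not track the time variation that Kawamoto's algorithm relies on; the name `hadamard_cost` is hypothetical and $R_Y$ is replaced by a sample covariance of zero-mean outputs.

```python
import numpy as np

def hadamard_cost(y):
    """sum_i log E(y_i^2) - log det R_Y for zero-mean outputs y of shape (Ns, n)."""
    y = np.asarray(y, dtype=float)
    r = y.T @ y / len(y)                         # sample estimate of R_Y = E(Y Y^T)
    _, logdet = np.linalg.slogdet(r)             # stable log-determinant
    return float(np.sum(np.log(np.diag(r))) - logdet)
```

For two signals with correlation coefficient $\rho$ the cost equals $-\log(1 - \rho^2)$, so it grows quickly as the outputs become correlated.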

In our simulations, good results have been obtained when the signals are speech signals and the channel filter is an FIR filter, see Figure 6. Unfortunately, we could not obtain good results when the signals and the channel are derived from underwater acoustic applications. By using the pre-processing stages described in Section “Pre-processing systems” and a huge number of samples, only average results were obtained. Besides, the obtained performance depends on the signals as well as on the transmission channel.

Figure 6

Separation of two signals using Kawamoto’s algorithm (the sources are speech and music). Signals have been filtered using a band-pass filter (100 to 2,000 Hz).

The convergence needed a huge number of samples and the obtained results were not always satisfactory. The algorithm was also time and memory consuming.

A frequency domain method for BSS of convolutive audio mixture (SOS)

Rahbar and Reilly[22] proposed an algorithm which minimizes a criterion Γ based on the cross-spectral density matrix of the observed signals. For non-stationary signals, the latter matrix depends on frequency and time epoch m:

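$$\Gamma = \int_0^{\pi} \sum_{m=0}^{M-1} \|F(w, m)\|_F^2\, dw$$
$$F(w, m) = \hat{P}_m(w) - \sum_{\alpha=0,\, \beta=0}^{L} \hat{H}_\alpha\, \hat{D}_m(w)\, \hat{H}_\beta^T \exp\big(-j(\alpha - \beta) w\big)$$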

where $\|F(w, m)\|_F^2$ is the squared Frobenius norm of $F(w, m)$, $M$ is an estimate of the channel degree, $\hat{H}_\alpha$ is an estimate of the channel response at time $\alpha$, and $\hat{D}_m(w)$ are diagonal matrices estimating the cross-spectral density matrix of the sources. To estimate the cross-spectral density matrix of the signals, the authors used $L$ estimation windows with $L_m$ samples each:

$$\hat{P}_m(w) = \frac{1}{J} \sum_{i=0}^{J-1} X_{im}(w)\, X_{im}^H(w)$$

where $X_{im}(w)$ is the Fourier transform of the observed signals, and $J$ is the number of estimation windows such that $L_J < L_m$ and $J L_J > L_m$.
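The averaging that defines $\hat{P}_m(w)$ can be sketched for one epoch as follows. The segmentation is an assumption made for illustration (non-overlapping segments of `n_fft` samples with a Hann window), and the name `cross_spectral_density` is hypothetical; the original method may use overlapping windows.

```python
import numpy as np

def cross_spectral_density(x, n_fft):
    """Average X_i(w) X_i(w)^H over the segments of one epoch.

    x has shape (n_sensors, n_samples); the result has shape
    (n_fft // 2 + 1, n_sensors, n_sensors), one matrix per frequency bin.
    """
    x = np.asarray(x, dtype=float)
    n_seg = x.shape[1] // n_fft                  # J segments in this epoch
    segs = x[:, :n_seg * n_fft].reshape(x.shape[0], n_seg, n_fft)
    spec = np.fft.rfft(segs * np.hanning(n_fft), axis=2)   # (sensors, J, bins)
    return np.einsum('ajw,bjw->wab', spec, spec.conj()) / n_seg
```

Each returned matrix is Hermitian and positive semi-definite by construction, which is what the criterion $\Gamma$ expects from $\hat{P}_m(w)$.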

Clearly, the minimization of (35) requires a continuous variable $w$, which is very difficult to implement. To solve that problem, the authors proposed the minimization of another criterion over $K$ frequency points such that $w_k = \frac{\pi k}{K}$:

$$\Gamma = \sum_{k=0}^{K-1} \sum_{m=0}^{L-1} \left( \|F_R(w_k, m)\|_F^2 + \|F_I(w_k, m)\|_F^2 \right)$$

where F R (w,m) and F I (w,m) are the real and the imaginary parts of Equation (36). Finally, the minimization was done using a conjugate gradient algorithm.

Convolutive blind separation of non-stationary sources

The approach proposed by Parra and Alvino [23] is similar to the one proposed by Rahbar et al. Using the spectral density of different signals, the authors suggested the minimization of the following criterion using a gradient algorithm:

$$\hat{G}, \hat{R}_S, \hat{R}_N = \arg\min_{G, R_S, R_N} \sum_k \sum_w \left\| G(w) \big[ \hat{R}_X(w, k) - R_N(w, k) \big] W^H(w) - R_S(w, k) \right\|^2$$

where $\hat{R}_X(w, k)$ is the estimated cross-power spectrum of $X$. To improve the performance of their algorithm, the authors performed the minimization using a joint diagonalization algorithm applied to the following criterion $J(w)$, subject to a time-domain constraint on the filter size (this constraint aims to solve the permutation indeterminacy in the frequency domain):

J ( w ) = t , w t R X ( w , t ) 2 Off R X ( t , w ) F 2

Experimental results

Using the structure proposed in Figure 2, many simulations have been conducted. Generally, 500,000 to 1,000,000 samples were needed to achieve the separation. The original sources were sampled at 44 kHz. In almost all the simulations, the separation of artificial or natural signals was successfully achieved. In these simulations, the channel depth was set between 100 and 500 m, the distances among the sources or among the sensors were between 30 and 100 m, the distances between the sources and the sensors ranged from 1,500 to 2,500 m, and the number of sensors was strictly higher than the number of sources.

Figure 7 shows experimental results obtained by applying only the SOS algorithm to separate a mixture of acoustic signals (ship and whale).

Figure 7

Experimental results: the first column contains the original and the estimated sources, and the second column contains the observed signals (the sources are a whale sound and a boat noise).

Finally, good results have been obtained by applying only the SOS algorithm, except for some configurations, notably when the sources are close to the water surface. For the latter cases, we found that applying the Parra algorithm before the SOS algorithm can improve the overall results. Figure 8 shows the experimental results obtained by the different algorithms (Parra, SOS or Parra + SOS); each point corresponds to the results of random simulations using one of them. In this figure, a normalized positive performance index based on a nonlinear decorrelation is used. The normalized performance index is forced to be 0 for the mixtures and 1 for the sources.

Figure 8

Experimental results obtained by different algorithms (Parra, SOS or SOS + Parra) on various configurations, using a normalized performance index (the three curves represent the maximum, minimum and average performance levels). Further details are given in Section “Experimental results”.


In this article, several signal processing contributions applied to a real-world application, namely PAT, have been presented. Many simulations have been conducted, and experimental studies showed the necessity of pre-processing and post-processing the observed signals in order to properly separate the sources.

Many algorithms have been implemented and tested. However, only a few algorithms, dedicated to the separation of non-stationary signals, gave satisfactory results. In a real scenario of warfare applications, the use of any ICA algorithm becomes very challenging. In fact, many ICA algorithms cannot achieve satisfactory results when:

  • Most of the signals are close to Gaussian ones.

  • Sources have very inhomogeneous powers (the power ratio can be up to a dozen dB).

  • SNR can be very limited depending on operational situations.

  • The channel filters are very long and sparse. ICA algorithms can handle convolutive mixtures; however, in our applications, the channel filter order can reach a few thousand while only a few filter coefficients are non-zero.

Our future work consists of developing an ICA algorithm which can use other features of acoustic signals, such as sparseness along with non-stationarity.


Appendix 1: HOS estimators

Since the beginning of the 1980s, HOS methods and theories have been widely used in signal processing. Most HOS algorithms are based on fourth-order statistics. By definition [58], the $q$th-order moment $\mu_q$ of a stochastic signal $X$ is:

$$\mu_q = E(X^q)$$

where $E$ stands for the mathematical expectation. The $q$th-order cumulant of $X$ can be evaluated from its moments by using the Leonov–Shiryayev formula [59]:

$$\mathrm{Cum}(X_1, \ldots, X_q) = \sum (-1)^{p-1} (p-1)!\; E\Big(\prod_{i \in v_1} X_i\Big)\, E\Big(\prod_{j \in v_2} X_j\Big) \cdots E\Big(\prod_{k \in v_p} X_k\Big)$$

where the sum is over all the sets $v_i$ ($1 \le i \le p \le q$) such that the $v_i$ compose a partition [60] of $\{1, \ldots, q\}$. Using the above relationship, we can calculate the 4th-order cumulant of $X$:

$$\mathrm{Cum}_4(X) = E(X^4) - 4E(X)E(X^3) - 3E^2(X^2) + 12E^2(X)E(X^2) - 6E^4(X).$$

For a zero-mean stochastic signal, the second-order cumulant (i.e. the variance) is equal to its second-order moment and its 4th-order cumulant becomes:

$$\mathrm{Cum}_4(X) = E(X^4) - 3E^2(X^2).$$
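The general fourth-order formula can be checked numerically with sample moments. The sketch below replaces each expectation by a sample mean; the function name `cum4_from_moments` is hypothetical.

```python
import numpy as np

def cum4_from_moments(x):
    """Fourth-order cumulant computed from the first four raw sample moments."""
    x = np.asarray(x, dtype=float)
    m1, m2, m3, m4 = (np.mean(x ** q) for q in (1, 2, 3, 4))
    return m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
```

Applied to raw samples, this gives the same value as the zero-mean expression $E(X^4) - 3E^2(X^2)$ evaluated on the centred samples, which is a convenient sanity check of the signs.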

Arithmetic estimators

Let $X$ be a zero-mean ergodic stochastic signal, where $x_i$ is an event (or a signal sample) of $X$ ($1 \le i \le N$). In this case, the arithmetic estimator of the $q$th-order moment is given by:

$$\hat{\mu}_q = \frac{1}{N} \sum_{i=1}^{N} x_i^q$$

This estimator assumes that the signal $X$ is stationary over $N$ samples. It is an unbiased estimator (i.e. $E(\hat{\mu}_q) = \mu_q$), and its variance is given by:

$$\mathrm{Var}(\hat{\mu}_q) = \frac{\mu_{2q} - \mu_q^2}{N}$$

Clearly, the above estimator is consistent: for stationary signals, its variance decreases as the number of samples increases. An arithmetic estimator of the $q$th-order cumulant can be developed from Equation (38):

$$\widehat{\mathrm{Cum}}_q(X) = \sum (-1)^{p-1} (p-1)!\; \hat{\mu}_{v_1} \hat{\mu}_{v_2} \cdots \hat{\mu}_{v_p}$$

It is proved [61] that the estimator (41) is biased and that the estimation error decreases proportionally to $\frac{1}{N}$:

E Cum ̂ q ( X ) = p = 1 q ( 1 ) p N p 1 ( p 1 ) × μ q + ( N 1 ) μ v 1 μ v 1 + ( N 1 ) p 1 μ v 1 μ v p

However, it is a consistent estimator. An unbiased cumulant estimator can be deduced from the last equation:

$$\widehat{\mathrm{Cum}}_q(X) = \sum_{p=1}^{q} c_p\, (-1)^{p-1} (p-1)!\; \hat{\mu}_{v_1} \hat{\mu}_{v_2} \cdots \hat{\mu}_{v_p}$$

where the parameters $c_p$ depend on the partitions of the indices $v_i$. These parameters can be obtained as the solution of $q$ linear equations. Let us consider the fourth-order cumulant:

$$\widehat{\mathrm{Cum}}_4(X) = a\, \hat{\mu}_4 - 4b\, \hat{\mu}_1 \hat{\mu}_3 - 3c\, \hat{\mu}_2^2 + 12d\, \hat{\mu}_1^2 \hat{\mu}_2 - 6e\, \hat{\mu}_1^4$$

In order to make the last estimator unbiased, one should solve a linear system of equations obtained by comparing, term by term, the expectation of Equation (43) and the theoretical value given by (38):

a= N 3 + N 2 24 N + 24 ( N 1 ) ( N 2 ) ( N 3 )
b= N ( 2 N 2 10 N + 9 ) 2 ( N 1 ) ( N 2 ) ( N 3 )
c= N ( N 2 N 6 ) ( N 1 ) ( N 2 ) ( N 3 )
d= N 2 ( 2 N 5 ) 2 ( N 1 ) ( N 2 ) ( N 3 )
e= N 3 ( N 1 ) ( N 2 ) ( N 3 )

For zero-mean signals, we can easily prove that:

$$E\big(\widehat{\mathrm{Cum}}_4(X)\big) = \mu_4 - \frac{3}{N}\big(\mu_4 + (N-1)\mu_2^2\big) = \mathrm{Cum}_4(X) - \frac{3}{N}\big(\mathrm{Cum}_4(X) + 2\mu_2^2\big).$$

This means that the following estimator is an unbiased estimator of the fourth-order cumulant of a zero-mean stationary signal $X$:

$$\widehat{\mathrm{Cum}}_4(X) = \frac{N+2}{N-1}\, \hat{\mu}_4 - \frac{3N}{N-1}\, \hat{\mu}_2^2$$
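The bias of the plain estimator and its correction can be observed with a small Monte Carlo experiment. The sketch assumes zero-mean i.i.d. samples and uses the coefficients $\frac{N+2}{N-1}$ and $\frac{3N}{N-1}$, which follow from the expectation above; the names `cum4_naive` and `cum4_unbiased` are hypothetical. Both functions work along the last axis, so a whole batch of trials can be processed at once.

```python
import numpy as np

def cum4_naive(x):
    """Plain estimator mu4_hat - 3 mu2_hat^2 (biased for finite N)."""
    return np.mean(x ** 4, axis=-1) - 3 * np.mean(x ** 2, axis=-1) ** 2

def cum4_unbiased(x):
    """Bias-corrected estimator for zero-mean samples."""
    n = x.shape[-1]
    mu4 = np.mean(x ** 4, axis=-1)
    mu2 = np.mean(x ** 2, axis=-1)
    return ((n + 2) * mu4 - 3 * n * mu2 ** 2) / (n - 1)
```

For unit-variance Gaussian samples with N = 10, the true cumulant is 0 and the plain estimator is expected to average about $-\frac{3}{N} \cdot 2 = -0.6$, while the corrected one averages close to 0.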

For real-time applications, the estimators should be adaptive. The estimator (40) is not adaptive, but it is easy to derive an adaptive version:

$$\hat{\mu}_q\{k\} = \frac{1}{k} \sum_{i=1}^{k} x_i^q = \frac{(k-1)\, \hat{\mu}_q\{k-1\} + x_k^q}{k}$$

where $\hat{\mu}_q\{k\}$ is the estimator of the $q$th-order moment at the $k$th iteration.
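The adaptive form translates into a one-line update per sample. The generator name `running_moment` is hypothetical.

```python
def running_moment(samples, q):
    """Yield the arithmetic estimate of the q-th moment after each new sample."""
    estimate = 0.0
    for k, x in enumerate(samples, start=1):
        estimate = ((k - 1) * estimate + x ** q) / k
        yield estimate
```

After the last sample, the running value coincides with the batch average of $x_i^q$, so no past samples need to be stored.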

Exponential estimators

Exponential estimators are defined as follows:

$$\hat{\mu}_q = (1 - \lambda_q) \sum_{i=1}^{N} \lambda_q^{N-i}\, x_i^q$$

where $0 < \lambda_q < 1$ stands for a forgetting factor. This estimator can easily be calculated in an adaptive way:

$$\hat{\mu}_q\{k\} = \lambda_q\, \hat{\mu}_q\{k-1\} + (1 - \lambda_q)\, x_k^q$$

This estimator is biased ($E(\hat{\mu}_q) = (1 - \lambda_q^N)\mu_q$), but it is asymptotically unbiased. It can achieve a better estimation of the moments of non-stationary signals. The closer $\lambda_q$ is to 1, the more the past samples are taken into account. An unbiased exponential estimator can be used:

$$\hat{\mu}_q = \frac{1 - \lambda_q}{1 - \lambda_q^N} \sum_{i=1}^{N} \lambda_q^{N-i}\, x_i^q$$

Estimator (53) can also be modified into an adaptive version:

$$\hat{\mu}_q\{k\} = \frac{1}{1 - \lambda_q^k} \left[ \lambda_q (1 - \lambda_q^{k-1})\, \hat{\mu}_q\{k-1\} + (1 - \lambda_q)\, x_k^q \right]$$
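Both exponential recursions can be sketched side by side. The normalization of the unbiased update uses $1 - \lambda_q^k$ at step $k$, which is what makes the recursion agree with the batch formula when $k = N$; the generator name `exponential_moments` is hypothetical.

```python
def exponential_moments(samples, q, lam):
    """Yield (biased, unbiased) exponential estimates of the q-th moment."""
    biased, unbiased = 0.0, 0.0
    for k, x in enumerate(samples, start=1):
        new = (1 - lam) * x ** q
        biased = lam * biased + new
        unbiased = (lam * (1 - lam ** (k - 1)) * unbiased + new) / (1 - lam ** k)
        yield biased, unbiased
```

On a constant signal the unbiased estimate returns the constant from the first sample onwards, whereas the biased one approaches it only as $\lambda_q^k$ vanishes.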

An adaptive unbiased estimator of the cumulants can be derived using (39) and (54). To simplify the discussion, the unbiased fourth-order cumulant estimator for zero-mean signals is developed as follows:

$$\widehat{\mathrm{Cum}}_4(X)\{k\} = \widehat{\mathrm{Cum}}_4(X)\{k-1\} + (1 - \gamma)\, H_k\big(\widehat{\mathrm{Cum}}_4(X)\{k-1\}\big)$$

where $\gamma$ is another forgetting factor and

$$H_k\big(\widehat{\mathrm{Cum}}_4(X)\{k-1\}\big) = x_k^4 - 4 x_k^3\, \hat{\mu}_1\{k-1\} - 3 x_k^2\, \hat{\mu}_2\{k-1\} + 12 x_k^2\, \hat{\mu}_1^2\{k-1\} - 6 \hat{\mu}_1^4\{k-1\} - \widehat{\mathrm{Cum}}_4(X)\{k-1\}$$

Appendix 2: adaptive unbiased estimator of 4th order cumulant

Let $C_{13}(N) = K_{13}(X, Y)$ be the adaptive estimator of the 3×1 cumulant using $N$ samples, $A_N = \sum_{i}^{N} x_i y_i^3$ and $B_N = \sum_{ij}^{N} x_i y_i y_j^2$. In this case, Equation (13) can be written as follows:

$$N(N-1)\, C_{13}(N) = (N+2)\, A_N - 3 B_N$$

Hence, we can prove that:

N ( N + 1 ) C N + 1 = ( N + 3 ) A N + x N + 1 y N + 1 3 3 B N + x N + 1 y N + 1 j = 1 N + 1 y j 2 + y N + 1 2 i = 1 N + 1 x i y i

Last equation can be written as follows:

N ( N + 1 ) C N + 1 = N ( N 1 ) C N + A N + ( N + 3 ) x N + 1 y N + 1 3 3 x N + 1 y N + 1 j = 1 N y j 2 3 y N + 1 2 i = 1 N x i y i

Finally, the last equation can be rewritten as:

C N = N 2 N C N 1 + 1 N μ 13 ( X , Y ) + N + 2 N ( N 1 ) x N y N 3 3 x N y N μ 02 ( X , Y ) 3 y N 2 μ 11 ( X , Y )

where $\mu_{nm}(X,Y) = \frac{1}{N-1}\sum_{i=1}^{N-1} x_i^n y_i^m$ is the estimator of $E(X^n Y^m)$ using $N-1$ samples. Using the last two equations, we derive the final form of our adaptive 4th order cumulant estimator:

$$C_{31}(N) = \frac{N-2}{N}\,C_{31}(N-1) + \frac{1}{N}\,\mu_{31}(X,Y) + \frac{N+2}{N(N-1)}\,x_N^3\,y_N - 3x_N\,y_N\,\mu_{20}(X,Y) - 3x_N^2\,\mu_{11}(X,Y)$$
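The starting relation of this appendix, $N(N-1)\,C_{13}(N) = (N+2)A_N - 3B_N$, can also be evaluated sample by sample by keeping a few running sums. The Python sketch below does this and does not reproduce the closed-form recursion itself. It assumes zero-mean signals and reads $B_N$ as the full double sum over $i$ and $j$, so that it factors into $\sum_i x_i y_i \cdot \sum_j y_j^2$. The class and function names are hypothetical.

```python
class RunningCrossCumulant13:
    """Running-sum evaluation of N(N-1) C13(N) = (N+2) A_N - 3 B_N."""

    def __init__(self):
        self.n = 0
        self.a = 0.0      # A_N = sum_i x_i * y_i^3
        self.s_xy = 0.0   # sum_i x_i * y_i
        self.s_yy = 0.0   # sum_j y_j^2

    def update(self, x, y):
        """Add one pair of samples (x_N, y_N) to the running sums."""
        self.n += 1
        self.a += x * y ** 3
        self.s_xy += x * y
        self.s_yy += y ** 2

    def value(self):
        """Return C13(N) for the samples seen so far (needs N >= 2)."""
        if self.n < 2:
            raise ValueError("at least two samples are needed")
        # Assumption: B_N is the full double sum, hence a product of sums.
        b = self.s_xy * self.s_yy
        return ((self.n + 2) * self.a - 3.0 * b) / (self.n * (self.n - 1))


def batch_c13(xs, ys):
    """Direct evaluation with an explicit double sum, for comparison."""
    n = len(xs)
    a = sum(x * y ** 3 for x, y in zip(xs, ys))
    b = sum(xs[i] * ys[i] * ys[j] ** 2 for i in range(n) for j in range(n))
    return ((n + 2) * a - 3.0 * b) / (n * (n - 1))
```

With the same assumptions, $C_{31}(N)$ is obtained by exchanging the roles of the two signals, that is, by calling `update(y, x)` instead of `update(x, y)`.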


  1. Gervaise C, Quinquis A, Martins N: Time frequency approach of blind study of acoustic submarine channel and source recognition. In Physics in Signal and Image Processing, PSIP 2001. Marseille, France; January 2001.

    Google Scholar 

  2. Munk W, Worcester P, Wunsch C: Ocean Acoustic Tomography. Cambridge University Press, Cambridge,; 1995.

    Book  Google Scholar 

  3. Baggeroer AB, Kuperman WA, Mikhalevsky PN: An overview of matched field methods in ocean acoustics. IEEE J. Oceanic Eng 1993, 18: 4.

    Article  Google Scholar 

  4. Chapman NR, Lindsay CE: Matched-field inversion for geoacoustic model parameters in shallow water. IEEE J. Oceanic Eng 1996, 21: 4.

    Article  Google Scholar 

  5. Hermand JP: Broad-band geoacoustic inversion in shallow water from waveguide impulse response measurements on a single hydrophone: theory and experimental results. IEEE J. Oceanic Eng 1999, 24: 1.

    Article  Google Scholar 

  6. Michalopoulou ZH: Estimating the impulse response of ocean: correlation versus deconvolution, in Inverse problems in underwater acoustics. Springer, Paris and Milan and Barcelone,; 2001.

    Google Scholar 

  7. Gervaise C, Vallez S, Ioana O, Staphan Y, Simard Y: Passive acoustic tomography: review, new concepts and application using marine mammals. J. Mar. Biol. Assoc. U. K 2007, 87: 5-10. 10.1017/S0025315407054872

    Article  Google Scholar 

  8. Martins N, Jesus S, Gervaise C, Quinquis A: A time-frequency approach to blind deconvolution in multipath underwater channels. In Proceedings of International Conference on Acoustics Speech and Signal Processing 2002, ICASSP 2002. Orlando, Florida, USA; 13–17 May 2002.

    Google Scholar 

  9. Gaucher D, Gervaise C: Feasibility of passive oceanic acoustic tomography: a Cramer Rao bounds approach. In Oceans 2003 Marine Technology and Ocean Science Conference. San Diego, USA; 22–26 September 2003. pp. 56–60

    Google Scholar 

  10. Gaucher D, Gervaise C, LE Flock H: Contributions to passive acoustic oceanic tomography. In 7me Journes d’Acoustique Sous-Marine. Brest, France;

  11. Mansour A, Jutten C: What should we say about the kurtosis. IEEE Signal Process. Lett December 1999, 6(2):321-322.

    Article  Google Scholar 

  12. Jensen FB, Kuperman WA, Porter MB, Schmidt H: Computational ocean acoustics. Springer-Verlag, New York, London, Tokyo,; 2000.

    MATH  Google Scholar 

  13. Etter P: Recent advances in underwater acoustic modelling and simulation. J. Sound Vib 2001, 240(2):351-383. 10.1006/jsvi.2000.3212

    Article  Google Scholar 

  14. Etter P: Underwater acoustic modeling principles, techniques and applications. Elsevier, New York,; 1991.

    MATH  Google Scholar 

  15. Lurton X: Introduction to underwater acoustics principles and applications. Springer, London,; 2002.

    Google Scholar 

  16. Brekhovskikh LM, Lysanov YP: Fundamentals of ocean acoustics. Springer Verlag, New York,; 2003.

    MATH  Google Scholar 

  17. Shulkin M, Marsh HW: Sound absorption in sea water. J. Acoustical Soc. Am 1962, 134: 864-865.

    Article  Google Scholar 

  18. Etter PC: Underwater acoustic modeling and simulation. Spon Press Editor, London, UK,; 2003.

    Book  MATH  Google Scholar 

  19. Kawamoto M, Matsuoka K, Ohnishi N: A method of blind separation for convolved non-stationary signals. Neurocomputing 1998, 22: 157-171. 10.1016/S0925-2312(98)00055-1

    Article  MATH  Google Scholar 

  20. Kawamoto M, Kardec Barros A, Mansour A, Matsuoka K, Ohnishi N: Real world blind separation of convolved non-stationary signals. In First International Workshop on Independent Component Analysis and signal Separation (ICA99). Edited by: Cardoso JF, Jutten Ch, loubaton Ph. Aussois, France; 11–15 January 1999. pp. 347–352

    Google Scholar 

  21. Rahbar K, Reilly J: Blind separation of convolved sources by joint approximate diagonalization of cross-spectral density matrices. In Proceedings of International Conference on Acoustics Speech and Signal Processing 2001, ICASSP 2001. Salt Lake City, Utah, USA; May 7–11 2001.

    Google Scholar 

  22. Rahbar K, Reilly J: A frequency domain method for blind source separation of convolutive audio mixtures. IEEE Trans. Speech Audio Process 2005, 13(5):832-844.

    Article  Google Scholar 

  23. Parra L, Alvino CV: Convolutive blind separation of non-stationnary sources. IEEE Trans. Speech Audio Process May 2000, 8(3):320-327. 10.1109/89.841214

    Article  Google Scholar 

  24. Mansour A, Jutten C, Loubaton Ph: Subspace method for blind separation of sources and for a convolutive mixture model. In European Signal Processing Conference. Elsevier, Triest, Italy; September 1996. pp. 2081–2084

    Google Scholar 

  25. Kailath T: Linear systems. Prentice Hall, New Jersey,; 1980.

    MATH  Google Scholar 

  26. Karhunen J, Cichocki A, Kasprazak W, Pajunen P: On neural blind source separation with noise suppression and redundancy reduction. Int. J. Neural Syst April 1997, 8(2):219-237.

    Article  Google Scholar 

  27. Mansour A: A mutually referenced blind multiuser separation of convolutive mixture algorithm. Signal Process November 2001, 81(11):2253-2266.

    Article  MATH  Google Scholar 

  28. Chen W, Reilly JP, Wong KM: Detection of the number of signals in noise with banded covariance matrices. IEE Proc- Radar, sonar and Navogation October 1996, 143(5):289-294.

    Article  Google Scholar 

  29. Kendall M, Stuart A: The advanced theory of statistics: Design and analysis, and time-series. Charles Griffin & Company Limited, London,; 1961.

    MATH  Google Scholar 

  30. Martin A, Mansour A: Comparative study of high order statistics estimators. In International Conference on Software, Telecommunications and Computer Networks. Split (Croatia), Dubrovnik (Croatia), Venice (Italy); October 10–13 2004. pp. 511–515

    Google Scholar 

  31. Tan Y, Wang J, Zurada JM: Nonlinear blind source separation using a radial basis function network. IEEE Trans. Neural Networks January 2001, 12(1):124-134. 10.1109/72.896801

    Article  Google Scholar 

  32. Pham D-T: Fast algorithm for estimating mutual information, entropies and score functions. In 4th International Workshop on Independent Component Analysis and blind Signal Separation, ICA2003. Nara, Japan; 1–4 April 2003. pp. 17–22

    Google Scholar 

  33. Rosenblatt M: A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Stat 1975, 3(1):1-14. 10.1214/aos/1176342996

    Article  MathSciNet  MATH  Google Scholar 

  34. Kankainen A: Consistent testing of total independence based on empirical characteristic functions. Ph.D. thesis, University Jyvaskyla 1995

    Google Scholar 

  35. Achard S, Pham D-T, Jutten C: Quadratic dependence measure for nonlinear blind sources separation. In 4th International Workshop on Independent Component Analysis and blind Signal Separation, ICA2003. Nara, Japan; 1–4 April 2003. pp. 263–268

    Google Scholar 

  36. Bach FR, Jordan MI: Finding clusters in independent component analysis. In 4th International Workshop on Independent Component Analysis and blind Signal Separation, ICA2003. Nara, Japan; 1–4 April 2003. pp. 891–896

    Google Scholar 

  37. Feuerverger A: A consistent test for bivariate dependence. Int. Stat. Rev 1993, 61(3):419-433. 10.2307/1403753

    Article  MATH  Google Scholar 

  38. Murata N: Properties of the empirical characteristic function and its application to testing for independence. In Third International Workshop on Independent Component Analysis and signal Separation (ICA2001). San Diego, California, USA; 9–12 December 2001. pp. 295–300

    Google Scholar 

  39. Mansour A, Gervaise C: ICA applied to passive ocean acoustic tomography. WSEAS Trans. on Acoustics and Music April 2004, 1(2):83-89.

    Google Scholar 

  40. Cardoso JF, Comon P: Independent component analysis, a survey of some algebraic methods. In International Symposium on Circuits and Systems Conference, volume 2. Atlanta, USA; May 1996. pp. 93–96

    Google Scholar 

  41. Mansour A, Kardec Barros A, Ohnishi N: Blind separation of sources: Methods, assumptions and applications. IEICE Trans Fundam Electron, Commun and Comput Sci August 2000, E83-A(8):1498-1512.

    Google Scholar 

  42. Jutten C, Karhunen J: Advances in nonlinear blind source separation. In 4th International Workshop on Independent Component Analysis and blind Signal Separation, ICA2003. Nara, Japan; 1–4 April 2003. pp. 245–256

    Google Scholar 

  43. Comon P: Independent component analysis, a new concept? Signal Process April 1994, 36(3):287-314.

    Article  MATH  Google Scholar 

  44. Emile B, Comon P: Estimation of time delays between unknown colored signals. Signal Process 1998, 69: 93-100. 10.1016/S0165-1684(98)00061-9

    Article  Google Scholar 

  45. Nguyen Thi L, Jutten C, Caelen J: Separation aveugle de parole et de bruit dans un mlange convolutif. In Actes du XIIIème colloque GRETSI. Juan-Les-Pins, France; September 1991. pp. 737–740

    Google Scholar 

  46. Nguyen Thi L, Jutten C, Caelen J: Speech enhancement: Analysis and comparison of methods in various real situations. In European Signal Processing Conference. Edited by: Vandewalle J, Boite R, Moonen M, Oosterlinck A. Elsevier, Brussels, Belgium; August 1992. pp. 303–306

    Google Scholar 

  47. Nguyen Thi L, Jutten C: Blind sources separation for convolutive mixtures. Signal Process 1995, 45(2):209-229. 10.1016/0165-1684(95)00052-F

    Article  MATH  Google Scholar 

  48. Kardec Barros A, Mansour A, Ohnishi N: Removing artifacts from ECG signals using independent components analysis. NeuroComputing 1999, 22: 173-186.

    Article  MATH  Google Scholar 

  49. Kosel T, Grabec I, Kosel F: Time delay estimation of acoustic emission signals using ICA. Ultrasonics 2002, 40: 303-306. 10.1016/S0041-624X(02)00111-7

    Article  Google Scholar 

  50. Amari SI, Cardoso JF: Blind source separation-semiparametric statistical approach. IEEE Trans. on Signal Process November 1997, 45(11):2692-2700.

    Article  Google Scholar 

  51. Amari SI: Neural learning in structured parameter spaces: Natural Riemannian Gradient. In Neural Information Processing System-Natural and Synthetic. San Diego, Colorado, USA; 2–7 December 1996.

    Google Scholar 

  52. Cardoso JF, Laheld B: Equivariant adaptive source separation. IEEE Trans. Signal Process December 1996, 44(12):3017-3030. 10.1109/78.553476

    Article  Google Scholar 

  53. Douglas SC, Cichocki A, Amari SI: Multichannel blind separation and deconvolution of sources with arbitrary distributions, in the book Neural Networks for Signal Processing. In IEEE Workshop on Neural Networks for Signal Processing. New York; September 1997. pp. 436–445

    Google Scholar 

  54. Cichocki A, Douglas SC, Amari S: Robust techniques for independent component analysis (ICA) with noisy data. NeuroComputating 1998, 22: 113-129. 10.1016/S0925-2312(98)00052-6

    Article  MATH  Google Scholar 

  55. Matsuoka K, Oya M, Kawamoto M: A neural net for blind separation of nonstationary signals. Neural Networks 1995, 8(3):411-419. 10.1016/0893-6080(94)00083-X

    Article  Google Scholar 

  56. Kawamoto M, Matsuoka K, Oya M: Blind separation of sources using temporal correlation of the observed signals. IEICE Trans. Fundam Electron, Commun. Comput Sci April 1997, E80-A(4):111-116.

    Google Scholar 

  57. Noble B, Daniel JW: Applied linear algebra. Prentice-Hall, New Jersey,; 1988.

    MATH  Google Scholar 

  58. McCullagh P: Tensor methods in statistics. Chapman and Hall, London,; 1987.

    MATH  Google Scholar 

  59. Shiryayev AN: Probability. Springer Verlag, London,; 1984.

    Book  Google Scholar 

  60. Papoulis A: Probability, random variables, and stochastic processes. McGraw-Hill, New York,; 1991.

    MATH  Google Scholar 

  61. Kotz S, Johnson NL: Encyclopedia of statistical sciences. University of Amesterdam, Amesterdam,; 1993.

    Google Scholar 

Download references


Part of this work was supported by the French Military Center for Hydrographic & Oceanographic Studies (SHOM, i.e. Service Hydrographique et Océanographique de la Marine, Centre Militaire d’Océanographie).

Author information

Corresponding author

Correspondence to Ali Mansour.

Additional information

Competing interests

The author declares no competing interests.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Mansour, A. Enhancement of acoustic tomography using spatial and frequency diversities. EURASIP J. Adv. Signal Process. 2012, 225 (2012).
