Modified Particle Swarm Optimization for Blind Deconvolution and Identification of Multichannel Fir Filters

Blind identification of MIMO FIR systems has widely received attentions in various fields of wireless data communications. Here, we use Particle Swarm Optimization (PSO) as the update mechanism of the well-known inverse filtering approach and we show its good performance compared to original method. Specially, the proposed method is shown to be more robust against lower SNR scenarios or in cases with smaller lengths of available data records. Also, a modified version of PSO is presented which further improves the robustness and preciseness of PSO algorithm. However the most important promise of the modified version is its drastically faster convergence compared to standard implementation of PSO.


Introduction
This paper addresses the problem of blind identification of multi-input-multi-output (MIMO) channels in the general scenario, where all of the M observed signals y(n) = [y 1 (n), . . ., y M (n)] T are considered to be convolutive mixtures of N unknown, yet statistically independent sources x(n) = [x 1 (n), . . ., x N (n)] T .This is often the case in typical echoic environments, where each sensor captures not only the direct contributions from each source, but also several reflected versions of the original signals with totally different propagation delays.The blindness here means that we have no prior knowledge about the MIMO system and no training data is available.The ith observed mixture can be written as Here n i (k) is additive white Guassian noise and f i j (l), −∞ < l < ∞ are tap weight coefficients of channel impulse response from jth source to the ith sensor.Each subchannel of the MIMO system can be written in z domain as Inverse filtering approach [1] consists of an iterative solution, which recursively extracts sources from the mixture one by one and then following the estimation of experienced channel by any extracted source, and reconstructs this signal as it was originally observed on each sensor.Now, after reduction of reconstructed sources from sensor measurements, the same procedure can be used for extraction of remaining sources.The source extraction step is down by the steepest descent maximization of a class of cumulantbased cost functions with respect to coefficients of equalizer filters.However, this optimization strategy is prone to being trapped in local maxima, especially in lower SNR scenarios or when the available data record is too small [1].
An alternative to this gradient-based optimization is a structured stochastic search of the objective function space.These types of global searches are structure independent because a gradient is not calculated and the adaptive filter structure does not directly influence the parameter updates.Due to this property these types of algorithms are potentially capable of globally optimizing any class of objective functions [2].Particle Swarm Optimization (PSO) is one of these stochastic structured search algorithms that have recently gained popularity for optimization problems.This paper investigates the application of PSO technique in source extraction step of the above-mentioned procedure for mutually independent, zero mean i.i.d binary sequences.It should be noted that although [3] has addressed the same subject, but its update equation seems more like a heuristic method which employs a random weighted addition of gradients based on second-order statistics and prediction methods, while PSO is defined as cooperative random search of particles toward their current global and local best points in search space according to some fitness function [4] as it will be discussed in Section 3.1 and simple multiplication of update equation by a random number (as in [3]) does not represent the essence of global random search suggested by PSO.
In this paper, after studying the suitability of standard PSO for blind identification problem, we propose a modified version which finds its initial direction according to original gradient-based method in [1].In fact the promised random search of PSO will be limited to smaller local areas with the most probability of maximizing the cost function.Also we modify the original fitness function used in [1,3] by two supportive performance indexes in order to reduce the probability of algorithm failures which are the result of complete ignorance of original objective function about the existence of additive noise.

Iterative Source Extraction and Channel Estimation
In this section, we provide a detailed description of inverse filtering approach.Note that here we neglect the presence of noise n i (k) in (1).Consider that a 1 × N row vector C T , with its ith element C i (z) = L i=1 c i (l)z −l , operates on observed data vector y(k) to construct equalizer output as Here h j (l), j = −∞, . . ., ∞, are elements of the overall impulse response from jth source to equalizer output: Based on third-and fourth-order cumulant of equalizer output, [1] introduced a set of objective functions J r2 as Here CUM k (e(k)) is kth order cumulant of the equalizer output.The following upper bound has derived in [1] for J r2 : J r2 (c) ≤ γ r max . ( Here, |γ r max | = max 1≤ j≤M |CUM r (x j (k))|.Equality of (5) holds when the overall transfer function becomes So the maximization of J r2 with respect to equalizer coefficients C T leads to extraction of one of the sources on equalizer output up to a scaled and delayed version as This separation criteria, however, has the weakness of complete ignorance about the presence of noise.Authors in [1] have developed their theoretical proof about suitability of the above-mentioned objective function based on the assumption that the noise n(t) in (1) is negligible.This is the reason for poor performance of their proposed iterative, batch, steepest descent method f optimization at lower SNRs.
If we consider the presence of noise n i (k) in ( 1), the addition of the term N i=1 n i (k) * c i (k) to e(k) of ( 2) will destruct the validity of theoretical proof in [1].In fact when noise is dominant in sensor measurements, cooperation of this noise with equalizer coefficients may build up new fake maxima into J r2 regardless of overall impulse response h j (k) and in this new maximum, separation point of ( 6) is no longer achievable.
However it is obvious that (6) would still provide local maximum for J r2 and separation would be met if we could somehow control the algorithm from trapping into these new maximum as it will be explained in Section 3.2.
After extraction of each source, the estimated signal can be used for estimation of the experienced channel by that signal through its way to each sensor: Now, the third step is to reconstruct the extracted source as it was originally observed on each sensor in order to suppress its contribution from sensor observations: Hereafter, the same procedure can be used for extraction of remaining sources and then estimating the SIMO channel experienced by any one of them.[2] is a stochastic, population-based evolutionary algorithm for problem solving in complex multidimensional parameter spaces.It is a kind of swarm intelligence that is based on social-psychological principles.

PSO Principles. Particle swarm optimization
A multidimensional optimization problem is given, along with an objective function to evaluate the fitness of each candidate point in parameter space; the swarm is typically modeled by particles in this multidimensional space that have a position and a velocity.After the definition of a random population of individuals (particles) as candidate solutions, they fly through hyperspace of parameters with the aid of two essential reasoning capabilities: their memory of their own best local position and knowledge of the global or their neighborhood's best [5].
PSO begins by initializing a random swarm of N p particles p i = [p i1 , . . ., p iL ], each having L parameters.At each iteration, the fitness of each particle is evaluated according to selected fitness function.The fittest experienced position of each particle is stored and progressively replaced as pbest i , i = 1, . . ., N p , along with a single most globally fit particle (gbest) as fitter locations are encountered during algorithm iterations.The parameters of each particle in the swarm are updated at each iteration (n) according to a velocity vector as [6] vel Here, vel i (n) is the velocity vector of particle i, e r ∈ (0, 1) is a random value, acc 1 and acc 2 are acceleration coefficients toward gbest and pbest i and ω is the inertia weight.
In fact, the trajectory of each particle is determined by random superposition of its previous velocity with the location of local and global best particles found by far.As new gbests are encountered during the update process, all other particles begin to swarm toward this new gbest, continuing their random search along the way.The optimization is terminated when all of the particles have converged to gbest or a sufficient condition of fitness function met:

Implementation of PSO for Source Extraction.
We propose to use PSO as optimization method for maximization of J r2 with respect to coefficients of parallel equalizers (c i (l), i = 1, . . ., M, l = 1, . . ., L) as M × L dimensional particles.
Clearly J r2 of (4) seems a reasonable choice of fitness function since in the absence of noise, its maximization provides pure separation of the source x j0 (k) at multichannel equalizer output.But as it is mentioned in Section 2 this objective function fails in low SNR scenarios and would have several fake maxima with respect to equalizer coefficients.Also the steepest gradient method of [1] has shown poor performance in cases with limited number of available data samples.
The simplicity of PSO suggests that we can easily combine several objective functions for further evaluation of the true level of fitness of a candidate particle in order to stay aside from the trap of fake global maxima in the case of noisy and smaller length records of data.For instance, since we expect complete deconvolution of extracted source at each iteration, a simple choice is to evaluate the level of correlation between successive samples of equalizer output.It is clear that there will be no time dependency between successive samples at ideal point of separation and deconvolution as in (5).It allows us to exploit signal correlations at nonzero lags as in [7].Specifically we expect the following cost function to converge to zero when the true separation met.
Also we can use histogram of equalizer output as a clue for leading particles toward desired solution.For the assumed equiprobable sequence of ∓1s we expect the histogram of the equalized output to have a form like Figure 1 which shows the histogram of one of the successfully extracted sources in high SNR scenario.In lower SNR cases, when algorithms fails this histogram finds irregular Gaussian like shapes as in Figure 2. Two important differences between these two histograms exist: first, the concentration of histogram in Figure 2 around zero and second, the existence of infrequent extreme deviations as few points which are larger than 2.
Then as another measure of fitness, we can use the frequency of occurrence of samples with very small absolute values (smaller than 0.2 e.g.,) or with values larger than 2 as fitness parameter J hist .Here by frequency of occurrence we mean the number of occurrences.Now, our overall fitness function of particles c i is defined as Here α, β are approximately small coefficients and their selection has some effects on the convergence speed and robustness of the algorithms.In this new fitness function, J r2 and J cor have comparable values as their weight should be adjusted by α and β.J hist has very large modulus values which is desirable because it should strictly reject any particle with undesired histogram shape in the tails.Note that we can not apply this objective function to original gradient based method since the disapproval of a new equalizer coefficient set is equal to termination of algorithm run and it may causes the algorithm to stock near the initialization point, even in high SNR scenarios.
As it will be shown in Section 4, although this combination of three different cost functions improves the performance of PSO algorithm in noisy scenarios, it also has restricted the approval of new global and local best particles to some extent and it may slow down the search for optimum vector of parameters even in high SNR cases.So there will be a tradeoff between convergence speed and probability of algorithm failures in noisy environments.
In Practice, one general weakness of PSO is that its local search is not guaranteed convergent and based on the selected swarm size and its initialization; it may converge in a local non-optimal solution.Also, the relatively slower rate of convergence of combined fitness function (13) requires larger swarm sizes and more iterations of algorithm.Hence, some modifications in original position update (11) have introduced using a similar approach suggested in [2].We modify (11) as follos hybrid update equation: Here, ρ is a small constant about 0.25 and k 1 , k 2 are two constants that may dynamically change during the algorithm.The added ∇ ci J r2 term is the gradient of J r2 w.r.t multichannel equalizer coefficients.Further details about derivation of this gradient are just the same as the steepest descent method of [1].In our implementation we have used larger starting k 2 and after few iterations set k 1 > k 2 .With this choice, we are able to use the gradient as initial fast direction of swarm toward desired point in search space.Later when the swarm finds its way toward desired solution, we set k 1 > k 2 in order to let random cooperative search of swarm perform its fine tuning of equalizer coefficients.Clearly the relation k 1 +k 2 = 1 must be hold.The exact values of these two will not seriously affect the results, since the PSO will adjust the initial direction of the gradient anyhow.For example a good initial choice would be k 1 = 0.2, k 2 = 0.8.These values will be changed into k 1 = 0.8, k 2 = 0.2 after 10 iterations.
From now on, we will refer to this modified version of PSO as hybrid PSO.

Simulation Results
In this section we compare performance of the proposed hybrid PSO of ( 14) with the steepest gradient method of [1] for the source extraction step of the well-known inverse filtering approach.Since the comparison with [3] was impossible due to its abstruse presentation of update equations we compare our proposed modified version with standard implementation of PSO (as in (11)) in terms of computational complexity, fidelity, and robustness against noise.We chose J 42 as objective function of steepest gradient algorithm and used (13) as fitness function of PSO.A 2 × 2 transfer function F(z) was chosen as Inputs {x j (k), j = 1, 2} of MIMO system are mutually independent, zero mean i.i.d., binary sequences taking values ±1 with probability 0.5 each.The normalized fourth-order cumulant is −2 in this case.
In the first simulation, the equalizer length was chosen to be 15 (L = 15).For the purpose of impulse response estimation and extracted signal cancellation, f i j0 (τ) was estimated for −20 < τ < 20.The possible permutation and scaling ambiguities in this estimation were solved by imposing proper normalization, and some shifting and alignments as in [1].
The performance index used for evaluation of M c monte carlo runs is normalized mean square error (NMSE) which can be written for each subchannel F i j (z) as In which f l i j (τ) denotes the estimate of the i jth subchannel in lth Monte Carlo runs.The overall NMSE is: The parameters for PSO and hybrid PSO algorithms were chosen according to the tradeoff between convergence speed and algorithm runtime.A population of 100 particles is used with the maximum of 400 allowable iterations for PSO and 60 iterations for hybrid PSO.α and β of (13) are set to 0.3 and acc 1 , acc 2 and ω of (10) are all set to 1. k 1 of (14) was chosen to be 0.2 at the start of simulation (k 2 = 0.8) in order to force the swarm in desired direction.After about 30 iterations, k 1 was set to 0.8 (k 2 = 0.8) so the cooperative random local search (around the global maximum) of the directed swarm begins.
Figure 3 shows the results for 30 Monte Carlo runs of these three algorithms for several SNRs.The moderate improvement of about 5 dB at lower SNRs can be seen in Figure 3. However as the SNR increases these entire three algorithms exhibit approximately the same very good performance of less than −17 dB.
As it was mentioned previously, the steepest gradient approach has difficulties coping with smaller lengths of available data records.In order to evaluate our proposed algorithm under such conditions, another simulation was done with different number of available observed samples.30 Monte Carlo simulations were run at SNR = 20 dB for data sample sizes of 200 to 1000 and the results are presented in Figure 4.It can be seen that the significant improvement of about 5 dB has achieved by proposed hybrid PSO algorithm for 300 < T < 600.Although, in the case of T = 200.The standard PSO algorithm is the only method with acceptable performance of less than −10 dB.The poor performance of hybrid method is due to primary effect of gradient term in (14).Since this gradient has gone in a wrong direction (just like we expect from complete failure of steepest gradient approach), its justification to PSO decreases the probability of even random convergences of PSO search to separation point.
It is interesting to note the stability of PSO method, specially the hybrid PSO.If we define the NMSEs of larger than −10 dB as algorithm failures, even in low noise scenarios, there exists the possibility of failure for steepest gradient approach.For instance, in SNR = 10, although the mean NMSE of 100 Monte Carlo runs are approximately the same, but 3 complete failures have met (10%).Table 1, summarizes the standard deviation of NMSE in the case of SNR = 10 and T = 1000, for these methods.
In final comparison of PSO and proposed hybrid PSO, the speed of convergence and computational complexity of these two must be studied.Simulation results show that for hybrid PSO, usually a small population of less than 30 particles is enough to very fine convergence of the swarm to the global optimal point; however it takes more than hundreds of iterations for even large populations of more than 100 particles for standard PSO.So, the very fast convergence and the preciseness of optimization are two most important promises of hybrid PSO.
In order to study the effect of α and β of (13) in the identification quality, another experiment was done.In this experiment, the quotient of α and β is the parameter under study.The simulation results for 50 Monte Carlo simulations with SNR = 5 dB are presented in Figure 2. It can be seen that the best results are met by β/α = 0.25.
Finally, it should be mentioned that our implementation of both PSO and hybrid PSO was the simplest possible approach in order to keep computational complexity and algorithm run time as close as possible to steepest gradient approach.However, it is always possible to further improve PSO algorithms by employing larger population of swarms and more number of allowable iterations with the cost of more computational complexity.Also the performance of any PSO algorithm can be further improved by strategically selecting the starting positions of the particles [8].

Conclusion
Two different realizations of Particle Swarm Optimization for source extraction step of well known inverse filtering MIMO identification approach were studied.They both show satisfying results as the original steepest descent method in noise-free scenarios.However, they achieved moderate improvement in lower SNRs and smaller data lengths.
Also the Hybrid PSO algorithm exhibited significant improvement in convergence speed compared to standard PSO, while the initial population of particles was kept smaller.This was the main advantage of hybrid PSO over standard PSO beside the fact that Hybrid PSO was the most precise method with the least probability of complete failure.

Figure 1 :
Figure 1: Histogram of successfully extracted source at equalizer's output.

Figure 2 :
Figure 2: Irregular histogram of equalizer's output when algorithm fails.