Multi-headed deep learning-based estimator for correlated SIRV Pareto type II distributed clutter
EURASIP Journal on Advances in Signal Processing volume 2023, Article number: 81 (2023)
Abstract
This paper deals with the problem of estimating the parameters of heavy-tailed sea clutter in high-resolution radar, when the clutter is modeled by the correlated Pareto type II distribution. Existing estimators based on the maximum likelihood (ML) approach, integer-order moments (IOM), fractional-order moments (FOM), and log-moments (log-MoM) have been shown to be sensitive to changes in data correlation. In this work, we resort to a deep learning (DL) approach based on a multi-headed architecture to overcome this problem. Offline training of the artificial neural networks (ANN) is carried out by using several combinations of the clutter parameters, with different correlation degrees. To assess the performance of the proposed estimator, we resort to Monte Carlo simulation and observe that it outperforms existing approaches in terms of estimation mean square error (MSE) and robustness to changes of the clutter correlation coefficient.
1 Introduction
The construction of systems that distinguish objects of interest buried in background disturbance constitutes a significant challenge in radar signal processing. In maritime surveillance applications, sea clutter represents a dominant undesired signal that affects radar performance. Besides applicability in real-time scenarios, effective detection performance requires adaptivity to the clutter power spectral density and, more broadly, to the clutter multidimensional probability density function (pdf) while maintaining a constant false alarm rate (CFAR). Consequently, prior knowledge of the clutter distribution enhances target detection capabilities in white or correlated non-Gaussian environments. Improvements in radar system technology and the growth of signal processing applications in scale, complexity, and power in fact invalidate the stationarity and Gaussianity assumptions, which do not adequately reflect the statistical characteristics of the environment [1]. Related research works revealed that high-resolution radar returns contain spikes giving rise to non-Gaussian heavy-tailed measurements. Thus, different heavy-tailed distributions were proposed to model clutter returns at high resolution and low grazing angles, including the lognormal, Weibull, Pareto, and K-distributions [2,3,4,5]. Depending on the data acquisition geometry, the environmental conditions, and the radar parameters, the sea clutter echoes may be correlated in their temporal and/or spatial components. However, the idealistic assumption of independent and identically distributed (iid) clutter returns affects the detection and leads to highly suboptimum performance. Therefore, the derivation of models characterizing non-Gaussian correlated clutter was addressed in [6, 7] based on a general approach used in [8] for formulating the multivariate pdf of a correlated non-Gaussian random vector, extended to incorporate complex random vectors for lognormal or Weibull distributed clutter.
Available approaches for simulating correlated sea clutter fall primarily into two categories: the class of memoryless nonlinear transformations (MNLT) and the spherically invariant random process-based (SIRP) approach. The MNLT technique is broadly inspected in [9]; the method's downside lies in its inability to control both the vector pdf and the power spectrum. Moreover, the nonlinear mapping involved in the generation process cannot guarantee the Hermitian property of the covariance matrix. As a second approach, Conte and Longo [10] suggested that the radar clutter process can be represented by a relatively straightforward approach based on the SIRP. The latter provides a mathematically tractable tool for modeling the pdf of correlated heavy-tailed non-Gaussian random variables, avoiding the nonlinear transformation and ensuring control of the vector pdf along with the power spectrum. Sampling a SIRP results in a spherically invariant random vector (SIRV), and the authors of [11, 12] detailed its properties in conjunction with pdfs consistent with the SIRV model. The distribution of the random vector associated with clutter modeled as a SIRV process forms a compound-Gaussian model, interpreted as a locally Gaussian process with a random local power variable.
Estimation of the parameters of the clutter model forming the background level, in which the target of interest is embedded, is an essential part of CFAR detection schemes. Since the statistical models of clutter vary with the application scenario, parameter estimation methods also change with the clutter nature. As a result, several approaches have been offered based on least squares (curve fitting), matching of moments, and maximum likelihood (ML) approaches. For instance, numerous estimators have been constructed to deal with compound-Gaussian (CG) clutter, with and without additive thermal noise [13,14,15,16]. The integer-order moments approach is commonly used for parameter estimation of numerous clutter models and provides suboptimal results [14]. For the noiseless case, it was shown that non-integer-order moments and log-moments could be manipulated to get closed-form estimators, labeled FOME (Fractional Order Moments) and [zlog(z)] methods, respectively, for K, Pareto type II (GP), and CGIG (Compound-Gaussian Inverse Gaussian) radar clutter parameters [14, 15].
In terms of sea clutter model types, several authors have considered different decision rules that are independent of true clutter parameters [17,18,19,20]. Several CFAR detectors operating in homogeneous and heterogeneous Pareto type II clutter are derived when the shape parameter or the scale parameter of the Pareto type II model is known a priori [19]. If these parameters are unknown, an alternative procedure based upon the Bayesian approach is introduced in which a modified decision rule is given in integral form [20].
With the rapid emergence of different machine learning paradigms, deep learning (DL) techniques have gained significant interest and motivated several advances in radar signal processing and target detection/classification applications [21,22,23,24,25]. In the context of parameter estimation, several deep neural network architectures were proposed [26, 27]. In more detail, a hybrid CNN-LSTM-based approach for estimating the shape parameter of a K + noise distribution for the single-look case was developed in [27], and better estimation performance was achieved against the classical higher-order moments (HOME) and [zlog(z)] estimators with improved execution time. The authors of [27] suggested that estimating clutter parameters could be seen as a many-to-one sequence prediction problem and confirmed the effectiveness of deep neural networks in learning pertinent features from the environment returns, leading to a reasonable estimation performance of the shape parameter. Furthermore, given a specific theoretical distribution, the estimation of the clutter parameters may be cast as a time series forecasting problem. DL methods have shown noticeable performance in solving this kind of problem.
Based on the compound-Gaussian model, the problem of radar detection in a non-Gaussian correlated environment has been considered in many works [28,29,30,31,32], where the clutter covariance matrix is estimated using the clutter range profile (CRP). Gini and Greco [33] proposed an iterative approximate maximum likelihood (AML) estimator for covariance matrix estimation with unknown parameters and a minimal number of iterations while maintaining the detector's CFAR property. Recently, Sangston et al. [34] proposed a parametrically dependent detector, referred to as the GLRT-LTD algorithm, for correlated non-Gaussian clutter. The robustness of the GLRT-LTD detector was investigated via the true values of the shape and scale parameters of an inverse-Gamma texture. In a realistic situation, unknown parameters should be estimated from the collected samples. Already in [34], the parameter estimation of the Pareto model was addressed using the higher-order moments approach (see eqs. (43) and (44) in [34]), leading to what is called the adaptive GLRT (AGLRT). However, this estimator has been proposed for independent and identically distributed (iid) clutter samples. For very spiky clutter (i.e., shape parameter less than 2), moments of order 1/2 and 1 should be used, whereas moments of order 2 and 4 can be exploited only when the shape parameter is greater than 2.
As the available estimators cited above cannot ensure good estimates of the parameters when the clutter data are correlated, in this work we propose a novel multi-headed estimator of the clutter parameters, combining CNN, stacked LSTM, CNN-LSTM, bidirectional LSTM, and LSTM autoencoder architectures, to provide better estimates of the Pareto type II clutter parameters. After training the proposed approach on synthetic sea clutter data, the test MSE is shown to be robust to changes of the correlation coefficient. Since the GLRT-LTD detector presented in [34] requires knowledge of the clutter parameters, the application of the deep learning-based estimator plays two roles: maximization of the detection probability owing to the best estimates of the parameters, and improvement of the processing time due to the forward computation of the estimates.
In this work, the problem of estimating clutter parameters in the case of correlated data is addressed using a deep learning (DL) multiheaded architecture approach, and an offline training of the artificial neural networks (ANN) is carried out by using several combinations of the clutter parameters, with different correlation degrees for generalization purposes. To assess the performance of the proposed estimator, comparisons with the conventional approaches are carried out using generated correlated clutter data. The proposed deep learning estimator demonstrates its superior performance over existing approaches in terms of estimation mean square error (MSE) and robustness to changes of the clutter correlation coefficient.
The rest of the paper is structured as follows: Section 2 deals with the problem formulation of estimating parameters in correlated compound-Gaussian environments. In the case of uncorrelated samples, classical estimators of the multivariate Pareto type II clutter parameters are presented in Sect. 3. A more in-depth discussion of the appropriate deep learning models for the prediction scheme is reported in Sect. 4, along with the validation of the proposed estimator. In Sect. 5, the performance of the proposed estimator is compared with existing approaches against correlated clutter data. Finally, in Sect. 6, conclusions are drawn and further work is proposed.
2 Problem statement
Estimating the parameters of the clutter model can be handled, conceptually, as a many-to-many sequence prediction where the input sequence is formed by values of the clutter returns and the output vector contains the estimates of the shape and scale parameters of the clutter model. To this end, a prediction model is trained on a set of training sequences via deep learning algorithms.
After it has been trained, the model uses a forward calculation, which is equivalent to closed-form estimators, to estimate the unknown parameters.
Let us assume coherent pulse echoes from a radar transmitter; then the m complex samples from the scalar time series \(\left\{ {z\left[ m \right], m = 1,2, \ldots } \right\}\) can be assembled into an m-dimensional vector \({\varvec{z}} = \left[ {z\left[ 1 \right],z\left[ 2 \right], \ldots ,z\left[ m \right]} \right]^{T}\), where T denotes the transpose operator. The detection problem is considered as a binary hypothesis test on a complex signal containing the in-phase and quadrature components such that:
Under the null hypothesis, the measured vector z contains only clutter returns c, i.e., \({\varvec{z}}|{\text{H}}_{0} = {\varvec{c}}\). For compound-Gaussian modeled clutter, the SIRV c consists of the product of two independent random variables, \({\varvec{c}} = \sqrt \tau \,{\varvec{x}}\),
where the speckle x is a complex Gaussian variate with zero mean and unit variance and represents the properties of the coherent radar sensor. The covariance matrix \(E\left\{ {{\varvec{xx}}^{H} } \right\} = {\mathbf{M}}\) is positive definite Hermitian; in shorthand, \({\varvec{x}} \sim {\mathcal{C}\mathcal{N}}\left( {0,{\mathbf{M}}} \right)\), normalized such that \({\text{Tr}}\left\{ {\mathbf{M}} \right\} = m\), where \({\text{Tr}}\left\{ {\mathbf{M}} \right\}\) is the trace of M. The texture τ is a positive random quantity and represents the local power of the clutter in the cell under test (CUT). The average clutter power is given by E{τ}. We assume here that the texture and the speckle are statistically independent, and the clutter covariance matrix M is assumed to be perfectly known. Given a specific value of τ, \({\varvec{c}}|\tau \sim {\mathcal{C}\mathcal{N}}\left( {0,\tau {\mathbf{M}}} \right)\), and the multivariate mth-order pdf under the null hypothesis can be derived:
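For reference, the conditional pdf of an m-dimensional circular complex Gaussian vector takes the standard form below (a reconstruction written from the definitions above, in the paper's notation):

$$p_{{{\varvec{z}}|\tau ,{\text{H}}_{0} }} \left( {{\varvec{z}}|\tau ,{\text{H}}_{0} } \right) = \frac{1}{{\pi^{m} \tau^{m} \left| {\mathbf{M}} \right|}}\exp \left( { - \frac{{{\varvec{z}}^{H} {\mathbf{M}}^{ - 1} {\varvec{z}}}}{\tau }} \right)$$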
\(\left| \cdot \right|\) denotes the determinant of a matrix.
The pdf of \({\varvec{z}}|{\text{H}}_{0}\) is calculated by averaging \(p_{{{\varvec{z}}|\tau ,{\text{H}}_{0} }} \left( {{\varvec{z}}|\tau ,{\text{H}}_{0} } \right)\) over τ as:
The texture τ is modeled by an inverse-gamma pdf:
where λ > 0 is the shape parameter, η > 0 is the scale parameter and Γ(·) is the Gamma function. With \(b = \frac{\eta }{\lambda }\), (5) becomes:
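For concreteness, one common parameterization of the inverse-gamma texture pdf that is consistent with the substitution b = η/λ is the following (a reconstruction; the exact form of (5)-(6) may differ):

$$p_{\tau } \left( \tau \right) = \frac{1}{\Gamma \left( \lambda \right)}\left( {\frac{\lambda }{\eta }} \right)^{\lambda } \tau^{ - \lambda - 1} \exp \left( { - \frac{\lambda }{\eta \tau }} \right),\quad \tau > 0,$$

which, with \(b = \eta /\lambda\), becomes

$$p_{\tau } \left( \tau \right) = \frac{{b^{ - \lambda } }}{\Gamma \left( \lambda \right)}\tau^{ - \lambda - 1} \exp \left( { - \frac{1}{b\tau }} \right).$$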
From [34], the expression of the GLRTLTD is given by:
The right-hand side of (7) is the adaptive threshold, which is formulated in terms of T, m, λ, \(\eta\) and \(q_{0}\), where \(q_{0} \left( {\varvec{z}} \right) = {\varvec{z}}^{H} {\varvec{M}}^{ - 1} {\varvec{z}}\) is obtained from the m correlated data, \(q_{1} \left( {\varvec{z}} \right) = q_{0} \left( {{\varvec{z}} - {\varvec{s}}} \right)\), \({\varvec{s}} = \alpha {\varvec{p}}\) is the target vector, α is the unknown complex amplitude, and p is the steering vector. T is the threshold multiplier, which is computed from the desired probability of false alarm P_{FA}. The P_{FA} of the detector (7) is the following [34]:
The GLRT-LTD algorithm depends on the clutter parameters. Replacing the unknown values of the clutter parameters λ and η with their estimates leads to what is called the adaptive GLRT-LTD detector. For a realistic case, we need to collect many samples to estimate the clutter parameters.
In the literature, there are no estimators that explicitly consider data correlation. Most existing estimators of clutter parameters were derived under the assumption of independent and identically distributed (iid) samples. If iid Pareto type II clutter is assumed, methods that employ the maximum likelihood principle or the method of moments are found in [35]. Those estimators are implemented in this work to identify the impact of the correlation coefficient on the estimation accuracy, and a novel approach based on a trained deep learning predictor is proposed to estimate the shape and scale parameters of the correlated Pareto type II model. As shown in Fig. 1, this approach covers two stages: an offline training process for estimation purposes and a validation stage of the estimator. The offline part consists of generating massive training data using different pairs {λ, η} and training the deep neural network on a supervised many-to-many sequence prediction problem. In addition, the correlation coefficient ρ is also considered in the adaptive scheme of the deep learning approach for generalization purposes. After validating the DL-based estimator, the discrimination of sea clutter with different shape parameters from objects of interest is carried out by replacing the true parameter values with their estimates in the GLRT-LTD test statistic.
An overview of the standard estimators of the shape and scale parameter of the Pareto type II distribution is presented in the next section.
3 Parameter estimation of multivariate Pareto type II clutter model
In this section, four estimation methods for the Pareto type II clutter parameters are presented, namely IOM (Integer-Order Moments), NIOM (Non-Integer-Order Moments), [zlog(z)], and MLE (Maximum Likelihood Estimator). Let y_{1}, y_{2}, …, y_{N} be a random sequence of Pareto type II clutter given at the output of the envelope detector [36], with
Moments of order k should be determined to estimate the two unknown parameters λ and η from the data using IOM and NIOM techniques.
which converges only if \(\lambda > k/2\).
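For orientation, a kth-order amplitude moment consistent with this convergence condition, writing the texture scale generically as b (a reconstruction; the paper's scale convention in (10) may differ):

$$\mu_{k} = E\left\{ {y^{k} } \right\} = b^{k/2} \,\Gamma \left( {1 + \frac{k}{2}} \right)\frac{{\Gamma \left( {\lambda - \frac{k}{2}} \right)}}{\Gamma \left( \lambda \right)},$$

which indeed diverges as λ approaches k/2.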
3.1 Integerorder moments (IOM) method
From (10), the texture parameters, i.e., λ and η, can be obtained by equating the analytical expectations of moments of order 2 and 4 to their sample counterparts, i.e., \(\hat{\mu }_{k} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} y_{i}^{k}\). Hence,
Recall the constraint λ > k/2; for small values of λ, estimation via (11) is unachievable. Thus, it is suggested to use the FOM method with values of k that are not necessarily integers [36].
Degraded estimation performance is observed for low sample sizes in spiky clutter situations.
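As an illustration, a minimal NumPy sketch of the IOM idea, assuming a texture that is inverse-gamma with shape λ and a generic scale θ, so that E{y²} = θ/(λ−1) and E{y⁴} = 2θ²/((λ−1)(λ−2)); the mapping of θ to the paper's η depends on the exact parameterization of (5) and is an assumption here:

```python
import numpy as np

def iom_estimate(y):
    """IOM sketch for Pareto type II amplitude samples y.

    Uses sample moments of order 2 and 4 (intensity moments of order
    1 and 2). Valid only for shape > 2, mirroring lambda > k/2.
    """
    mu2 = np.mean(y**2)
    mu4 = np.mean(y**4)
    r = mu4 / (2.0 * mu2**2)            # = (lam-1)/(lam-2) under the model
    lam = (2.0 * r - 1.0) / (r - 1.0)   # invert the moment ratio for the shape
    theta = mu2 * (lam - 1.0)           # texture scale from the 2nd moment
    return lam, theta
```

On synthetic iid data with, e.g., λ = 6 and θ = 2, the two sample averages recover both parameters; the method degrades quickly as λ approaches 2, where the fourth moment ceases to exist.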
3.2 Fractional order moments (FOM) method
Moments of order 1 and 1/2 are chosen to estimate numerically the underlying parameters. Hence,
Solving the above equations simultaneously and after some algebraic operations we get
Numerical routines are required to compute λ and η leading to a slow convergence time.
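A sketch of how such a numerical solution can be organized, under the same inverse-gamma texture assumption as above (the moment expressions and the θ↔η mapping are assumptions here, not the paper's exact Eq. (13)): the scale cancels in the ratio \(\mu_1/\mu_{1/2}^2\), leaving a one-dimensional root-finding problem in λ.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import brentq

def fom_estimate(y):
    """FOM sketch using moments of order 1 and 1/2.

    Assumes y = sqrt(tau)*s with unit-power Rayleigh speckle s and
    inverse-gamma texture (shape lam, scale theta), for which
    mu_k = theta**(k/2) * Gamma(1+k/2) * Gamma(lam-k/2) / Gamma(lam).
    """
    m1 = np.mean(y)             # sample moment of order 1
    mh = np.mean(np.sqrt(y))    # sample moment of order 1/2
    r_hat = m1 / mh**2          # scale-free moment ratio

    def ratio(lam):
        # theoretical mu_1 / mu_{1/2}^2 (scale cancels), via log-gammas
        logr = (gammaln(1.5) + gammaln(lam - 0.5) + gammaln(lam)
                - 2.0 * gammaln(1.25) - 2.0 * gammaln(lam - 0.25))
        return np.exp(logr)

    lam = brentq(lambda l: ratio(l) - r_hat, 0.6, 60.0)  # solve for shape
    # recover the scale from the first-order moment
    log_theta_half = (np.log(m1) + gammaln(lam)
                      - gammaln(1.5) - gammaln(lam - 0.5))
    theta = np.exp(2.0 * log_theta_half)
    return lam, theta
```

The ratio is monotone decreasing in λ, so the bracketed root search converges reliably; the residual cost is the iterative search itself, which is what slows this method down relative to closed-form estimators.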
3.3 [zlog(z)] method
It has been shown that the estimates obtained by this approach are extremely close to the MLE [35]. Here, we need to determine first the expectations E{log(y^{2})} and E{y^{2}log(y^{2})} [37]. The [zlog(z)] estimator is given in a closed form provided by
This approach relatively reduces the estimation time with a slight degradation in estimation performance compared to the MLE method.
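A closed-form sketch of this idea in the intensity domain z = y², derived under the same iid inverse-gamma-texture assumptions (the paper's exact closed form in (15) may differ in its scale convention): since E{z log z} − E{z}E{log z} = E{z}·λ/(λ−1) under the model, the shape follows from two sample averages.

```python
import numpy as np

def zlogz_estimate(y):
    """[zlog(z)]-style estimator sketch on the intensity z = y**2."""
    z = y**2
    mz = np.mean(z)
    delta = np.mean(z * np.log(z)) - mz * np.mean(np.log(z))
    lam = delta / (delta - mz)    # closed-form shape estimate
    theta = mz * (lam - 1.0)      # scale from the first intensity moment
    return lam, theta
```

No iteration is required, which is the source of the reduced estimation time noted above.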
3.4 MLE method
The vector of parameters \(\hat{\theta } = \left[ {{\hat{\lambda }},\hat{\eta }} \right]\) maximizes the log-likelihood function L(θ), i.e., \(\hat{\theta }\) is the solution of the following equation
From the definition of the MLE method, the likelihood function is computed first on the basis of (9), and partial derivatives with respect to λ and η are required to find the maximum of the log-likelihood function. From [36], it is shown that the ML estimate of \(\hat{b} = \frac{{\hat{\eta }}}{{\hat{\lambda }}}\) is a solution of the following equation.
From (17), the ML estimate of \(\hat{\lambda }\) is given in a closed form as a function of \(\hat{\eta }\)
where the scale parameter is simply determined by \(\hat{\eta } = \hat{b}\hat{\lambda }\).
This is a time-consuming approach because of the numerical optimization of a nonlinear equation for the scale parameter.
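For completeness, a numerical sketch of this profile-likelihood structure, using the intensity-domain Lomax form p(z) = λθ^λ/(z+θ)^{λ+1} that results from an exponential speckle power mixed over an inverse-gamma texture (shape λ, scale θ; the mapping to the paper's b = η/λ convention is left aside): for a fixed scale, the shape has the closed form λ̂(θ) = N/Σᵢ ln(1+zᵢ/θ), and only a one-dimensional search over θ remains.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mle_estimate(y):
    """Profile-likelihood MLE sketch for Lomax-distributed intensity z."""
    z = y**2
    n = z.size

    def lam_given_theta(theta):
        # closed-form shape estimate for a fixed scale theta
        return n / np.sum(np.log1p(z / theta))

    def neg_profile_loglik(theta):
        lam = lam_given_theta(theta)
        # log-likelihood of the Lomax pdf lam*theta**lam/(z+theta)**(lam+1)
        ll = (n * np.log(lam) + n * lam * np.log(theta)
              - (lam + 1.0) * np.sum(np.log(z + theta)))
        return -ll

    res = minimize_scalar(neg_profile_loglik, bounds=(1e-3, 1e3),
                          method="bounded")
    theta = res.x
    return lam_given_theta(theta), theta
```

The repeated evaluation of the profile likelihood over the whole sample is what makes the MLE the slowest of the four approaches.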
4 Multi-headed deep learning estimator
In recent years, different machine learning paradigms have received extensive attention as a multidisciplinary subject. Deep learning [38], a subfield of machine learning, belongs to the class of representation learning, aiming to construct models and representations from large amounts of unprocessed data. The particularity of deep learning resides in its ability to learn hierarchical feature representations directly from the raw data, thus considerably reducing handcrafted feature extraction. Deep learning has demonstrated remarkable skill in many applications, including, but not limited to, object detection/recognition, anomaly detection, time series forecasting, and natural language processing. Since we are tackling the issue of estimating clutter parameters and the clutter returns are 1-dimensional signals, one can cast the estimation process as a regression problem, namely a many-to-many sequence prediction. We expose the most popular deep learning models dedicated to sequential data problems in the following subsection.
4.1 Holistic view of appropriate DL models for sequential data
Given the variety of sequential or time series problems across diverse domains, several neural network approaches have emerged [39, 40]. Among these models, recurrent neural networks (RNNs) have gained great interest owing to the short-term memory obtained from their recurrent feedback connections and provide a powerful tool for dealing with correlated sequential data. However, traditional RNNs suffer from two main limitations. First, the temporal order of the input sequence affects the model's outputs, which are primarily based on the previous context. An appropriate solution, proposed in [41], consists of presenting each training sequence forwards and backwards to two recurrent networks connected to the same output layer. The second drawback is the difficulty of learning long-term dependencies encoded in the data because of the vanishing gradient [42]. To deal with this, an alternative recursive architecture with a specialized cell structure was introduced, leading to the long short-term memory (LSTM) network [43]. As the length of the input variable changes in sequence-to-sequence prediction applications, an RNN encoder-decoder architecture was developed in [44], based on one RNN acting as an encoder, which maps the variable-length source sequence to a fixed-length vector, and a second RNN used to map the vector representation back to a variable-length target sequence. Incorporating LSTM cells instead of plain RNNs in the encoder-decoder design leads to the encoder-decoder LSTM architecture [45]. Moreover, the autoencoder (AE), an unsupervised learning model trained using supervised learning methods (also called self-supervised learning), is a special type of neural network that seeks to learn a compact representation of a set of data, intended as a feature vector for another supervised learning model.
A variety of AE models have been proposed in the literature and are principally used for anomaly detection, feature extraction, and dimensionality reduction [46]. The implementation of the AE using the encoder-decoder LSTM yields the LSTM autoencoder (LSTM-AE). The LSTM-AE may learn a compressed version of sequential data [47].
Furthermore, tremendous interest in another deep learning architecture, alternating convolutional and subsampling layers, has developed in recent years. Convolutional neural networks (CNNs), which are feedforward artificial neural networks with many hidden layers trained on massive labeled datasets, have proven to be a powerful tool for numerous engineering applications, especially for 2D signals. Recently, 1D CNNs have been proposed to deal with time series forecasting and classification problems and have immediately achieved extraordinary accuracy, making them a viable option for 1D signal processing applications [39].
A more indepth explanation of the different architectures is addressed in the following subsections.
4.2 Long short-term memory
Recurrent neural networks take their appellation from the recurrence connections involved, which provide them with a memory of past activations, allowing the learning of temporal dynamics and making this kind of recursive network appropriate for modeling sequential data. RNNs are defined as a topology simulating a discrete-time dynamical system with an input x^{t}, a hidden state h^{t} and an output y^{t}. The drawback of conventional RNNs is the exploding or vanishing gradient, which occurs when the gradient tends to be extremely small, leading to the short-term memory problem in long sequences. The long short-term memory (LSTM) architecture is proposed as an efficient attempt to solve this problem. The LSTM is constituted of three gates that control the interactions between the different memory units, referred to as the input, forget and output gates. Specifically, the forget gate removes redundant or irrelevant data and chooses to retain or ignore its previous status. Whereas the input gate analyzes the new data and checks whether the input signal may alter the memory cell state, the output gate, conversely, handles the input data with the cell state [43]. In other words, the idea is to have an input vector added to the previous output vector and passed through a neural network that outputs the values to keep with 1 and the values to forget with 0.
The block diagram of Fig. 2 illustrates the different components of a cell memory of the LSTM network.
Unlike unidirectional LSTMs, bidirectional LSTMs process the data in both forward (positive time direction) and backward (negative time direction) senses. The idea is to duplicate the first recurrent layer in the network into two side-by-side layers, then provide the input sequence as-is to the first layer, while the second layer processes a reversed copy of the input [48].
It is worth noting that timesteps in the input sequence are treated one at a time, in the same way as in the unidirectional architecture. The difference is that the network steps through the input sequence in both directions at the same time.
4.2.1 LSTM autoencoders
Autoencoders (AEs) are an elegant variation of neural networks trained in an unsupervised learning paradigm to reconstruct an informative representation of their input, often involved in different applications such as clustering or used as generative models. An AE consists of a three-layer arrangement, based on feedforward neural networks, for encoding and decoding purposes. The encoding phase comprises the mapping of the dataset to a hidden layer and learns a compressed representation of the data sequence. In contrast, the decoding task covers the reconstruction of the input sequence from the latent variables of the encoding phase. The encoding and decoding processes can be formulated as follows:
where x is the input sequence, h(·) and f(·) are the encoding and decoding functions, respectively, h(x) is the hidden encoded vector and \(\hat{x}\) is the reconstructed vector of the output layer. Additionally, W_{1}, W_{2} are the encoder and decoder weight matrices, while b_{1}, b_{2} represent the bias vectors of each phase [49].
During training, the AE model tries to minimize an objective function which is nothing but the reconstruction error, expressed in terms of the difference between the input and the reconstructed output (i.e., \({\text{minimize}}\;\left\| {x - \hat{x}} \right\|^{2}\)). One way to get a lower reconstruction error is to stack different AE layers; the obtained architecture is referred to as a stacked autoencoder (SAE). The SAE allows the extraction of useful high-level features, with further advantages of abstraction and invariance, and thereby better generalization skills of the model are expected [49].
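To make the objective concrete, a toy single-hidden-layer autoencoder in NumPy (a pedagogical sketch, not the architecture used in this paper): encode with h(x) = tanh(xW₁ + b₁), decode linearly, and descend the gradient of the reconstruction error ‖x − x̂‖².

```python
import numpy as np

def train_autoencoder(X, n_hidden=4, lr=0.05, epochs=1000, seed=0):
    """Train h(x) = tanh(x W1 + b1), x_hat = h W2 + b2 by full-batch
    gradient descent on the mean squared reconstruction error
    (constant factors are absorbed into the learning rate)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.standard_normal((d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.standard_normal((n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encoding phase
        Xh = H @ W2 + b2                # decoding phase
        E = Xh - X                      # reconstruction error
        dH = (E @ W2.T) * (1.0 - H**2)  # backprop through tanh
        W2 -= lr * (H.T @ E) / n;  b2 -= lr * E.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n; b1 -= lr * dH.mean(axis=0)
    mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2))
    return (W1, b1, W2, b2), mse
```

On low-rank data the hidden layer learns a compressed code and the reconstruction error falls well below the raw signal power, which is exactly the behavior the stacked variants exploit.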
Since RNNs are well suited for modeling sequential data compared to the traditional feedforward multilayer perceptron (MLP) involved in the AE architecture, numerous LSTM-based autoencoder (LSTM-AE) networks were proposed where the classical MLP layers in the AE are replaced by LSTM cells organized in an encoder-decoder architecture, as shown in Fig. 3.
The LSTM-AE is the combination of two LSTM layers acting as encoder and decoder. The LSTM encoder layer is trained to produce the representation vector, and the latter is fed to the second LSTM decoder layer, which should reconstruct the input data. The LSTM-AE cost function is the mean square error between the original input sequence and the reconstructed vector. The LSTM-AE can be applied for input sequence reconstruction and/or prediction purposes [47, 49]. Regardless of the desired task, once the AE has been trained, the decoder can be removed and the encoder kept as a standalone model for other supervised learning processes. In [49], the stacked LSTM-AE (LSTM-SAE) was proposed for modeling multistep time series forecasting problems as a robust forecasting module that converts time series observations into representative features which can be used for other tasks such as prediction or analysis. The implemented LSTM-SAE adopts a shallow architecture with one LSTM for each of the encoder and decoder layers. The pretraining phase consists of building three LSTM-SAE blocks, where each block is constituted of the previously trained and saved LSTM encoder as a hidden layer. A fine-tuning phase is then carried out by adding an output layer for a supervised regression problem where the label is the variable corresponding to the input sequence. The evaluation is then completed using out-of-sample testing data. This is the same approach we used in this paper, with a minor modification in the number of blocks.
4.3 Convolutional neural networks (CNNs)
Convolutional neural networks are a particular type of feedforward neural network that maps features from input data in a hierarchical way. They consist of a chain of layers applying alternating convolutional and subsampling operations to the input data, followed by a classifier. Besides their immunity to small data transformations and their ability to process large inputs, CNNs combine the feature extraction and classification tasks in the same process, breaking all records in computer vision and many other domains. CNNs were first fashioned specifically to process 2D signals, and their application to 1D data requires a suitable 1D-to-2D conversion, for which various feature transformations were introduced [39]. However, due to the high computational complexity along with the large amount of data necessary for training, 2D CNNs may not be an adequate solution for many 1D data applications. A compact 1D CNN architecture was recently proposed in [39] to deal with the drawbacks of 2D CNNs in 1D applications and achieves high performance levels in diverse signal processing applications.
CNNs work the same way whether inputs have 1, 2, or 3 dimensions. The difference lies in the structure of the input data and how the convolution kernel moves across the data. The input layer of the 1D CNN receives a 1D time series sequence instead of 2D or 3D data, and 1D filters are used instead of 2D filters. The convolutional layer performs a sequence of convolutions, which are simply linear weighted sums of 1D sequences, passed through the activation function and followed by max-pooling or another pooling operation. Consequently, the network processes 1D signals instead of 2D matrices for both kernels and feature maps, resulting in low computational complexity. Related explanations of the forward and backpropagation learning algorithms in CNN layers are provided in [39].
Considering the various deep architectures seen in this section, it should be possible to conceive an intelligent estimator for Pareto-modeled clutter using a mixture of CNN, LSTM, BLSTM and autoencoder designs.
The following subsection deals with the design and validation of a multi-headed deep learning-based estimator.
4.4 The multi-headed deep learning-based estimator
4.4.1 Data generation/preprocessing
The accuracy of deep learning methods relies heavily on the quality and the amount of the available labelled dataset. DL algorithm inputs can be of various types, and the most commonly used data inputs are in the form of tensors. For the problem of estimating the Pareto type II clutter parameters, a combination of random values of the correlation coefficient, scale, and shape parameters, together with data of normalized clutter power, was considered for generalization purposes.
The generation of correlated samples using the SIRV model proceeds as follows:

1. Set the parameters: the number of pulses N, the window size m, the number of Monte Carlo trials n, and the number of vectors nb.

2. Generate a random correlation coefficient ρ between 0 and 1.

3. Generate random values of the shape parameter λ in the range [1.1, 8] and of the scale parameter η in the range [1, 12].

4. For i = 1 : nb, repeat for each value of ρ and for each value of the couple {λ, η}:

– Compute the covariance matrix of the speckle, \(M_{x} = \mathrm{toeplitz}\left( \rho^{m} \right)\), and its Cholesky factor \(L = \mathrm{chol}\left( M_{x} \right)\);

– Generate the SIRV Pareto data: compute the complex speckle
$$x = L\left( {\mathrm{randn}\left( {m,N} \right) + j\,\mathrm{randn}\left( {m,N} \right)} \right)/\sqrt 2$$
and the inverse gamma texture \(\tau = 1/\mathrm{gamrnd}\left( {\lambda ,\eta ,1,N} \right)\);

– Obtain the SIRV clutter c from Eq. (2) (i.e., \(c = \sqrt \tau \, x\)), which is an m × N matrix;

– Compute the power of c so that the input matrix contains real values;

– Flatten the matrix and append λ and η (i.e., \(z_{i} = \left\{ {c_{i,1} , c_{i,2} , \ldots , c_{i,Nm} , \lambda_{i} , \eta_{i} } \right\}\));

– Concatenate the current vector with the previously generated vectors in the same matrix.

5. Save the matrix
$$\begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,Nm} & \lambda_{1} & \eta_{1} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,Nm} & \lambda_{2} & \eta_{2} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ c_{nb,1} & c_{nb,2} & \cdots & c_{nb,Nm} & \lambda_{nb} & \eta_{nb} \end{bmatrix}$$
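The generation procedure above can be sketched in NumPy (an illustrative port of the MATLAB-style notation, interpreting toeplitz(ρ^m) as the Toeplitz matrix with entries ρ^|i−j| and gamrnd(λ, η) as a gamma draw with shape λ and scale η; the function name is ours):

```python
import numpy as np

def generate_training_row(m, N, rho, lam, eta, rng):
    """Build one labelled training vector: flattened clutter power plus
    the (lambda, eta) targets (illustrative sketch of the steps above)."""
    # speckle covariance: Toeplitz matrix with entries rho^{|i - j|}
    Mx = rho ** np.abs(np.arange(m)[:, None] - np.arange(m)[None, :])
    L = np.linalg.cholesky(Mx)                 # lower Cholesky factor
    # correlated unit-power complex Gaussian speckle (m x N)
    x = L @ (rng.standard_normal((m, N))
             + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
    # inverse gamma texture, one value per pulse: tau = 1 / Gamma(lam, scale=eta)
    tau = 1.0 / rng.gamma(shape=lam, scale=eta, size=N)
    c = np.sqrt(tau) * x                       # SIRV compound clutter, Eq. (2)
    power = np.abs(c) ** 2                     # real-valued clutter power
    return np.concatenate([power.flatten(), [lam, eta]])  # append targets

rng = np.random.default_rng(0)
row = generate_training_row(m=24, N=16, rho=0.5, lam=2.0, eta=4.0, rng=rng)
print(row.shape)  # (386,): 24*16 power samples plus the two labels
```

Taking the power of c is what makes the stored matrix real-valued, as required by the network input layer.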
4.4.2 Estimation architecture design and validation
The basic idea of the multi-headed deep learning-based estimator is to present a combination of different features to deep neural networks with varying architecture designs, where the role of each block is either to model/extract valuable features and/or to predict values from the original dataset. The proposed approach consists of building four deep architectures, namely LSTM-SAE, 1D CNN, BLSTM, and 1D CNN-LSTM models. The models' outputs are then concatenated to form the input of the final prediction block, which consists of a single LSTM layer followed by an MLP, to produce the final estimates of the shape and scale parameters of the Pareto type II clutter, as illustrated in Fig. 4.
The first two heads of the estimator are mainly designed to produce a new representation and extract new features from the input sequences. The LSTM-SAE is composed of two hidden layers of encoded data, following an approach similar to [49], and is trained in a self-supervised manner where the desired output is identical to the input sequence. The first LSTM-AE block in the stack is trained and its LSTM encoder layer is saved to be used as input for the second LSTM-AE block, which is in turn saved and loaded as the second hidden layer of the final LSTM-SAE model. The LSTM encoder-decoder architecture uses 64 LSTM units for both encoding and decoding, and a RepeatVector layer serves as an adapter between the encoder and the decoder. The output of this architecture is a reconstructed version of the generated Pareto type II samples.
The second head is a convolutional architecture consisting of two consecutive 1D convolutional layers of 64 filters each, followed by a max pooling operation. The principal task of this head is to extract useful features. Its output is then flattened and concatenated with the representation vector from the LSTM-SAE.
The last two heads are designed for prediction purposes. To this end, a hybrid 1D CNN-LSTM architecture and a BLSTM network are considered. The bidirectional LSTM model is formed of 128 LSTM cells trained forward and backward to output intermediate estimates of the shape and scale parameters of the Pareto model. The last head combines 1D CNN and LSTM networks: two consecutive convolutional layers of 32 filters each, followed by max pooling. The flattened output of the last convolutional layer feeds a hidden LSTM layer of 32 cell units, and an MLP output layer provides another intermediate prediction of the unknown Pareto distribution parameters.
Finally, the outputs of the above models are concatenated with the original data and fed to an LSTM layer of 64 units, followed by two dense layers of 128 and 2 neurons, respectively, to provide the final estimates of the shape and scale parameters. It is worth noting that ReLU is the activation function for all the models mentioned above.
The dataset is generated following the process described above and consists of 250,000 vectors; each vector contains N·m simulated samples associated with the values of λ and η as targets. The dataset is randomly split into a training set (80%) and a validation set (20%).
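The random 80/20 split can be sketched as a simple shuffle of the row indices (illustrative; a scaled-down stand-in for the 250,000 generated vectors):

```python
import numpy as np

rng = np.random.default_rng(42)
dataset = rng.random((2500, 386))          # placeholder rows: N*m samples + (lambda, eta)
perm = rng.permutation(len(dataset))       # random reordering of the rows
n_train = int(0.8 * len(dataset))          # 80% for training
train, val = dataset[perm[:n_train]], dataset[perm[n_train:]]
print(train.shape, val.shape)              # (2000, 386) (500, 386)
```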
The final multi-headed deep learning-based estimator was obtained after a trial-and-error process: different architectures were considered with diverse models, layers, activation functions, and optimizers, and the proposed intelligent estimator was validated by comparison with several deep learning models. Table 1 summarizes the overall mean square error (MSE) in the training and validation phases along with the configuration of the different architecture designs. Dropout and batch normalization layers were incorporated in the above models to improve generalization and prediction performance.
As depicted in Table 1, the proposed DL-based estimator exhibits better performance than the other topologies on both the training and the validation datasets. The Adam optimizer performs best owing to its rapid convergence, with RMSprop achieving near-optimal performance.
5 Performance assessment
With regard to the deep learning-based estimator presented above, we carry out numerical simulations in this section to evaluate the effectiveness of the proposed estimation method and confirm its applicability. For this purpose, we compare the results obtained using the multi-headed deep learning estimator against the existing FOME, [zlog(z)], and MLE methods by means of the mean square error (MSE) metric.
Besides the fact that the moment orders vary depending on the clutter spikiness, the integer-order moment estimator is impractical in real-time scenarios because of the large number of pulses required for accurate estimation of the clutter parameters. For this reason, we omit the comparison of the proposed deep learning estimator with the integer-order moment method described in Sect. 3.
The dataset used for the test task is constructed following the same approach used for the training database. The window size is set to m = 24 with N = 4, 8, and 16 integrated pulses, and we considered different values of the correlation coefficient, from almost uncorrelated to strongly correlated data (ρ = 0.01, 0.1, 0.3, 0.6, 0.9, 0.99), to observe the effect of the correlation on the estimation task. Pareto samples are generated with λ in the range [1.1, 6] so that different states of clutter spikiness are included. In the overall test phase, the estimates of λ are averaged over n = 1000 Monte Carlo trials, and the computations were executed on an Intel® Core™ i5-3230M CPU @ 2.60 GHz with 8.0 GB of RAM.
As previously stated, for each value of λ in [1.1, 6], we generate an m × N matrix with a fixed correlation coefficient. The matrix is then flattened to produce a single sample vector. This process is repeated 1000 times for each value of the couple (λ, η), resulting in a 50,000 × 384 test matrix.
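The Monte Carlo MSE criterion used in the comparisons can be sketched generically as follows (an illustrative harness; the estimator and sampler here are placeholders, not the paper's FOME, [zlog(z)], MLE, or DL methods):

```python
import numpy as np

def monte_carlo_mse(estimator, sampler, true_value, n_trials, rng):
    """Average squared estimation error over independent Monte Carlo trials,
    the criterion used above to compare the candidate estimators."""
    errs = [(estimator(sampler(rng)) - true_value) ** 2 for _ in range(n_trials)]
    return float(np.mean(errs))

rng = np.random.default_rng(1)
# placeholder check: the sample mean of 384 unit-variance Gaussian samples
# estimates the true mean 0 with MSE close to 1/384
mse = monte_carlo_mse(np.mean, lambda r: r.standard_normal(384), 0.0,
                      n_trials=1000, rng=rng)
print(mse)  # close to 1/384 ~ 0.0026
```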
For a low correlation coefficient (ρ = 0.1), Fig. 5 shows the MSE curves of the FOME, [zlog(z)], MLE, and DL-based estimators for a window size of 24 and 16 integrated pulses. For very spiky clutter, comparable results are obtained with the four estimators, with slightly better performance of the DL-based estimator for λ ≤ 1.3. As the spikiness of the clutter decreases, the proposed method outperforms the other estimators, while the MLE and its closed-form counterpart, the [zlog(z)] estimator, exhibit instabilities. For the parameter η, the multi-headed deep learning estimator clearly attains the lowest error for η < 8, with overlapping curves for the other classical estimators. For large values of η, the MLE gives results equivalent to those of the DL and [zlog(z)] estimators.
As the correlation coefficient increases (ρ = 0.3), the estimation errors of the [zlog(z)], MLE, and FOME increase, while the DL estimator maintains its superiority for high values of the shape parameter, as depicted in Fig. 6, with a slight superiority of the MLE at λ = 1.4. Note that for ρ = 0.3 and large values of η, results similar to those obtained for ρ = 0.1 are observed, and globally, precise estimation accuracy is achieved with the proposed approach. For ρ = 0.6, Fig. 7 shows that the proposed estimator achieves improved performance over the entire range of the shape and scale parameters. For highly correlated data (ρ = 0.9), the multi-headed deep learning-based approach again exhibits better estimation accuracy than the classical approaches, as illustrated in Fig. 8. For a more complete picture, Figs. 9 and 10 show the bias of the shape and scale parameter estimates for ρ = 0.1 and ρ = 0.9, respectively. As expected, the results obtained with the bias criterion are consistent with those obtained with the MSE. Regardless of the correlation coefficient, for large shape parameter values the DL estimator outperforms the log- and moments-based estimators. The bias also confirms the superiority of the DL approach in estimating the scale parameter in all the studied cases.
It is worth noticing that the multi-headed deep learning approach estimates the shape and scale parameters separately but simultaneously, whereas the estimation of the scale parameter with the [zlog(z)] and FOME methods depends on the estimate of the shape parameter and, conversely, the MLE estimates the shape parameter from the scale parameter. It is therefore important to compare the results of the proposed approach against those obtained using the moment of order 1/2 (see Eq. 13). Figure 11 shows improved results of the multi-headed DL estimator for η > 1.8 with uncorrelated data, while for correlated pulses Fig. 12 illustrates the effectiveness of the proposed approach over the entire range of η values. Consequently, estimating one parameter from another with the existing approaches produces imprecise results, leading to a considerable error that impedes the application of such estimators in real radar scenarios.
To visualize the impact of the correlation on the estimation, the MSE curves of the scale and shape estimators for ρ = 0.01 and ρ = 0.99 are plotted together in the same figure. For m = 24 and N = 16, Fig. 13 clearly shows that, for low values of the shape parameter and uncorrelated data, the conventional approaches achieve better estimation performance, whereas for correlated data with high values of λ their performance degrades and their MSE curves become unstable. The DL-based estimator attains the lowest MSE values and ensures stability and accuracy in both cases, with enhanced results for the uncorrelated data.
With a reduced number of pulses (N = 8, 4), the estimation performance of all the estimators degrades compared to the results obtained with a large number of pulses, as depicted in Figs. 14 and 15. The proposed approach keeps its superiority for both large and small numbers of pulses and can therefore be applied in various situations.
Concerning the computational complexity of the algorithms presented in this work, the MLE and FOME approaches require an optimization algorithm to produce estimates of the unknown parameters, which introduces a heavy computational burden and limits their real-time application. In contrast, the proposed approach is trained offline, leading to an improved execution time on the test data. Increasing the number of neurons and layers of the deep learning approach increases the number of network parameters to be processed and somewhat reduces the execution speed. The [zlog(z)] closed-form estimator eliminates the optimization process, resulting in a faster execution time, as depicted in Table 2.
Nonetheless, for correlated echoes the [zlog(z)] method exhibits degraded performance, which again limits its applicability. The compromise between execution time and accuracy achieved by the proposed intelligent algorithm confirms its superiority over the conventional FOME, [zlog(z)], and MLE methods.
The estimation accuracy depends strongly on the clutter spikiness, the number of pulses, and the correlation between the samples. From the results obtained above, the classical FOME, [zlog(z)], and MLE approaches are well suited to spiky uncorrelated echoes, and degraded performance is observed for strongly correlated data with high values of the shape parameter. The particularity of the proposed multi-headed deep learning-based estimator resides in its ability to estimate the shape and scale parameters accurately and simultaneously for both correlated and uncorrelated data with reduced computational complexity.
6 Conclusions and further works
This paper focused on the estimation of the shape and scale parameters of Pareto type II distributed clutter. The proposed approach relies on a mixture of supervised and semi-supervised DL architectures. Simulated data were generated to investigate the performance of the proposed estimator and compare it with that of existing estimators. Numerical results show the superiority of the proposed multi-headed DL-based estimator over methods based on the method of moments and on maximum likelihood. The conventional FOME, [zlog(z)], and ML approaches are well suited to spiky uncorrelated clutter, but with an increasing correlation coefficient and for high values of the shape parameter these methods yield inaccurate estimates. The proposed multi-headed DL-based estimator achieves better performance with a reduced execution time. It can be implemented in a CFAR detector in the presence of correlated non-Gaussian clutter. The estimation of Pareto clutter parameters in the presence of non-negligible additive white Gaussian noise (AWGN) is a problem that could also be explored using a DL approach.
Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
W.L. Melvin, J.A. Scheer, Principles of Modern Radar: Radar Applications, vol. 3 (IET, 2013)
G. Lampropoulos, A. Drosopoulos, N. Rey et al., High resolution radar clutter statistics. IEEE Trans. Aerosp. Electron. Syst. 35(1), 43–60 (1999)
A. Farina, F. Gini, M. Greco, L. Verrazzani, High resolution sea clutter data: statistical analysis of recorded live data. IEE Proc. Radar Sonar Navig. 144(3), 121–130 (1997)
K.J. Sangston, K.R. Gerlach, Coherent detection of radar targets in a non-Gaussian background. IEEE Trans. Aerosp. Electron. Syst. 30(2), 330–340 (1994)
K. Ward, C. Baker, S. Watts, Maritime surveillance radar. I. Radar scattering from the ocean surface, in IEE Proceedings F (Radar and Signal Processing), vol. 137, pp. 51–62 (IET, 1990)
A. Farina, A. Russo, F. Scannapieco, S. Barbarossa, Theory of radar detection in coherent Weibull clutter, in IEE Proceedings F (Communications, Radar and Signal Processing), vol. 134, pp. 174–190 (IET, 1987)
B. Cantrell, Radar detection in non-Gaussian, correlated clutter. Nav. Res. Lab. Rep. 9015, 21 (1986)
A. Martinez, P. Swaszek, J. Thomas, Locally optimal detection in multivariate non-Gaussian noise. IEEE Trans. Inf. Theory 30(6), 815–822 (1984)
R. Tough, K. Ward, The correlation properties of gamma and other non-Gaussian processes generated by memoryless nonlinear transformation. J. Phys. D Appl. Phys. 32(23), 3075 (1999)
E. Conte, M. Longo, Characterisation of radar clutter as a spherically invariant random process, in IEE Proceedings F (Communications, Radar and Signal Processing), vol. 134, pp. 191–197 (IET, 1987)
E. Conte, M. Di Bisceglie, M. Longo, M. Lops, Canonical detection in spherically invariant noise. IEEE Trans. Commun. 43(2/3/4), 347–353 (1995)
E. Jakeman, P. Pusey, A model for non-Rayleigh sea echo. IEEE Trans. Antennas Propag. 24(6), 806–814 (1976)
S. Bocquet, L. Rosenberg, C.H. Gierull, Parameter estimation for a compound radar clutter model with trimodal discrete texture. IEEE Trans. Geosci. Remote Sens. 58(10), 7062–7073 (2020)
D.R. Iskander, A.M. Zoubir, Estimation of the parameters of the K-distribution using higher order and fractional moments [radar clutter]. IEEE Trans. Aerosp. Electron. Syst. 35(4), 1453–1457 (1999)
I. Chalabi, A. Mezache, Estimators of compound Gaussian clutter with log-normal texture. Remote Sens. Lett. 10(7), 709–716 (2019)
D. Blacknell, R. Tough, Parameter estimation for the K-distribution based on [zlog(z)]. IEE Proc. Radar Sonar Navig. 148(6), 309–312 (2001)
A. Gouri, A. Mezache, H. Oudira, Radar CFAR detection in Weibull clutter based on zlog(z) estimator. Remote Sens. Lett. 11(6), 581–589 (2020)
R. Ravid, N. Levanon, Maximum-likelihood CFAR for Weibull background, in IEE Proceedings F (Radar and Signal Processing), vol. 139, pp. 256–264 (IET, 1992)
G.V. Weinberg, L. Bateman, P. Hayden, Constant false alarm rate detection in Pareto type II clutter. Digit. Signal Process. 68, 192–198 (2017)
G.V. Weinberg, S.D. Howard, C. Tran, Bayesian framework for detector development in Pareto distributed clutter. IET Radar Sonar Navig. 13(9), 1548–1555 (2019)
S. Haykin, W. Stehwien, C. Deng, P. Weber, R. Mann, Classification of radar clutter in an air traffic control environment. Proc. IEEE 79(6), 742–772 (1991)
B. Yonel, E. Mason, B. Yazıcı, Deep learning for passive synthetic aperture radar. IEEE J. Sel. Top. Signal Process. 12(1), 90–103 (2017)
A. Coluccia, G. Ricci, Radar detection in K-distributed clutter plus thermal noise based on KNN methods, in 2019 IEEE Radar Conference (RadarConf), pp. 1–5 (IEEE, 2019)
D. Roy, S. Srivastava, A. Kusupati, P. Jain, M. Varma, A. Arora, One size does not fit all: multi-scale, cascaded RNNs for radar classification. ACM Trans. Sens. Netw. 17(2), 1–27 (2021)
J. Zhao, R. Jiang, X. Wang, H. Gao, Robust CFAR detection for multiple targets in K-distributed sea clutter based on machine learning. Symmetry 11(12), 1482 (2019)
G. Wang, H. Ding, C. Wang, N. Liu, Estimation of sea clutter distribution parameters using deep neural network, in Artificial Intelligence in China, pp. 326–333 (Springer, Singapore, 2020)
T.H. Kerbaa, A. Mezache, F. Gini, M.S. Greco, CNN-LSTM based approach for parameter estimation of K-clutter plus noise, in 2020 IEEE Radar Conference (RadarConf20), pp. 1–6 (IEEE, 2020)
E. Conte, M. Lops, G. Ricci, Adaptive detection schemes in compound-Gaussian clutter. IEEE Trans. Aerosp. Electron. Syst. 34(4), 1058–1069 (1998)
F. Gini, A cumulant-based adaptive technique for coherent radar detection in a mixture of K-distributed clutter and Gaussian disturbance. IEEE Trans. Signal Process. 45(6), 1507–1519 (1997)
F. Gini, M. Greco, Suboptimum approach to adaptive coherent radar detection in compound-Gaussian clutter. IEEE Trans. Aerosp. Electron. Syst. 35(3), 1095–1104 (1999)
F. Gini, M. Greco, M. Diani, L. Verrazzani, Performance analysis of two adaptive radar detectors against non-Gaussian real sea clutter data. IEEE Trans. Aerosp. Electron. Syst. 36(4), 1429–1439 (2000)
F. Gini, M.V. Greco, A. Farina, Clairvoyant and adaptive signal detection in non-Gaussian clutter: a data-dependent threshold interpretation. IEEE Trans. Signal Process. 47(6), 1522–1531 (1999)
F. Gini, M. Greco, Covariance matrix estimation for CFAR detection in correlated heavy-tailed clutter. Signal Process. 82(12), 1847–1859 (2002)
K.J. Sangston, F. Gini, M.S. Greco, Coherent radar target detection in heavy-tailed compound-Gaussian clutter. IEEE Trans. Aerosp. Electron. Syst. 48(1), 64–77 (2012)
S. Bocquet, Parameter estimation for Pareto and K distributed clutter with noise. IET Radar Sonar Navig. 9(1), 104–113 (2015)
A. Balleri, A. Nehorai, J. Wang, Maximum likelihood estimation for compound-Gaussian clutter with inverse gamma texture. IEEE Trans. Aerosp. Electron. Syst. 43(2), 775–779 (2007)
S. Bocquet, Simulation of correlated Pareto distributed sea clutter, in 2013 International Conference on Radar, pp. 258–261 (IEEE, 2013)
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, D.J. Inman, 1D convolutional neural networks and applications: a survey. Mech. Syst. Signal Process. 151, 107398 (2021)
B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey. Philos. Trans. R. Soc. A 379(2194), 20200209 (2021)
P. Baldi, S. Brunak, P. Frasconi, G. Soda, G. Pollastri, Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 15(11), 937–946 (1999)
S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies. A Field Guide to Dynamical Recurrent Neural Networks (IEEE Press, 2001)
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, in Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
P. Baldi, Autoencoders, unsupervised learning, and deep architectures, in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49 (JMLR Workshop and Conference Proceedings, 2012)
N. Srivastava, E. Mansimov, R. Salakhudinov, Unsupervised learning of video representations using LSTMs, in International Conference on Machine Learning, pp. 843–852 (PMLR, 2015)
A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
A. Sagheer, M. Kotb, Unsupervised pre-training of a deep LSTM-based stacked autoencoder for multivariate time series forecasting problems. Sci. Rep. 9(1), 1–16 (2019)
Contributions
All authors have contributed toward this work as well as in compilation of this manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
MSG is the editorinchief of the journal at time of publication and was not involved in making a decision on the paper. All other authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kerbaa, T.H., Mezache, A., Gini, F. et al. Multi-headed deep learning-based estimator for correlated-SIRV Pareto type II distributed clutter. EURASIP J. Adv. Signal Process. 2023, 81 (2023). https://doi.org/10.1186/s13634-023-00982-8
Keywords
 Parameter estimation
 Pareto
 SIRV
 Correlation
 Deep learning
 CNN
 LSTM
 Autoencoder