Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 86 (2012)
Abstract
In radar high-resolution range profile (HRRP)-based statistical target recognition, one of the most challenging tasks is feature extraction. This article uses the spectrogram feature of HRRP data to improve recognition performance; the spectrogram is a two-dimensional feature that describes how the frequency domain feature varies along the time domain. A new radar HRRP target recognition method is then presented based on a truncated stick-breaking hidden Markov model (TSB-HMM). Moreover, multi-task learning (MTL) is employed, from which a full posterior distribution on the number of states associated with the targets can be inferred, and the target-dependent state information is shared among the multiple target-aspect frames of each target. The TSB-HMM framework allows efficient variational Bayesian inference, which is of interest for large-scale problems. Experimental results on measured data show that the spectrogram feature has significant advantages over the time domain samples in both recognition and rejection performance, and that MTL provides better recognition performance.
1. Introduction
A high-resolution range profile (HRRP) is the amplitude of the coherent summations of the complex time returns from target scatterers in each range resolution cell, which represents the projection of the complex echoes returned from the target scattering centers onto the line-of-sight (LOS), as shown in Figure 1. Since it contains target structure signatures, such as target size and scatterer distribution, radar HRRP target recognition has received intensive attention from the radar automatic target recognition (RATR) community [1–16].
Several studies [8–16] show that statistical recognition is an efficient method for RATR. Figure 2 shows a typical flow chart of radar HRRP statistical recognition. In statistical recognition, the feature vector y extracted from a test HRRP sample x is assigned to the class with the maximum posterior probability p(c|y), where c ∈ {1, ..., C} denotes the class membership. According to Bayes' rule, p(c|y) ∝ p(y|c)p(c), where p(y|c) is the class-conditional likelihood and p(c) is the prior class probability. Since the prior class probability is usually assumed to be uniform, estimating the posterior probability p(c|y) of each class reduces to estimating the class-conditional likelihood p(y|c) of each class. There are usually two stages (training and classification) in the statistical recognition procedure. We suppose that the class-conditional likelihood p(y|c) can be described by a model with a set of parameters (i.e., a parametric model). In the training phase, these parameters are estimated from training data (known as statistical modeling). In the classification phase, as discussed above, given a test sample x, we first extract the feature vector y, then calculate the class-conditional likelihood p(y|c) for each target c, and finally associate the test sample with target c' if $c' = \arg\max_{c} p(c \mid \mathbf{y})$. The focus of this article is on feature extraction and statistical modeling.
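The decision rule above can be illustrated with a minimal Python sketch. The function names and the log-likelihood values are hypothetical and not taken from the article; the sketch only shows that, under a uniform prior, the class with the largest posterior is the class with the largest class-conditional likelihood.

```python
import math

def class_posteriors(log_likelihoods, priors=None):
    """Return p(c | y) for every class from log p(y | c) and p(c)."""
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # uniform prior, as assumed in the text
    joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    peak = max(joint)  # shift by the maximum before exponentiating (log-sum-exp)
    log_evidence = peak + math.log(sum(math.exp(j - peak) for j in joint))
    return [math.exp(j - log_evidence) for j in joint]

def classify(log_likelihoods, priors=None):
    """Index of the class with the maximum posterior probability."""
    post = class_posteriors(log_likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical log-likelihoods log p(y | c) for C = 3 targets.
log_liks = [-120.5, -118.2, -125.0]
posteriors = class_posteriors(log_liks)
predicted = classify(log_liks)
```

Working with log-likelihoods avoids numerical underflow, which matters once the likelihood is a product over many time bins.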
Feature extraction from HRRP is a key step of our recognition system. One general feature extraction method is feature dimensionality reduction [5]. This method is generally supervised, and some discriminative information may be lost during the dimensionality reduction procedure. Another general feature extraction method is feature transformation. The study in [6] investigates various higher-order spectra and shows that the power spectrum has the best recognition performance. Rather than using only the frequency domain feature as in [6], this study exploits the spectrogram feature of HRRP data, which combines time domain and frequency domain features: it is a two-dimensional feature that describes how the frequency domain feature varies along the time domain. Several statistical models [8–16] have been developed for HRRP-based RATR, of which [14–16] successfully used the hidden Markov model (HMM) to model feature vectors from the HRRP sequence. Since the HRRP sample is a typical high-dimensional distributed signal, it is computationally prohibitive to build such a high-dimensional HMM to describe HRRP sequences directly. To avoid this problem, some dimensionality reduction methods are used. For example, in [14, 15], the RELAX algorithm is employed to extract the waveform constituents from the HRRP radar echoes, and an HMM is then used to characterize the features; in [16], a nonstationary HMM is used to characterize two features, namely the location information of scattering centers, extracted via the multi-RELAX algorithm, and the moments of HRRP radar echoes. Nevertheless, some information contained in the HRRP samples is inevitably lost during the dimensionality reduction procedure.
Moreover, since multiple aspect-dependent looks at a single target are used in the classification phase, the angular velocity of the target relative to the LOS of the radar must remain the same in the training and classification phases, a condition that can hardly be satisfied for a non-cooperative target.
The study reported here seeks an alternative way of exploiting the HMM, in which we characterize the spectrogram feature from a single HRRP sample via the hidden Markov structure. In our model, the spectrogram feature extracted from each single HRRP is viewed as a d-dimensional sequence (d is the length of the spectrogram feature in the frequency dimension), so only a single HRRP sample is required in the classification phase rather than an aspect-dependent sequence. The main contributions of this study can be summarized as follows.

(a) Spectrogram feature: The time domain HRRP samples only characterize the time domain feature of the target, which is too simple to obtain good performance. By contrast, the spectrogram feature introduced in this article is a time-frequency representation of HRRP data. Physically, the spectrogram feature in each time bin characterizes the frequency domain property of a fragment of the target, which can reflect the scattering properties of different physical constructions. Therefore, the spectrogram feature should be a better choice for the recognition problem.

(b) Nonparametric model selection via stick-breaking construction: In the context of target recognition using HMMs, a key issue is to develop a methodology for defining an appropriate set of states to avoid over- or under-fitting. A Bayesian nonparametric infinite HMM (iHMM) constructed from the hierarchical Dirichlet process (HDP) has proven effective for inferring the number of states in acoustic sensing scenarios [17, 18]. However, the lack of conjugacy between the two levels of the HDP means that a truly variational Bayesian (VB) solution is difficult for the HDP-HMM, which makes it computationally prohibitive for large data problems. A recent study [19] proposes another way to construct the iHMM, in which each row of the transition matrix and the initial state probability is given a fully conjugate infinite-dimensional stick-breaking prior. This prior can accommodate an infinite number of states, with the statistical property that only a subset of these states will be used with substantial probability; the resulting model is referred to as the stick-breaking HMM (SB-HMM). We use the truncated version of this stick-breaking construction in our model to characterize the HMM states, which is referred to as the truncated stick-breaking HMM (TSB-HMM).

(c) Multi-task learning (MTL): A limitation of statistical models is that they usually require substantial training data, assumed to be similar to the data on which the model is tested. However, in radar target recognition problems, one may have limited training data. In addition, the test data may be obtained under different motion circumstances. Rather than building models individually for each data subset associated with a different target-aspect, it is desirable to share information appropriately among these related data, thus offering the potential to improve overall recognition performance. If the modeling of one data subset is termed one learning task, the joint learning of models for all tasks is referred to as MTL [20]. Here we extend TSB-HMM learning to a multi-task setting for spectrogram feature-based radar HRRP recognition.

(d) Full Bayesian inference: We present a fully conjugate Bayesian model structure, which admits an efficient VB solution.
The remainder of this article is organized as follows. We introduce the spectrogram feature of HRRP data and analyze its advantages over time domain samples in Section 2. Section 3 briefly reviews traditional HMMs. In Section 4, the proposed model construction is introduced, and model learning and classification are implemented based on VB inference. We present experimental results for both single-task and multi-task TSB-HMMs with the time domain feature and the spectrogram feature of measured HRRP data in Section 5. Finally, conclusions are given in Section 6.
2. Spectrogram feature extraction
2.1 Definition of the spectrogram
Spectrogram analysis is a common signal processing procedure in spectral analysis and other fields. It represents a signal over both the time and frequency domains, and has been widely used in fields such as radar signal processing and speech processing [21, 22].
Spectrograms can readily be created by calculating the short-time Fourier transform (STFT) of the time signal. The STFT may be represented as
$$\mathrm{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} x(u)\, w(u - \tau)\, e^{-j\omega u}\, du$$(1)
where x(u) is the signal to be transformed and w(·) is the window function.
The spectrogram is given by the squared magnitude of the STFT of the signal:
$$Y(\tau, \omega) = \left|\mathrm{STFT}(\tau, \omega)\right|^{2}$$(2)
From (1) and (2), we can see that the spectrogram shows how the spectral density of the signal varies with time.
In RATR problems, applying a nonlinear transformation (e.g., the power transform metric) in the feature domain may partly correct for the departure of the samples from a normal distribution and improve the average recognition rate of the learned models [2]. The power transform metric is defined as
$$y = x^{a}$$(3)
where x is a feature value before the transform, y is its value after the transform, and a is the power parameter.
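As an illustration of the definitions in this subsection, the following sketch computes a discrete spectrogram of a real-valued sequence and applies an element-wise power transform. The window type (Hamming), window length, hop size, the one-sided spectrum and the choice to apply the power transform to the spectrogram values are all assumptions made for this example; the article does not state them here.

```python
import cmath
import math

def spectrogram(x, win_len=32, hop=16, power=1.0):
    """Squared-magnitude discrete STFT of a real sequence.

    Returns a list of time bins; each bin holds win_len // 2 non-negative
    values (one-sided spectrum), raised element-wise to `power`.
    """
    # Hamming window: an assumption, the source does not name the window.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    bins = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        col = []
        for k in range(win_len // 2):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len))
            col.append((abs(s) ** 2) ** power)  # |STFT|^2, then power transform
        bins.append(col)
    return bins

# Toy 128-sample sequence standing in for one HRRP sample (hypothetical data).
hrrp = [abs(math.sin(0.3 * n)) for n in range(128)]
Y = spectrogram(hrrp, win_len=32, hop=16, power=0.5)
```

With these settings a 128-cell sequence yields 7 time bins of 16 frequency values each, so every time bin is a vector observation rather than a scalar.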
2.2 Spectrogram feature of HRRP data
The sequential relationship across the range cells within a single HRRP echo can reflect the physical composition of the target. This is illustrated in Figure 3, which presents HRRP samples and the corresponding spectrogram features from three plane targets, i.e., Yark-42, An-26, and Cessna Citation S/II.
The advantages of the spectrogram are as follows: (i) HRRP scattering from complex targets is a strong function of the target-sensor orientation, and even a slight variation of the target-aspect may cause the scatterers at the edges of the target to move across some range cells [23]. When the target-aspect changes a little, the scatterers within several contiguous range cells (referred to as a chunk) are more robust than the scatterers in a single range cell. Therefore, the sequential relationship across the chunks in the spectrogram of a single HRRP echo, rather than that across the range cells within a single HRRP, can reflect the target physical composition more robustly. (ii) The spectrogram is a time-frequency representation of a signal. It describes not only the time domain feature but also how the spectral density varies with time. (iii) At each discrete time (each chunk or each range cell), the observation of a spectrogram feature is a vector, while that of the time domain feature (HRRP sample) is a scalar. Thus, the high-dimensional feature vector may reflect more details for discrimination than a single point.
3. Review of traditional HMM: finite HMM and infinite HMM
The HMM [24] has been widely used in speech recognition and target recognition. It is a generative representation of sequential data with an underlying Markovian process selecting state-dependent distributions, from which observations are drawn. Specifically, for a sequence of length T, a state sequence s = (s_{1}, s_{2}, ..., s_{T}) is drawn from P(s_{t}|s_{t-1}). Given the observation model f(·), the observation sequence x = (x_{1}, x_{2}, ..., x_{T}) can then be drawn as $x_{t} \sim f(\theta_{s_{t}})$, where $\theta_{s_{t}}$ is the set of parameters of the observation model indexed by the state at time t.
An HMM can be modeled as Φ = {w_{0}, w, θ}, where each parameter is defined as
$$\begin{array}{l} \mathbf{w}_{0} = \{w_{0,i}\},\quad w_{0,i} = P(s_{1} = i) \\ \mathbf{w} = \{w_{i,j}\},\quad w_{i,j} = P(s_{t+1} = j \mid s_{t} = i) \\ \mathbf{\theta} = \{\theta_{i}\} \end{array}$$(4)
Given the model parameters Φ, the probability of the complete data can be expressed as
$$p(\mathbf{x}, \mathbf{s} \mid \mathbf{\Phi}) = w_{0,s_{1}} \prod_{t=2}^{T} w_{s_{t-1},s_{t}} \prod_{t=1}^{T} f(x_{t} \mid \theta_{s_{t}})$$(5)
The data likelihood p(x|Φ) can be obtained by summing over the states using the forward algorithm [24]. In a classical HMM [24], the number of states of the HMM is initialized and fixed. Therefore, we have to specify the model structure before learning. However, in many practical applications, an expensive model selection process is needed to obtain a correct model structure. To avoid the model selection process, a fully nonparametric Bayesian approach with countably infinite state spaces is employed, first proposed by Beal and termed the infinite hidden Markov model (iHMM) [25].
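The forward recursion mentioned above can be sketched as follows for a finite HMM. Scalar Gaussian emissions, the per-step rescaling and all numerical values are assumptions chosen for this illustration; they are not the article's model settings.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a scalar normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def forward_log_likelihood(x, w0, w, theta):
    """log p(x | Phi) for Phi = {w0, w, theta}.

    w0[i]    : initial probability of state i
    w[i][j]  : transition probability from state i to state j
    theta[i] : (mean, variance) of the emission density of state i
    """
    n_states = len(w0)
    alpha = None
    log_lik = 0.0
    for t, obs in enumerate(x):
        emis = [gaussian_pdf(obs, *theta[j]) for j in range(n_states)]
        if t == 0:
            alpha = [w0[j] * emis[j] for j in range(n_states)]
        else:
            alpha = [sum(alpha[i] * w[i][j] for i in range(n_states)) * emis[j]
                     for j in range(n_states)]
        scale = sum(alpha)                  # p(x_t | x_1, ..., x_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

# Hypothetical two-state model and a short observation sequence.
w0 = [0.6, 0.4]
w = [[0.7, 0.3], [0.2, 0.8]]
theta = [(0.0, 1.0), (3.0, 0.5)]
x = [0.1, 2.9, 3.2]
log_lik = forward_log_likelihood(x, w0, w, theta)
```

The recursion costs on the order of T·I² operations for T observations and I states, whereas summing the complete-data probability over all state sequences directly would require I^T terms.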
A recent study [19] proposes the iHMM with stick-breaking priors (SB-HMM), which can be used to develop an HMM with an unknown number of states. In this model, each row of the infinite state transition matrix w is given a stick-breaking prior. The model is expressed as follows
$$\begin{array}{l} G_{i} = \sum_{j=1}^{\infty} w_{i,j}\, \delta(\theta_{j}) \\ w_{i,j} = v_{i,j} \prod_{k=1}^{j-1} (1 - v_{i,k}) \\ v_{i,j} \sim \mathrm{Beta}(1, \beta_{i}) \\ \beta_{i} \sim \mathrm{Ga}(a_{\alpha}, b_{\alpha}) \\ \theta_{j} \sim H \end{array}$$(6)
where the mixture distribution G_{i} has weights w_{i} = [w_{i,1}, w_{i,2}, ..., w_{i,j}, ...], δ(θ_{j}) is a point measure concentrated at θ_{j}, Beta(1, β_{i}) represents the Beta distribution with hidden variable β_{i}, the drawn variables $\{v_{i,j}\}_{j=1}^{\infty}$ are independent and identically distributed (i.i.d.), Ga(a_{α}, b_{α}) represents the Gamma distribution with preset parameters a_{α}, b_{α}, and H denotes a prior distribution from which the set $\{\theta_{j}\}_{j=1}^{\infty}$ is drawn i.i.d. The initial state probability mass function, w_{0}, is also constructed according to an infinite stick-breaking construction. When w_{i} terminates at some finite number I − 1 with $w_{i,I} \equiv 1 - \sum_{j=1}^{I-1} w_{i,j}$, the result is a draw from a generalized Dirichlet distribution [26], which is denoted as w_{i} ~ GDD(1_{(I-1) × 1}, [β_{i}]_{(I-1) × 1}), where 1_{(I-1) × 1} denotes an (I − 1)-length vector of ones, [β_{i}]_{(I-1) × 1} is an (I − 1)-length vector of β_{i}, and I represents the truncation number of states.
The key advantage of the stick-breaking construction is that the corresponding state-dependent parameters θ_{j} are drawn separately, effectively detaching the construction of θ from the construction of the initial-state probability w_{0} and the state transition matrix w. This is in contrast with HDP priors [27], where these matrices are linked through a two-level construction. Therefore, the stick-breaking construction makes fast variational inference feasible. In addition, the SB-HMM has a good sparsity property, which promotes a sparse utilization of the underlying states [19].
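A truncated stick-breaking draw of one row of transition weights can be sketched as below. The value of β, the truncation level and the random seed are arbitrary choices for this example, and the function name is hypothetical.

```python
import random

def truncated_stick_breaking(beta, truncation, rng):
    """Draw one weight vector w_i with `truncation` entries that sum to 1."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, beta)  # v_{i,j} ~ Beta(1, beta_i)
        weights.append(v * remaining)   # w_{i,j} = v_{i,j} * prod_k (1 - v_{i,k})
        remaining *= 1.0 - v
    weights.append(remaining)           # w_{i,I} takes the rest of the stick
    return weights

rng = random.Random(0)
row = truncated_stick_breaking(beta=2.0, truncation=40, rng=rng)
```

Because each break has mean 1/(1 + β), a small β concentrates almost all of the mass on the first few states, which is one way to see why this prior encourages sparse state usage.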
4. The SB-HMM for HRRP data
4.1 MTL model construction
According to the scattering center model [4], for a high-resolution radar system, a radar target no longer appears as a "point target" but consists of many scatterers distributed in some range cells along the radar LOS. For a given target, the scattering center model varies with the target-aspect. Therefore, preprocessing techniques should be applied to the raw HRRP data. In our previous work [11–13], we divide the HRRP data into frames according to aspect-sectors in which most scatterers do not undergo motion through resolution cells (MTRC), and use distinct parametric models for the statistical characterization of each HRRP frame; these are referred to as the aspect-frames and the corresponding single-task learning (STL) models in our articles.
For the HRRP recognition problems of interest here, we use the TSB-HMM to analyze spectrogram features extracted from HRRP data. For a multi-aspect HRRP sequence of target c (c ∈ {1, ..., C}, with C denoting the number of targets), we divide the data into M_{c} aspect-frames; e.g., the m th set (m ∈ {1, ..., M_{c}}) is $\{\mathbf{x}^{(c,m,n)}\}_{n=1}^{N}$, where N denotes the number of samples in the frame, and x^{(c, m, n)} = [x^{(c, m, n)}(1), ..., x^{(c, m, n)}(L_{x})]^{T} represents the n th HRRP sample in the m th frame, with L_{x} denoting the number of range cells in an HRRP sample. Each aspect-frame corresponds to a small aspect-sector that avoids scatterers' MTRC [13], and the HRRP samples inside each target-aspect frame can be assumed to be i.i.d. We extract the spectrogram feature of each HRRP sample, and Y^{(c, m, n)} = [y^{(c, m, n)}(1), ..., y^{(c, m, n)}(L_{y})] denotes the spectrogram feature of x^{(c, m, n)} as defined in (2), with L_{y} denoting the number of time bins in the spectrogram feature.
Learning a separate TSB-HMM for each frame of the target, i.e., $\{\mathbf{Y}^{(c,m,n)}\}_{n=1}^{N}$, is termed the single-task TSB-HMM (STL TSB-HMM). Here, we wish to learn a TSB-HMM for all the aspect-frames (tasks) of one target jointly, which is referred to as the multi-task TSB-HMM (MTL TSB-HMM). MTL is an approach to inductive transfer that improves generalization by using the domain information contained in the training samples of related tasks as an inductive bias [20]. In our learning problems, the aspect-frames of one target may be viewed as a set of related learning tasks. Rather than building models for each aspect-frame individually (due to target-aspect sensitivity), it is desirable to share information appropriately among these related data. In this way, the training data for each task are strengthened and the overall recognition performance is potentially improved.
The construction of the MTL TSB-HMM with parameters for target c is represented as
where y^{(c, m, n)}(l) is the l th time chunk of the n th sample's spectrogram in the m th aspect-frame of the c th target, $s_{l}^{(c,m,n)}$ denotes the corresponding state indicator, and (a_{α}, b_{α}) are the preset hyperparameters. Here, the observation model f(·) is defined as an independent normal distribution, and each corresponding element in H(·) is a normal-Gamma distribution to preserve conjugacy. Since each time bin of the spectrogram feature of a plane corresponds to a fragment of the plane, the HMM states can characterize the frequency domain properties of different fragments of the plane target, i.e., the scattering properties of different physical constructions. A graphical representation of this model is shown in Figure 4a, and Figure 4b depicts how the sequential dependence across time chunks for a given aspect-frame is characterized by an HMM structure.
The main difference between the MTL TSB-HMM and the STL TSB-HMM is as follows. In the proposed MTL TSB-HMM, all the multi-aspect frames of one target are learned jointly: each of the M_{c} tasks of target c is assumed to have independent state-transition statistics, but the state-dependent observation statistics are shared across these tasks, i.e., the observation parameters are learned from all aspect-frames. In the STL TSB-HMM, each multi-aspect frame of target c is learned separately, so each target-aspect frame builds its own model and the corresponding parameters are learned from this aspect-frame only.
4.2 Model learning
The parameters of the proposed MTL TSB-HMM are treated as random variables, and this model can readily be implemented by Markov chain Monte Carlo (MCMC) [28] methods. However, to approximate the posterior distribution over the parameters, MCMC requires large computational resources to assess the convergence and reliability of the estimates. In this article, we employ VB inference [19, 29, 30], which does not produce a single point estimate of the parameters but instead estimates the posterior density function on the model parameters, as a compromise between accuracy and computational cost for large-scale problems.
The goal of Bayesian inference is to estimate the posterior distribution of the model parameters Φ. Given the observation data X and hyperparameters γ, by Bayes' rule, the posterior density for the model parameters may be expressed as
$$p(\mathbf{\Phi} \mid \mathbf{X}, \mathbf{\gamma}) = \frac{p(\mathbf{X} \mid \mathbf{\Phi}, \mathbf{\gamma})\, p(\mathbf{\Phi} \mid \mathbf{\gamma})}{\int p(\mathbf{X} \mid \mathbf{\Phi}, \mathbf{\gamma})\, p(\mathbf{\Phi} \mid \mathbf{\gamma})\, d\mathbf{\Phi}}$$(8)
where the denominator ∫p(X|Φ, γ)p(Φ|γ)dΦ = p(X|γ) is the model evidence (marginal likelihood).
VB inference provides a computationally tractable approach that seeks a variational distribution q(Φ) to approximate the true posterior distribution of the latent variables p(Φ|X, γ). We obtain the expression
$$\log p(\mathbf{X} \mid \mathbf{\gamma}) = L(q(\mathbf{\Phi})) + \mathrm{KL}\left(q(\mathbf{\Phi}) \,\|\, p(\mathbf{\Phi} \mid \mathbf{X}, \mathbf{\gamma})\right)$$(9)
where $L(q(\mathbf{\Phi})) = \int q(\mathbf{\Phi}) \log \frac{p(\mathbf{X} \mid \mathbf{\Phi}, \mathbf{\gamma})\, p(\mathbf{\Phi} \mid \mathbf{\gamma})}{q(\mathbf{\Phi})}\, d\mathbf{\Phi}$, and KL(q(Φ) || p(Φ|X, γ)) is the Kullback-Leibler (KL) divergence between the variational distribution q(Φ) and the true posterior p(Φ|X, γ). Since KL(q(Φ) || p(Φ|X, γ)) ≥ 0, with equality when q(Φ) = p(Φ|X, γ), L(q(Φ)) forms a lower bound for log p(X|γ), so we have log p(X|γ) ≥ L(q(Φ)). Minimizing the KL divergence between the variational distribution and the true posterior is therefore equivalent to maximizing this lower bound, which is known as the negative free energy in statistical physics.
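The relation between the evidence, the lower bound and the KL divergence can be checked numerically on a toy model with a single discrete latent variable, where the integral becomes a finite sum. All probabilities below are made-up values for illustration only.

```python
import math

# Toy model: one discrete latent variable phi with three possible values.
prior = [0.5, 0.3, 0.2]          # p(phi)
likelihood = [0.10, 0.40, 0.25]  # p(X | phi) for the observed X (hypothetical)

evidence = sum(p * l for p, l in zip(prior, likelihood))           # p(X)
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]  # p(phi | X)

def lower_bound(q):
    """L(q) = sum_phi q(phi) log[ p(X | phi) p(phi) / q(phi) ]."""
    return sum(qk * math.log(p * l / qk)
               for qk, p, l in zip(q, prior, likelihood) if qk > 0)

def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions on the same support."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)

q = [0.2, 0.5, 0.3]  # an arbitrary variational distribution
bound = lower_bound(q)
gap = kl_divergence(q, posterior)
```

For any choice of q, `bound + gap` equals `math.log(evidence)`, and the bound is tight only when q coincides with the posterior.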
Because the negative free energy is intractable in general, for computational convenience we assume a factorized q(Φ), i.e., $q(\mathbf{\Phi}) = \prod_{k} q_{k}(\phi_{k})$, which has the same form as that employed in p(Φ|X, γ). With this assumption, the mean field approximation of the variational distributions for the proposed MTL TSB-HMM of target c may be expressed as
where $\left\{{\left\{{\mathbf{w}}_{i}^{\left(c,m\right)}\right\}}_{m=1,i=0}^{{M}_{c},I},{\left\{{s}_{l}^{\left(c,m,n\right)}\right\}}_{m=1,n=1,l=1}^{{M}_{c},N,{L}_{y}},\mathbf{\theta},\mathbf{\beta}\right\}$ are the latent variables in this MTL model.
A general method for performing variational inference for conjugate-exponential Bayesian networks, outlined in [17], is as follows: for a given node in a graphical model, write out the posterior as though everything were known, take the logarithm, take the expectation with respect to all unknown parameters, and exponentiate the result. The variational updates can be implemented in the manner of the expectation-maximization (EM) algorithm. The lower bound is increased in each iteration until the algorithm converges. In the following experiments, we terminate the EM algorithm when the change of the lower bound can be neglected (the threshold is 10^{-6}). Since it requires computational resources comparable to those of the EM algorithm, variational inference is faster than MCMC methods. The detailed update equations for the latent variables and hyperparameters of the MTL TSB-HMM with the HRRP spectrogram feature are summarized in the Appendix.
4.3 Main procedure of radar HRRP target recognition based on the proposed MTL TSB-HMM algorithm
The main procedure of radar HRRP target recognition based on the proposed MTL TSB-HMM algorithm is as follows.
4.3.1. Training phase

(1) Divide the training samples of target c (c = 1, 2, ..., C) into HRRP frames $\{\mathbf{x}^{(c,m)}\}_{m=1}^{M_{c}}$, where M_{c} is the number of tasks of target c, $\mathbf{x}^{(c,m)} = \{\mathbf{x}^{(c,m,n)}\}_{n=1}^{N}$ denotes the m th range-aligned and amplitude-normalized HRRP frame, and N is the number of echoes a frame contains.

(2) Extract the spectrogram feature $\{\mathbf{Y}^{(c,m,n)}\}_{m=1,n=1}^{M_{c},N}$ of each HRRP sample, with Y^{(c, m, n)} = [y^{(c, m, n)}(1), y^{(c, m, n)}(2), ..., y^{(c, m, n)}(L_{y})] denoting the spectrogram feature of x^{(c, m, n)} as defined in (2).

(3) For each target, we construct an MTL TSB-HMM and learn the parameters $w_{0,s_{1}^{(c,m)}}^{(c,m)}$, $w_{i,j}^{(c,m)}$, and θ_{i} for all aspect-frames of the target using the spectrogram feature, where $w_{0,s_{1}^{(c,m)}}^{(c,m)}$ is the initial state probability for frame m of target c, $w_{i,j}^{(c,m)}$ is the state transition probability from state i to state j for frame m of target c, and θ_{i} are the parameters of the observation model associated with state i (c ∈ {1, ..., C}, m ∈ {1, ..., M_{c}}, i, j ∈ {1, ..., I}). The detailed learning procedure for the parameters of the MTL TSB-HMM with the HRRP spectrogram feature is discussed in Section 4.2 and the Appendix.

(4) Store the initial state probabilities $\{w_{0,s_{1}^{(c,m)}}^{(c,m)}\}_{m=1}^{M_{c}}$, the state transition probabilities $\{\mathbf{w}^{(c,m)}\}_{m=1}^{M_{c}}$, and the observation model parameters $\{\mathbf{\theta}_{i}\}_{i=1}^{I}$ for each target c with c = 1, 2, ..., C.
4.3.2. Classification phase

(1) The amplitude-normalized HRRP testing sample is time-shift compensated with respect to the averaged HRRP of each frame model via slide correlation processing [23].

(2) Extract the spectrogram feature $\{\mathbf{Y}_{\text{test}}^{(c,m)}\}_{c=1,m=1}^{C,M_{c}}$ of the slide-correlated HRRP testing sample x_{test}, where $\mathbf{Y}_{\text{test}}^{(c,m)} = [\mathbf{y}_{\text{test}}^{(c,m)}(1), \mathbf{y}_{\text{test}}^{(c,m)}(2), \dots, \mathbf{y}_{\text{test}}^{(c,m)}(L_{y})]$ denotes the spectrogram feature, as defined in (2), of the HRRP testing sample correlated with the m th frame of target c.

(3) The frame-conditional likelihood of each target can be calculated as
$$p\left(\mathbf{Y}_{\text{test}}^{(c,m)} \mid c, m\right) = \sum_{\mathbf{s}} \left\langle w_{0,s_{1}^{(c,m)}}^{(c,m)} \right\rangle \prod_{l=2}^{L_{y}} \left\langle w_{s_{l-1}^{(c,m)},s_{l}^{(c,m)}}^{(c,m)} \right\rangle \prod_{l=1}^{L_{y}} f^{(c,m)}\left(\mathbf{y}_{\text{test}}^{(c,m)}(l) \mid \left\langle \mathbf{\theta}_{s_{l}^{(c,m)}} \right\rangle\right)$$(11)
where 〈·〉 denotes the posterior expectation of a latent variable with respect to the corresponding distribution on it; e.g., $\langle w_{0,s_{1}^{(c,m)}}^{(c,m)} \rangle$ denotes the posterior expectation of the initial state probability, $\langle w_{s_{l-1}^{(c,m)},s_{l}^{(c,m)}}^{(c,m)} \rangle$ denotes the posterior expectation of the state transition probability from state $s_{l-1}^{(c,m)}$ to state $s_{l}^{(c,m)}$ for frame m of target c, and $\langle \mathbf{\theta}_{s_{l}^{(c,m)}} \rangle$ denotes the posterior expectation of the observation model parameters associated with state $s_{l}^{(c,m)}$, where $s_{l}^{(c,m)} \in \{1, \cdots, I\}$ is the state indicator for the l th time chunk. Then, $p(\mathbf{Y}_{\text{test}}^{(c,m)} \mid c, m)$ can be calculated by the forward-backward procedure [24] for each m (m ∈ {1, ..., M_{c}}) and c (c ∈ {1, ..., C}).

(4) We calculate the class-conditional likelihood $p(\mathbf{Y}_{\text{test}}^{(c)} \mid c)$ for each target c
$$p\left(\mathbf{Y}_{\text{test}}^{(c)} \mid c\right) = \max_{m}\, p\left(\mathbf{Y}_{\text{test}}^{(c,m)} \mid c, m\right);\quad m = 1, \cdots, M_{c}$$(12)
(5) As discussed in Section 1, the testing HRRP sample is assigned to the class with the maximum class-conditional likelihood, under the assumption that the prior class probabilities are the same for all targets of interest,
$$k = \arg\max_{c}\, p\left(\mathbf{Y}_{\text{test}}^{(c)} \mid c\right);\quad c = 1, \cdots, C$$(13)
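Steps (4) and (5) of the classification phase can be sketched as follows, assuming the frame-conditional log-likelihoods from step (3) are already available. The function name and the numbers are hypothetical; since the logarithm is monotonic, taking maxima over log-likelihoods selects the same frame and class as (12) and (13).

```python
def classify_from_frames(frame_log_liks):
    """frame_log_liks[c][m] holds log p(Y_test^(c,m) | c, m).

    Returns the index k of the chosen target and the per-class scores.
    """
    # (12): the class-conditional score is the best frame score of that class.
    class_scores = [max(frames) for frames in frame_log_liks]
    # (13): choose the class with the largest class-conditional score.
    k = max(range(len(class_scores)), key=class_scores.__getitem__)
    return k, class_scores

# Hypothetical frame scores for C = 3 targets with 3, 2 and 4 frames.
frame_log_liks = [
    [-310.2, -298.7, -305.1],
    [-301.4, -299.9],
    [-320.0, -296.3, -315.8, -300.2],
]
k, class_scores = classify_from_frames(frame_log_liks)
```

The number of frames may differ between targets, which is why the scores are kept as a list of lists rather than a rectangular array.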
5. Experimental results
5.1 Measured data
We examine the performance of the TSB-HMM on 3-class measured data, including three actual aircraft (a propeller plane An-26, a small jet plane Cessna Citation S/II, and a large jet plane Yark-42). The radar works in the C band with a bandwidth of 400 MHz, and the range resolution of the HRRP is about 0.375 m.
The parameters of the targets and the radar are shown in Table 1, and the projections of the target trajectories onto the ground plane are displayed in Figure 5, from which the aspect angle of each airplane can be estimated according to its position relative to the radar. As shown in Figure 5, all aspects of the targets were measured several times in this dataset. The training data and test data are chosen so that they come from different data segments, and so that the training data cover almost all of the target-aspect angles of the test data, although their elevation angles are different. The second and fifth segments of Yark-42, the sixth and seventh segments of Cessna Citation S/II, and the fifth and sixth segments of An-26 are taken as training samples, while the remaining data are left for testing. These training data cover almost all of the target-aspect angles. We also need test data from a different target to measure the rejection performance of our model. Here, we use 18,000 truck HRRP samples generated by the electromagnetic simulation software XPATCH as a confuser target. The HRRP samples are 128-dimensional vectors.
As discussed in the literature [11, 12], dealing with the target-aspect, time-shift, and amplitude-scale sensitivity is a prerequisite for radar target recognition. According to the radar parameters and the condition of aspect-sectors without MTRC, the training data from the 3 targets comprise 135 HRRP frames in total, of which 35 are from Yark-42, 50 from Cessna Citation S/II, and 50 from An-26. As in our previous studies [11–13], HRRP training samples should be aligned by the time-shift compensation techniques used in ISAR imaging [23] to avoid the influence of time-shift sensitivity. Each HRRP sample is normalized by L_{2} normalization to avoid the amplitude-scale sensitivity. In the rest of the article, the training HRRPs in each frame are assumed to have been aligned and normalized.
Nine training datasets are considered: 1 × 135, 2 × 135, 4 × 135, 8 × 135, 16 × 135, 32 × 135, 64 × 135, 128 × 135, and 1024 × 135, where 2 × 135 means that 2 HRRP samples are randomly selected from each of the 135 target-aspect frames, i.e., there are 2 × 135 = 270 HRRP training samples in total, and similarly for the other dataset sizes. Since MTL needs to load the whole HRRP training dataset of a target to share information among the frames, it requires more memory than STL. Due to the limited memory of our computer, we do not consider the 1024 × 135 training dataset in MTL. Since there is no prior knowledge about how many states we should use or how to set them, the HMM states are not set manually as in [31]. Instead, we set a large truncation number in our model to learn the meaningful states automatically. In the following experiments, we set the truncation level I to 40 for both the spectrogram feature and the time domain feature in STL, and to 60 for both features in MTL. Similar results were found for larger truncations. In our model, since the parameter β_{i} controls the prior distribution on the number of states, we set the hyperparameters a_{α} = b_{α} = 10^{-6} for each β_{i} to promote sparseness on states.
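The two preprocessing steps described above, amplitude normalization and time-shift compensation, can be sketched as follows. The circular shift and the exhaustive search over shifts are simplifying assumptions for this example and are not necessarily the compensation technique of [23]; the function names and data are hypothetical.

```python
import math

def l2_normalize(x):
    """Scale a sample so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def align_to_template(x, template):
    """Circularly shift x to maximize its correlation with the template.

    Returns the shifted sample and the shift (in range cells) that was applied.
    """
    n = len(x)

    def corr(shift):
        return sum(x[(i - shift) % n] * template[i] for i in range(n))

    best = max(range(n), key=corr)
    return [x[(i - best) % n] for i in range(n)], best

# Hypothetical 8-cell profiles: `sample` is `template` shifted by three cells.
template = l2_normalize([0, 0, 1, 3, 2, 0, 0, 0])
sample = l2_normalize([3, 2, 0, 0, 0, 0, 0, 1])
aligned, shift = align_to_template(sample, template)
```

In a frame-based system the template would typically be the averaged profile of a training frame, so that a test sample is aligned separately against each frame before feature extraction.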
5.2 Time domain feature versus spectrogram feature
In this experiment, STL TSB-HMMs and MTL TSB-HMMs are learned from the HRRP training datasets within each frame, and the two features, i.e., the time domain and spectrogram features, are compared. When using the HRRP time domain feature, we simply substitute the scalar x^{(c, m, n)}(l) for the vector y^{(c, m, n)}(l) in (6), where x^{(c, m, n)}(l) represents the n th HRRP sample in the m th frame of target c, with l denoting the corresponding range cell in the HRRP sample.
Figure 6 shows that the STL TSB-HMMs based on the time domain feature perform better than those based on the spectrogram feature when the training dataset is no larger than 32 × 135. The reason is that the model with the spectrogram feature has more parameters to estimate than the model with the time domain feature. For example, for a target-aspect frame with 32 training samples, we need to estimate 40 16-dimensional states for the model with the spectrogram feature, but 25 1-dimensional states for the model with the time domain feature. When the training dataset is larger than 32 × 135, the spectrogram feature performs clearly better. Table 2 further compares the confusion matrices and average recognition rates of the time domain feature and the spectrogram feature with 1024 × 135 training samples via STL. The average recognition rate obtained with the spectrogram feature is about 6.6 percentage points higher than that obtained with the time domain feature. The performance of the MTL TSB-HMMs is shown in Figure 7. Since MTL shares states between the different tasks of a target, which helps parameter learning when training data are scarce, the spectrogram feature outperforms the time domain feature even with few training data.
The posterior state distributions of the MTL TSB-HMM with the spectrogram feature for the three plane targets, learned from 128 × 135 training data, are shown in Figure 8. In this example, the state truncation level I = 60 is employed for each plane, and the 60 hidden states of a plane are shared across all its aspect-frames. There are 46, 48, and 49 meaningful states with posterior state usage larger than zero for Yark-42, Cessna Citation S/II, and An-26, respectively; the usage of the other 14, 12, and 11 states is zero, which justifies using the truncated version of the stick-breaking prior for our data.
Next, we consider the target rejection problem. The three plane targets are considered as "in-class targets", while simulated data consisting of 18,000 truck HRRP samples are considered as confuser targets. Two examples of confuser target HRRP samples are shown in Figure 9. Our goal is to test whether a new sample belongs to the family of in-class targets or not. Figure 10 presents the rejection performance evaluated by receiver operating characteristic (ROC) curves. The ROC curve depicts the detection probability versus the false alarm probability; for a fixed false alarm probability, a method with a higher detection probability is better. The training dataset size is 128 × 135. The spectrogram feature-based TSB-HMMs outperform the time domain feature-based TSB-HMMs, especially for Yark-42. Figure 11 shows the test likelihoods of 1,200 Yark-42 samples and 18,000 confuser target samples obtained with the STL TSB-HMM. As shown in Figure 11a, when using the time domain feature, the test likelihoods of the Yark-42 samples are relatively low, and many confuser samples have higher test likelihoods than the Yark-42 samples. That is to say, when we set a high discrimination threshold, the detection probability is very low, and the false alarm probability is high. By contrast, Figure 11b shows that when using the spectrogram feature, the test likelihoods of the Yark-42 samples are higher than most of the test likelihoods of the confuser samples. Therefore, the detection performance of the spectrogram feature is much better than that of the time domain feature for Yark-42.
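The way an ROC curve is obtained from test likelihoods can be sketched as follows. This is an illustrative implementation under the assumption that a sample is accepted as in-class when its score reaches the threshold; the function names and the toy scores are hypothetical and are not the measured likelihoods of Figure 11.

```python
def roc_curve(inclass_scores, confuser_scores):
    """Return (false_alarm, detection) pairs, one per candidate threshold.

    A sample is accepted as in-class when its score is >= the threshold.
    """
    thresholds = sorted(set(inclass_scores) | set(confuser_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        pd = sum(s >= t for s in inclass_scores) / len(inclass_scores)
        pfa = sum(s >= t for s in confuser_scores) / len(confuser_scores)
        points.append((pfa, pd))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical log-likelihood scores, for illustration only.
inclass = [-1.0, -2.0, -3.0]
confuser = [-2.5, -4.0, -5.0]
curve = roc_curve(inclass, confuser)
area = auc(curve)
```

The area under the curve equals the fraction of (in-class, confuser) pairs in which the in-class sample receives the higher score, so heavily overlapping likelihoods, as described for the time domain feature, pull it toward 0.5.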
5.3 STL versus MTL
To model the spectrogram feature, two parameters need to be set first, i.e., the width of the window function and the length of the overlap between adjacent windows.
In the feature space, the spectrogram varies with the width of the window function and the overlap across the windows. For HRRP data analysis, a wide window function provides better frequency resolution but worse time resolution, and vice versa. Physically, since the width of the window function determines the length of the segments into which a target is divided, the longer the segments, the more of the physical composition of the target is contained in each component of the observation vector; meanwhile, the overlap across the windows determines the redundancy of the segments.
We build a set of MTL TSB-HMMs to search for these two parameters. The width of the window function is chosen from 10 to 40 range cells in increments of 1 range cell, with the overlap length fixed, as is typical, at half the width of the window function. In this experiment, we use 64 × 135 training samples. As demonstrated in Figure 12, the optimal width of the window function is 33 range cells. We then fix the window function width at this value and vary the overlap length from 1 to 29 range cells to determine the optimal overlap length. From Figure 13, the optimal overlap length is 16 range cells. Therefore, we extract the spectrogram feature with a window function width of 33 range cells and an overlap length of 16 range cells for training.
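The roles of the window width and the overlap can be made concrete with a small sketch of a spectrogram computed along the range axis. Several choices here are assumptions rather than details from the article: a rectangular window, squared-magnitude output, a direct (slow) DFT, and a toy profile of 256 range cells. The function name `spectrogram` is likewise only illustrative.

```python
import cmath

def spectrogram(hrrp, width, overlap):
    """Squared-magnitude short-time Fourier transform along the range axis.

    Returns one vector per window position; each vector holds the power at
    the non-negative frequencies (width // 2 + 1 bins). A rectangular window
    is assumed because the window shape is not specified here.
    """
    hop = width - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than the window width")
    n_bins = width // 2 + 1
    frames = []
    for start in range(0, len(hrrp) - width + 1, hop):
        segment = hrrp[start:start + width]
        frames.append([
            abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / width)
                    for n, x in enumerate(segment))) ** 2
            for k in range(n_bins)
        ])
    return frames

# Toy profile of 256 range cells (the length is a placeholder).
profile = [1.0 if 100 <= i < 140 else 0.0 for i in range(256)]
feature = spectrogram(profile, width=33, overlap=16)
```

With a width of 33 and an overlap of 16, consecutive windows advance by 17 range cells, so each observation vector describes a 33-cell segment of the target and shares roughly half of its cells with its neighbour.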
We compare two methods for the spectrogram feature-based TSB-HMM: (i) the proposed MTL TSB-HMMs method, in which the target-aspect frames of a target are learned collectively; (ii) the STL TSB-HMMs method, in which each target-aspect frame of a target is modeled separately. As shown in Figure 14, the proposed MTL TSB-HMMs method consistently outperforms the STL TSB-HMMs method, and the improvement is more significant when only a small amount of training data is available. This is because MTL shares states between different tasks and uses the shared information to enhance the overall performance. In addition, in MTL the state truncation level I = 60 is employed for each of the three planes, and the 60 states of a plane are shared across its aspect-frames. Therefore, we impose only 60 × 3 = 180 states in the MTL TSB-HMMs. In the training phase of the STL model, however, the state truncation level I = 40 is employed for each aspect-frame. As discussed in Section 5.1, we have 50 + 50 + 35 = 135 aspect-frames from the three targets. Therefore, we impose 40 × 135 = 5400 states in total in the STL TSB-HMMs. Table 3 summarizes the confusion matrices and average recognition performance of STL TSB-HMMs and MTL TSB-HMMs with 2 × 135 and 128 × 135 training samples. Note that when only 2 × 135 samples are used for training, the average recognition rate of STL TSB-HMMs is only 52.7%, while that of MTL TSB-HMMs is 88.0%. As the training data increase, the performance of STL approaches that of MTL. The average recognition rate obtained by MTL is about 2.3% higher than that obtained by STL for 128 × 135 training samples.
We also consider the target rejection problem here. The ROC curves of the MTL TSB-HMMs are presented in Figure 15. Compared with Figure 10, the area under the curve (AUC) of STL is slightly larger than that of MTL.
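The state budgets of the two schemes can be checked with a few lines of arithmetic. The variable names below are illustrative; the frame counts and truncation levels are those stated in the text.

```python
# Number of target-aspect frames per target, as given in Section 5.1.
frames_per_target = {"Yark-42": 35, "Cessna Citation S/II": 50, "An-26": 50}

mtl_truncation = 60  # one shared state set per target
stl_truncation = 40  # one state set per target-aspect frame

mtl_states = mtl_truncation * len(frames_per_target)
stl_states = stl_truncation * sum(frames_per_target.values())
ratio = stl_states / mtl_states
```

Under these settings STL imposes 30 times as many states as MTL, which is consistent with its weaker performance when each frame has only a few training samples.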
5.4 MTL with power transformed spectrogram feature
We apply the power transformation to all the observation vectors of the spectrogram (the vectors at each fixed time). Figure 16 shows that the optimal parameter is a* = 0.4. Figure 17 shows that MTL TSB-HMMs perform better than STL TSB-HMMs for the power transformed spectrogram feature. Compared with the average recognition rates of the original spectrogram feature in Figure 14, those of the power transformed spectrogram feature in Figure 17 are much higher, especially for small training datasets.
The confusion matrices and average recognition rates of STL TSB-HMMs and MTL TSB-HMMs with the power transformed spectrogram feature are shown in Table 4, where 2 × 135 and 128 × 135 training samples are used for learning the models. Note that the average recognition rate of MTL TSB-HMMs trained on 2 × 135 samples of the spectrogram feature with the optimal power transformation is nearly equal to that of MTL TSB-HMMs trained on 128 × 135 samples of the original spectrogram feature. With 128 training samples per target-aspect frame, the power transformation improves the average recognition rate of STL TSB-HMMs by 4.3% and that of MTL TSB-HMMs by 2.5%.
Similarly, as shown in Figure 18, we obtain the ROC curves for the power transformed spectrogram feature under the same experimental conditions as in Section 5.3. The AUCs of STL and MTL with the transformed spectrogram feature increase by 3.3 and 5.0%, respectively; therefore, the model using the transformed spectrogram feature can improve the rejection performance.
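A power transformation of this kind can be sketched as an element-wise operation on each observation vector. The form y = x^a applied to non-negative entries is an assumption here, since the exact definition is given elsewhere in the article; the helper name and the toy vector are illustrative.

```python
def power_transform(vector, a=0.4):
    """Raise every (non-negative) spectrogram entry to the power `a`."""
    if any(v < 0 for v in vector):
        raise ValueError("power transform expects non-negative entries")
    return [v ** a for v in vector]

# Toy observation vector with a large dynamic range.
observation = [0.0, 1.0, 32.0, 1024.0]
compressed = power_transform(observation)  # 0 < a < 1 shrinks the dynamic range
```

For 0 < a < 1 the transform compresses large entries more than small ones, so weak spectral components contribute more to the model than they would in the untransformed feature.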
5.5 Computation burden
All experiments were performed with non-optimized programs written in Matlab, on a Pentium PC with a 3.2 GHz CPU and 2 GB RAM. In our VB algorithm, convergence is declared when the relative change of the lower bound between two consecutive iterations is less than the threshold 10^{-6}. In general, a larger training dataset requires a heavier computational burden in the training phase. When the training dataset contains 128 × 135 training samples, the VB algorithm of the MTL TSB-HMM with the time domain feature and truncation number I = 60 converges after about 400 iterations and requires about 14 h, while the VB algorithm of the MTL TSB-HMMs with the spectrogram feature and truncation number I = 60 converges after about 200 iterations and requires about 2 h. Although this computation is expensive, the cost of the training phase can be ignored for an offline learning (or training) system; it is more important to evaluate the computation cost of the classification phase. The MTL TSB-HMMs with the time domain feature and the spectrogram feature require 0.6680 and 1.5893 s, respectively, to match a test sample against all frame models. The computation times given here are averaged over ten runs.
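The stopping rule can be sketched as a small driver loop. The names `has_converged`, `run_until_converged`, and `make_toy_step` are hypothetical, and the toy `step` function merely stands in for one VB iteration that returns the current lower bound.

```python
def has_converged(previous_bound, current_bound, tol=1e-6):
    """True when the relative change of the lower bound falls below `tol`."""
    change = abs(current_bound - previous_bound)
    return change < tol * abs(previous_bound)

def run_until_converged(step, max_iter=1000, tol=1e-6):
    """Call `step()` (which returns the current lower bound) until convergence."""
    previous = step()
    for iteration in range(2, max_iter + 1):
        current = step()
        if has_converged(previous, current, tol):
            return iteration, current
        previous = current
    return max_iter, previous

def make_toy_step():
    """Stand-in for a VB iteration: a bound that rises geometrically to -100."""
    state = {"k": 0}
    def step():
        state["k"] += 1
        return -100.0 - 50.0 * 0.5 ** state["k"]
    return step

iterations, bound = run_until_converged(make_toy_step())
```

Using the relative rather than the absolute change makes the threshold independent of the scale of the lower bound, which grows with the size of the training dataset.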
6. Conclusion
We have utilized the spectrogram feature of HRRP data and presented an MTL-based hidden Markov model with a truncated stick-breaking prior (MTL TSB-HMM) for radar HRRP target recognition. The construction of this model allows VB inference, which greatly reduces the computational burden.
After resolving the three sensitivity problems of HRRP, i.e., the target-aspect, time-shift, and amplitude-scale sensitivity, we first compare the spectrogram feature of HRRP with the time domain feature of HRRP via single-task and multi-task learning-based hidden Markov models with a truncated stick-breaking prior (STL and MTL TSB-HMM). Second, we measure the performance of STL TSB-HMM and MTL TSB-HMM with the spectrogram feature, where in MTL TSB-HMM the multiple tasks are linked by different target-aspects. Finally, we introduce the power transformation to improve the recognition performance of the spectrogram feature. It is shown that the spectrogram feature yields not only a better ROC but also a better recognition performance than the HRRP time domain feature. MTL shares the underlying state information among different target-aspects and provides a better recognition performance than STL. In addition, the power transformation enhances both the average recognition rate and the ROC. It is worth pointing out that our MTL model with the spectrogram feature obtains a good recognition performance with much less training data than conventional radar HRRP-based statistical recognition methods, which is a desirable property for RATR.
Appendix
Derivation of update equations in VB approach
1. Update q(θ_{ i }):
For the MTL TSB-HMM model introduced in Section 4.3 and the corresponding mean-field variational distribution described in (9), q(θ_{ i }) is updated as follows:
where the form of q(θ_{ i }) is defined by the specific application, $\tilde{\boldsymbol{\theta}}_{i}$ denotes the parameters of q(θ_{ i }), and $\langle s_{l,i}^{(c,m,n)}\rangle$ represents the expectation of the indicator that the state $s_{l}^{(c,m,n)}$ takes outcome i. If the model is conjugate exponential, that is, H(·) is conjugate to the likelihood f(·), the update equation for q(θ_{ i }) can be obtained readily.
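As a hedged illustration of the conjugate-exponential case, the generic mean-field update for the parameters of state i of one target c can be written as below. This is the textbook variational form stated under assumptions, not a reproduction of the article's own equation: M_c is the number of frames of target c, while N (samples per frame) and L (range cells per sample) are symbol names assumed here, and y^{(c,m,n)}(l) follows the notation of Section 5.2.

```latex
\log q(\theta_i) = \log H(\theta_i)
  + \sum_{m=1}^{M_c} \sum_{n=1}^{N} \sum_{l=1}^{L}
    \langle s_{l,i}^{(c,m,n)} \rangle \,
    \log f\!\left(\mathbf{y}^{(c,m,n)}(l) \mid \theta_i\right)
  + \text{const}
```

Because the sum runs over all frames m of the target, every frame contributes evidence to the same state parameters, which is how the state information is shared in the multi-task setting.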
2. Update q(β_{ i })
In the MTL model, we assume $q\left(\beta_{i}\right)=\mathrm{Ga}\left(\tilde{a}_{\alpha}^{(i)},\tilde{b}_{\alpha}^{(i)}\right)$. The update equations for q(β_{ i }), with m = 1, ..., M_{ c }, are expressed as follows:
where ψ(·) is the digamma function.
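For reference, the digamma function enters through the moments of a Gamma-distributed variable. Assuming the shape-rate parameterization of Ga(·,·), which is an assumption about the convention used here, the two expectations typically needed by the other updates are:

```latex
\langle \beta_i \rangle = \frac{\tilde{a}_{\alpha}^{(i)}}{\tilde{b}_{\alpha}^{(i)}},
\qquad
\langle \log \beta_i \rangle
  = \psi\!\left(\tilde{a}_{\alpha}^{(i)}\right) - \log \tilde{b}_{\alpha}^{(i)}
```

If a shape-scale convention is used instead, the rate $\tilde{b}_{\alpha}^{(i)}$ is replaced by its reciprocal in both expressions.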
3. Update $q\left({\mathbf{w}}_{0}^{\left(c,m\right)}\right)$ and $q\left({\mathbf{w}}_{i}^{\left(c,m\right)}\right)$
For the given priors $p\left(\mathbf{w}_{0}^{(c,m)}\right)=\mathrm{GDD}\left(\mathbf{1}_{(I-1)\times 1},\left[\beta_{0}\right]_{(I-1)\times 1}\right)$ and $p\left(\mathbf{w}_{i}^{(c,m)}\right)=\mathrm{GDD}\left(\mathbf{1}_{(I-1)\times 1},\left[\beta_{i}\right]_{(I-1)\times 1}\right)$, with m = 1, ..., M_{ c } and i = 1, ..., I, assume $q\left(\mathbf{w}_{0}^{(c,m)}\right)=\mathrm{GDD}\left(\tilde{\boldsymbol{\beta}}_{0}^{(c,m)}{}_{1},\tilde{\boldsymbol{\beta}}_{0}^{(c,m)}{}_{2}\right)$ and $q\left(\mathbf{w}_{i}^{(c,m)}\right)=\mathrm{GDD}\left(\tilde{\boldsymbol{\beta}}_{i}^{(c,m)}{}_{1},\tilde{\boldsymbol{\beta}}_{i}^{(c,m)}{}_{2}\right)$. The update equations for $q\left(\mathbf{w}_{0}^{(c,m)}\right)$ and $q\left(\mathbf{w}_{i}^{(c,m)}\right)$ are given as follows:
4. Update log q(s)
Given the approximate distributions of the other variables, the update equation for q(s^{(c, m, n)}) is given as follows:
where ⟨·⟩ denotes the expectation of the associated function of the variables. One may derive that
References
1. Zwart J, Heiden R, Gelsema S, Groen F: Fast translation invariant classification of HRR range profiles in a zero phase representation. IEE Proc Radar Sonar Navigat 2003, 150(6):411–418. 10.1049/ip-rsn:20030428
2. van der Heiden R, Groen FCA: The Box-Cox metric for nearest neighbour classification improvement. Pattern Recognit 1997, 30(2):273–279. 10.1016/S0031-3203(96)00077-5
3. Xing MD, Bao Z, Pei B: The properties of high-resolution range profiles. Opt Eng 2002, 41(2):493–504. 10.1117/1.1431251
4. Carrara WG, Goodman RS, Majewski RM: Spotlight Synthetic Aperture Radar: Signal Processing Algorithms. Artech House, Norwood, MA; 1995.
5. Chai J, Liu HW, Bao Z: Combinatorial discriminant analysis: supervised feature extraction that integrates global and local criteria. Electron Lett 2009, 45(18):934–935. 10.1049/el.2009.1423
6. Du L, Liu HW, Bao Z: Radar HRRP target recognition based on higher-order spectra. IEEE Trans Signal Process 2005, 53(7):2359–2368.
7. Chen B, Liu HW, Yuan L, Bao Z: Adaptively segmenting angular sectors for radar HRRP ATR. EURASIP J Adv Signal Process 2008, 2008: 6. Article ID 641709
8. Mitchell RA, Westerkamp JJ: Robust statistical feature based aircraft identification. IEEE Trans Aerosp Electron Syst 1999, 35(3):1077–1094. 10.1109/7.784076
9. Copsey K, Webb AR: Bayesian Gamma mixture model approach to radar target recognition. IEEE Trans Aerosp Electron Syst 2003, 39(4):1201–1217. 10.1109/TAES.2003.1261122
10. Webb AR: Gamma mixture models for target recognition. Pattern Recognit 2000, 33: 2045–2054. 10.1016/S0031-3203(99)00195-8
11. Du L, Liu HW, Bao Z, Zhang JY: A two-distribution compounded statistical model for radar HRRP target recognition. IEEE Trans Signal Process 2006, 54(6):2226–2238.
12. Du L, Liu HW, Bao Z: Radar HRRP target recognition based on hypersphere model. Signal Process 2008, 88(5):1176–1190. 10.1016/j.sigpro.2007.11.003
13. Du L, Liu HW, Bao Z: Radar HRRP statistical recognition: parametric model and model selection. IEEE Trans Signal Process 2008, 56(5):1931–1944.
14. Pei B, Bao Z: Multi-aspect radar target recognition method based on scattering centers and HMMs classifiers. IEEE Trans Aerosp Electron Syst 2005, 41(3):1067–1074. 10.1109/TAES.2005.1541451
15. Liao XJ, Runkle P, Carin L: Identification of ground targets from sequential high-range-resolution radar signatures. IEEE Trans Aerosp Electron Syst 2002, 38(4):1230–1242. 10.1109/TAES.2002.1145746
16. Zhu F, Zhang XD, Hu YF, Xie D: Nonstationary hidden Markov models for multi-aspect discriminative feature extraction from radar targets. IEEE Trans Signal Process 2007, 55(5):2203–2213.
17. Winn J, Bishop CM: Variational message passing. J Mach Learn Res 2005, 6: 661–694.
18. Ni K, Qi Y, Carin L: Multi-aspect target detection via the infinite hidden Markov model. J Acoust Soc Am 2007, 121(5):2731–2742. 10.1121/1.2714912
19. Paisley J, Carin L: Hidden Markov models with stick breaking priors. IEEE Trans Signal Process 2009, 57: 3905–3917.
20. Caruana R: Multitask learning. Mach Learn 1997, 28: 41–75. 10.1023/A:1007379606734
21. Gürbüz SZ, Melvin WL, Williams DB: Detection and identification of human targets in radar data. Proc SPIE 2007, 6567: 65670I.
22. Kingsbury BED, Morgan N, Greenberg S: Robust speech recognition using the modulation spectrogram. Speech Commun 1998, 25(1–3):117–132. 10.1016/S0167-6393(98)00032-6
23. Walker JL: Range-Doppler imaging of rotating objects. IEEE Trans Aerosp Electron Syst 1980, 16(1):23–52.
24. Rabiner LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 1989, 77: 275–285.
25. Beal MJ, Ghahramani Z, Rasmussen CE: The infinite hidden Markov model. In Advances in Neural Information Processing Systems 14. Cambridge, MA, MIT Press; 2002:577–585.
26. Wong TT: Generalized Dirichlet distribution in Bayesian analysis. Appl Math Comput 1998, 97: 165–181. 10.1016/S0096-3003(97)10140-0
27. Teh Y, Jordan M, Beal M, Blei D: Hierarchical Dirichlet processes. J Am Stat Assoc 2005, 101: 1566–1582.
28. Gilks WR, Richardson S, Spiegelhalter DJ: Markov Chain Monte Carlo in Practice. Chapman & Hall, London; 1996.
29. Beal MJ: Variational algorithms for approximate Bayesian inference. In Ph.D. dissertation. Gatsby Computational Neuroscience Unit, University College London; 2003.
30. Jordan MI, Ghahramani Z, Jaakkola TS, Saul L: An introduction to variational methods for graphical models. Mach Learn 1999, 37(2):183–233. 10.1023/A:1007665907178
31. Ni K, Qi Y, Carin L: Multi-aspect target classification and detection via the infinite hidden Markov model. Acoust Speech Signal Process 2007, 2: II-433–II-436.
Acknowledgements
This study was partially supported by the National Science Foundation of China (No. 60901067), the Program for New Century Excellent Talents in University (NCET090630), the Program for Changjiang Scholars and Innovative Research Team in University (IRT0954), and the Foundation for Author of National Excellent Doctoral Dissertation of PR China (FANEDD201156).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Pan, M., Du, L., Wang, P. et al. Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles. EURASIP J. Adv. Signal Process. 2012, 86 (2012). https://doi.org/10.1186/1687-6180-2012-86
Keywords
 radar automatic target recognition (RATR)
 high-resolution range profile (HRRP)
 spectrogram feature
 hidden Markov model (HMM)
 multi-task learning (MTL)
 variational Bayes (VB)