A unified approach to sparse signal processing
- Farokh Marvasti^{1},
- Arash Amini^{1},
- Farzan Haddadi^{2},
- Mahdi Soltanolkotabi^{1},
- Babak Hossein Khalaj^{1},
- Akram Aldroubi^{3},
- Saeid Sanei^{4} and
- Jonathon Chambers^{5}
https://doi.org/10.1186/1687-6180-2012-44
© Marvasti et al.; licensee Springer. 2012
Received: 30 July 2011
Accepted: 23 November 2011
Published: 22 February 2012
Abstract
A unified view of the area of sparse signal processing is presented in tutorial form by bringing together various fields in which the property of sparsity has been successfully exploited. For each of these fields, various algorithms and techniques, which have been developed to leverage sparsity, are described succinctly. The common potential benefits of significant reduction in sampling rate and processing manipulations through sparse signal processing are revealed. The key application domains of sparse signal processing are sampling, coding, spectral estimation, array processing, component analysis, and multipath channel estimation. In terms of the sampling process and reconstruction algorithms, linkages are made with random sampling, compressed sensing, and rate of innovation. The redundancy introduced by channel coding in finite and real Galois fields is then related to over-sampling with similar reconstruction algorithms. The error locator polynomial (ELP) and iterative methods are shown to work quite effectively for both sampling and coding applications. The methods of Prony, Pisarenko, and MUltiple SIgnal Classification (MUSIC) are next shown to be targeted at analyzing signals with sparse frequency domain representations. Specifically, the relations of the approach of Prony to an annihilating filter in rate of innovation and ELP in coding are emphasized; the Pisarenko and MUSIC methods are further improvements of the Prony method under noisy environments. The iterative methods developed for sampling and coding applications are shown to be powerful tools in spectral estimation. Such narrowband spectral estimation is then related to multi-source location and direction of arrival estimation in array processing. Sparsity in unobservable source signals is also shown to facilitate source separation in sparse component analysis; the algorithms developed in this area such as linear programming and matching pursuit are also widely used in compressed sensing. 
Finally, the multipath channel estimation problem is shown to have a sparse formulation; algorithms similar to sampling and coding are used to estimate typical multicarrier communication channels.
1 Introduction
Various topics and applications with sparsity properties: the sparsity, which may be in the time/space or “frequency” domains, consists of unknown samples/coefficients that need to be determined
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
---|---|---|---|---|---|---|---|---|---|
1 | Category | Topics | Sparsity domain | Type of sparsity | Information domain | Type of sampling in info. domain | Min number of required samples | Conventional reconstruction methods | Applications |
2 | Sampling | Uniform sampling | Frequency | Lowpass | Time/space | Uniform | 2×BW−1 | Lowpass filtering/interpolation | A/D |
3 | | Nonuniform sampling | Frequency | Lowpass | Time/space | Missing samples/jitter/periodic/random | 2×BW−1 (in some cases even BW) | Iterative methods/filter banks/spline interp. | Seismic/MRI/CT/FM/PPM |
4 | | Sampling of multiband signals | Frequency | Union of disjoint intervals | Time/space | Uniform/jitter/periodic/random | $2\times \sum \text{BW}$ | Iterative methods/filter banks/interpolation | Data compression/radar |
5 | | Random sampling | Frequency | Random | Time/space | Random/uniform | $2\times \sum \#$ coeff. | Iterative methods: adapt. thresh. RDE/ELP | Missing samp. recovery/data comp. |
6 | | Compressed sensing | An arbitrary orthonormal transform | Random | Random mapping of time/space | Random mixtures of samples | $c\cdot k\cdot \log\left(\frac{n}{k}\right)$ | Basis pursuit/matching pursuit | Data compression |
7 | | Finite rate of innovation | Time and polynomial coeff. | Random | Filtered time domain | Uniform | # Coeff. + 1 + 2·(# discont. epochs) | Annihilating filter (ELP) | ECG/OCT/UWB |
8 | Channel coding | Galois field codes | Time | Random | Syndrome | Uniform or random | 2×# errors | Berlekamp-Massey/Viterbi/belief prop. | Digital communication |
9 | | Real field codes | Time | Random | Transform domain | Uniform or random | 2×# impulsive noise | Adaptive thresholding RDE/ELP | Fault tolerant system |
10 | Spectral estimation | Spectral estimation | Frequency | Random | Time/autocorrelation | Uniform | 2×# tones −1 | MUSIC/Pisarenko/Prony/MDL | Military/radars |
11 | Array processing | MSL/DOA estimation | Space | Random | Space/autocorrelation | Uniform | 2×# sources | MDL+MUSIC/ESPRIT | Radars/sonar/ultrasound |
12 | | Sparse array beam-forming | Space | Random/missing elements | Space | Peaks of sidelobes/[non]uniform | 2×# desired array elements | Optimization: LP/SA/GA | Radars/sonar/ultrasound/MSL |
13 | | Sensor networks | Space | Random | Space | Uniform | 2×BW of random field | Similar to row 5 | Seismic/meteorology/environmental |
14 | SCA | BSS | Active source/time | Random | Time | Uniform | 2×# active sources | ℓ_{1}/ℓ_{2}/SL0 | Biomedical |
15 | | SDR | Dictionary | Uniform/random | Linear mixture of time samples | Random | 2×# sparse words | ℓ_{1}/ℓ_{2}/SL0 | Data compression |
16 | Channel estimation | Multipath channels | Time | Random | Frequency or time | Uniform/nonuniform | 2×# sparse channel components | ℓ_{1}/MIMAT | Channel equalization/OFDM |
List of acronyms
ADSL | Asymmetric Digital Subscriber Line | AIC | Akaike Information Criterion |
ARMA | Auto-Regressive Moving Average | AR | Auto-Regressive |
BW | BandWidth | BSS | Blind Source Separation |
CFAR | Constant False Alarm Rate | CAD | Computer Aided Design |
CS | Compressed Sensing | CG | Conjugate Gradient |
DAB | Digital Audio Broadcasting | CT | Computer Tomography |
DCT | Discrete Cosine Transform | DC | Direct Current: Zero-Frequency Coefficient |
DFT | Discrete Fourier Transform | DHT | Discrete Hartley Transform |
DOA | Direction Of Arrival | DST | Discrete Sine Transform |
DT | Discrete Transform | DVB | Digital Video Broadcasting |
DWT | Discrete Wavelet Transform | EEG | ElectroEncephaloGraphy |
ELP | Error Locator Polynomial | ESPRIT | Estimation of Signal Parameters via Rotational Invariance Techniques |
FDTD | Finite-Difference Time-Domain | | |
FETD | Finite-Element Time-Domain | FOCUSS | FOCal Under-determined System Solver |
FPE | Final Prediction Error | GPSR | Gradient Projection Sparse Reconstruction |
GA | Genetic Algorithm | ICA | Independent Component Analysis |
HNQ | Hannan and Quinn method | IDT | Inverse Discrete Transform |
IDE | Iterative Detection and Estimation | ISTA | Iterative Shrinkage-Threshold Algorithm |
IMAT | Iterative Methods with Adaptive Thresholding | KLT | Karhunen Loeve Transform |
ℓ _{1} | Absolute Summable Discrete Signals | ℓ _{2} | Finite Energy Discrete Signals |
LDPC | Low Density Parity Check | LP | Linear Programming |
MA | Moving Average | MAP | Maximum A Posteriori probability |
MDL | Minimum Description Length | ML | Maximum Likelihood |
MIMAT | Modified IMAT | MSL | Multi-Source Location |
MMSE | Minimum Mean Squared Error | NP | Non-Polynomial time |
MUSIC | MUltiple SIgnal Classification | OFDM | Orthogonal Frequency Division Multiplex |
OCT | Optical Coherence Tomography | OMP | Orthogonal Matching Pursuit |
OFDMA | Orthogonal Frequency Division Multiple Access | PCA | Principal Component Analysis |
OSR | Over Sampling Ratio | PHD | Pisarenko Harmonic Decomposition |
PDF | Probability Density Function | PPM | Pulse-Position Modulation |
POCS | Projection Onto Convex Sets | RIP | Restricted Isometry Property |
RDE | Recursive Detection and Estimation | RV | Residual Variance |
RS | Reed-Solomon | SCA | Sparse Component Analysis |
SA | Simulated Annealing | SDFT | Sorted DFT |
SDCT | Sorted DCT | SER | Symbol Error Rate |
SDR | Sparse Dictionary Representation | SL0 | Smoothed ℓ_{0}-norm |
SI | Shift Invariant | ULA | Uniform Linear Array |
SNR | Signal-to-Noise Ratio | WIMAX | Worldwide Inter-operability for Microwave Access |
UWB | Ultra Wide Band | WLAN | Wireless Local Area Network |
WMAN | Wireless Metropolitan Area Network |
Common notations used throughout the article
n | Length of original vector |
k | Order of sparsity |
m | Length of observed vector |
x | Original vector |
s | Corresponding sparse vector |
y | Observed vector |
ν | Noise vector |
A | Transformation matrix relating s to y |
$\parallel {\mathbf{u}}_{n\times 1}{\parallel}_{{\ell}_{p}}$ | ${\left(\sum _{i=1}^{n}|{u}_{i}{|}^{p}\right)}^{\left(\frac{1}{p}\right)}$ |
Rows 2–4 of Table 1 are related to the sampling (uniform or random) of signals that are bandlimited in the Fourier domain. Band-limitedness is a special case of sparsity where the nonzero coefficients in the frequency domain are consecutive. A better assumption in the frequency domain is to have random sparsity [25–27] as shown in row 5 and column 3. A generalization of the sparsity in the frequency domain is sparsity in any transform domain such as Discrete Cosine and Wavelet Transforms (DCT and DWT); this concept is further generalized in CS (row 6) where sampling is taken by a linear combination of time domain samples [2, 28–30]. Sampling of signals with finite rate of innovation (row 7) is related to piecewise smooth (polynomial based) signals. The positions of discontinuous points are determined by annihilating filters that are equivalent to error locator polynomials in error correction codes and to Prony’s method [10], as discussed in Sections 4 and 5, respectively.
Random errors in a Galois field (row 8) and the additive impulsive noise in real-field error correction codes (row 9) are sparse disturbances that need to be detected and removed. For erasure channels, the impulsive noise can be regarded as the negative of the missing sample value [31]; thus the missing sampling problem, which can also be regarded as a special case of nonuniform sampling, is also a special case of the error correction problem. A subclass of impulsive noise for 2-D signals is salt and pepper noise [32]. The information domain, where the sampling process occurs, is called the syndrome, which is usually in a transform domain. Spectral estimation (row 10) is the dual of error correction codes, i.e., the sparsity is in the frequency domain. MSL (row 11) and multi-target detection in radars are similar to spectral estimation since targets act as spatial sparse mono-tones; each target is mapped to a specific spatial frequency according to its line of sight direction relative to the receiver. The techniques developed for this branch of science are unique, with examples such as MUSIC [7], Prony [8], and Pisarenko [9]. We shall see that the techniques used in real-field error correction codes, such as iterative methods (IMAT), can also be used in this area.
The array processing category (rows 11–13) consists of three separate topics. The first one covers MSL in radars, sonars, and DOA. The techniques developed for this field are similar to the spectral estimation methods with emphasis on the minimum description length (MDL) [33]. The second topic in the array processing category is related to the design of sparse arrays where some of the array elements are missing; the remaining nodes form a nonuniform sparse grid. In this case, one of the optimization problems is to find the sparsest array (number, locations, and weights of elements) for a given beampattern. This problem has some resemblance to the missing sampling problem but will not be discussed in this article. The third topic is on sensor networks (row 13). Distributed sampling and recovery of a physical field using an array of sparse sensors is a problem of increasing interest in environmental and seismic monitoring applications of sensor networks [34]. Sensor fields may be bandlimited or non-bandlimited. Since the power consumption is the most restricting issue in sensors, it is vital to use the lowest possible number of sensors (sparse sensor networks) with the minimum processing computation; this topic also will not be discussed in this article.
In SCA, the number of observations is much less than the number of sources (signals). However, if the sources are sparse in the time domain, then the active sources and their amplitudes can be determined; this is equivalent to error correction codes. Sparse dictionary representation (SDR) is another new area where signals are represented by the sparsest number of words (signal bases) in a dictionary of finite number of words; this sparsity may result in a tremendous amount of data compression. When the dictionary is over complete, there are many ways to represent the signal; however, we are interested in the sparsest representation. Normally, for extraction of statistically independent sources, independent component analysis (ICA) is used for a complete set of linear mixtures. In the case of a non-complete (underdetermined) set of linear mixtures, ICA can work if the sources are also sparse; for this special case, ICA analysis is synonymous with SCA.
Finally, channel estimation is shown in row 16. In mobile communication systems, multipath reflections create a channel that can be modeled by a sparse FIR filter. For proper decoding of the incoming data, the channel characteristics should be estimated before they can be equalized. For this purpose, a training sequence is inserted within the main data, which enables the receiver to obtain the output of the channel by exploiting this training sequence. The channel estimation problem becomes a deconvolution problem under noisy environments. The sparsity criterion of the channel greatly improves the channel estimation; this is where the algorithms for extraction of a sparse signal could be employed [21, 22, 35].
When sparsity is random, further signal processing is needed. In this case, there are three items that need to be considered: (1) evaluating the number of sparse coefficients (or samples), (2) finding the positions of the sparse coefficients, and (3) determining the values of these coefficients. In some applications, only the first two items are needed, e.g., in spectral estimation. However, in almost all the other cases mentioned in Table 1, all three items should be determined. Various types of linear programming (LP) and some iterative algorithms, such as the iterative method with adaptive thresholding (IMAT), determine the number, positions, and values of sparse samples at the same time. On the other hand, the minimum description length (MDL) method, used in DOA/MSL and spectral estimation, determines the number of sparse source locations or frequencies. In the subsequent sections, we shall describe, in more detail, each algorithm for various areas and applications based on Table 1.
Finally, it should be mentioned that the signal model for each topic or application may be deterministic or stochastic. For example, in the sampling category for rows 2–4 and 7, the signal model is typically deterministic although stochastic models could also be envisioned [36]. On the other hand, for random sampling and CS (rows 5–6), the signal model is stochastic although deterministic models may also be envisioned [37]. In channel coding and estimation (rows 8–9 and 16), the signal model is normally deterministic. For Spectral and DOA estimation (rows 10–11), stochastic models are assumed, whereas for array beam-forming (row 12), deterministic models are used. In sensor networks (row 13), both deterministic and stochastic signal models are employed. Finally, in SCA (rows 14–15), statistical independence of sources may be necessary and thus stochastic models are applied.
2 Underdetermined system of linear equations
Throughout this section, (1) denotes the underdetermined linear system $\mathbf{x}_{m\times 1}=\mathbf{A}_{m\times n}\cdot \mathbf{s}_{n\times 1}$. Since m<n, the vector s_{n×1} cannot be uniquely recovered by observing the measurement vector x_{m×1}; however, among the infinite number of solutions to (1), the sparsest solution may be unique. For instance, if no 2k columns of A_{m×n} are linearly dependent, the null-space of A_{m×n} does not include any nonzero 2k-sparse vector (a vector with at most 2k non-zero elements) and therefore, the measurement vectors (x_{m×1}) of different k-sparse vectors are different. Thus, if s_{n×1} is sparse enough (k-sparse), the sparsest solution of (1) is unique and coincides with s_{n×1}; i.e., perfect recovery is possible. Unfortunately, there are two obstacles here: (1) the vector x_{m×1} often includes an additive noise term, and (2) finding the sparsest solution of a linear system is an NP-hard problem in general.
Since in the rest of the article, we are frequently dealing with the problem of reconstructing the sparsest solution of (1), we first review some of the important reconstruction methods in this section.
2.1 Greedy methods
Greedy algorithms
1. Let $\widehat{\mathbf{s}}=\mathbf{0}_{n\times 1}$, $\mathbf{r}^{(0)}=\mathbf{x}$, $\mathcal{S}^{(0)}=\varnothing$ and i=1. |
2. Evaluate $c_{j}=\langle \mathbf{r}^{(i-1)},\mathbf{a}_{j}\rangle$ for $j=1,\dots ,n$, where the a_{j}’s are the columns of the mixing matrix A (atoms), and sort the c_{j}’s as $\left|c_{j_{1}}\right|\ge \cdots \ge \left|c_{j_{n}}\right|$. |
3. ● MP: Set $\mathcal{S}^{(i)}=\mathcal{S}^{(i-1)}\cup \{j_{1}\}$. |
● OMP: Set $\mathcal{S}^{(i)}=\mathcal{S}^{(i-1)}\cup \{j_{1}\}$ and $\mathbf{A}^{(i)}_{m\times \left|\mathcal{S}^{(i)}\right|}=\left[\mathbf{a}_{j}\right]_{j\in \mathcal{S}^{(i)}}$. |
● CoSaMP: Set $\mathcal{S}^{(i)}=\mathcal{S}^{(i-1)}\cup \{j_{1},\dots ,j_{2k}\}$ and $\mathbf{A}^{(i)}_{m\times \left|\mathcal{S}^{(i)}\right|}=\left[\mathbf{a}_{j}\right]_{j\in \mathcal{S}^{(i)}}$. |
4. ● MP: −−− |
● OMP & CoSaMP: Find $\tilde{\mathbf{s}}$ such that $\mathbf{A}^{(i)}\cdot \tilde{\mathbf{s}}=\mathbf{x}$. |
5. ● MP & OMP: −−− |
● CoSaMP: Sort the values of $\tilde{\mathbf{s}}$ as $\left|\tilde{s}_{t_{1}}\right|\ge \left|\tilde{s}_{t_{2}}\right|\ge \dots$ and redefine $j_{1},\dots ,j_{k}$ as the indices of the columns in A that correspond to the columns $t_{1},\dots ,t_{k}$ in A^{(i)}. Also set $\mathcal{S}^{(i)}=\{j_{1},\dots ,j_{k}\}$. |
6. ● MP: Set $\widehat{s}_{j_{1}}=c_{j_{1}}$. |
● OMP & CoSaMP: Set $\widehat{s}_{j_{l}}=\tilde{s}_{l}$ for $l=1,\dots ,k$ and $\widehat{s}_{l}=0$ where $l\notin \mathcal{S}^{(i)}$. |
7. Set $\mathbf{r}^{(i)}=\mathbf{x}-\mathbf{A}\cdot \widehat{\mathbf{s}}$. |
8. Stop if $\parallel \mathbf{r}^{(i)}\parallel_{\ell_{2}}$ is smaller than a desired threshold or when a maximum number of iterations is reached; otherwise, increase i and go to step 2. |
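The OMP branch of the steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `omp`, the tolerance `tol`, the least-squares reading of step 4, and the toy dictionary (an identity block next to normalized Hadamard atoms) are all assumptions made for the example.

```python
import numpy as np

def omp(A, x, k, tol=1e-10):
    """Orthogonal matching pursuit: add one atom per iteration, then refit."""
    n = A.shape[1]
    s_hat = np.zeros(n)
    support = []                      # S^(i), the selected column indices
    r = x.astype(float)               # r^(0) = x
    for _ in range(k):
        c = A.T @ r                   # step 2: c_j = <r^(i-1), a_j>
        j1 = int(np.argmax(np.abs(c)))
        if j1 not in support:         # step 3: S^(i) = S^(i-1) U {j1}
            support.append(j1)
        # step 4: least-squares fit of x on the selected atoms
        s_tilde, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        s_hat = np.zeros(n)           # step 6: zero outside the support
        s_hat[support] = s_tilde
        r = x - A @ s_hat             # step 7: update the residual
        if np.linalg.norm(r) < tol:   # step 8: stopping rule
            break
    return s_hat, sorted(support)

# Toy dictionary (hypothetical): 16 identity atoms plus 16 unit-norm Hadamard atoms.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
A = np.hstack([np.eye(16), H / 4.0])
s_true = np.zeros(32)
s_true[[3, 20]] = [2.0, -1.5]
s_hat, support = omp(A, A @ s_true, k=2)
```

The refit in step 4 is what distinguishes OMP from MP: after each selection the residual is orthogonal to every atom already chosen, so an atom is not picked twice in exact arithmetic.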
2.2 Basis pursuit
The mathematical representation of counting the number of sparse components is denoted by ℓ_{0}. However, ℓ_{0} is not a proper norm and is not computationally tractable. The closest convex norm to ℓ_{0} is ℓ_{1}. The ℓ_{1} optimization over an overcomplete dictionary is called Basis Pursuit. However, the ℓ_{1}-norm is non-differentiable, so gradient methods cannot be applied directly to find the optimal solution [42]. On the other hand, the ℓ_{1} solution is stable due to its convexity (the global optimum is the same as the local one) [20].
Relation between LP and basis pursuit (the notation for LP is from [43])
Basis pursuit | Linear programming |
---|---|
m | 2p |
s | x |
${(1,\dots ,1)}_{1\times m}$ | C |
±A | A |
x | b |
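The correspondence in the table can be made explicit with the usual variable-splitting argument, sketched below under the assumption that basis pursuit is posed as the equality-constrained ℓ_{1} problem. The auxiliary vectors u and v are introduced here for illustration only and do not appear in the original notation.

```latex
\begin{aligned}
&\min_{\mathbf{s}} \ \|\mathbf{s}\|_{\ell_1}
  \quad \text{s.t.} \quad \mathbf{A}\mathbf{s}=\mathbf{x} \\
&\text{write } \mathbf{s}=\mathbf{u}-\mathbf{v}, \qquad \mathbf{u}\ge \mathbf{0},\ \mathbf{v}\ge \mathbf{0} \\
&\min_{\mathbf{u},\mathbf{v}\ge \mathbf{0}} \ (1,\dots,1)
  \begin{bmatrix}\mathbf{u}\\ \mathbf{v}\end{bmatrix}
  \quad \text{s.t.} \quad
  \left[\mathbf{A},\ -\mathbf{A}\right]
  \begin{bmatrix}\mathbf{u}\\ \mathbf{v}\end{bmatrix}=\mathbf{x}
\end{aligned}
```

The last line is a linear program in standard form: the cost vector is all ones, the constraint matrix is built from ±A, and the right-hand side is x, which matches the rows of the table. The number of LP variables is twice the number of atoms.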
2.3 Gradient projection sparse reconstruction (GPSR)
Basic GPSR algorithm
1. Initialize β∈(0,1), $\mu \in (0,\frac{1}{2})$, α_{0} and z^{(0)}. Also set i=0. |
2. Choose α^{(i)} to be the largest number of the form α_{0}β^{j}, j≥0, such that |
$\begin{array}{c}F\left({\left({\mathbf{z}}^{\left(i\right)}-{\alpha}^{\left(i\right)}\nabla F\left({\mathbf{z}}^{\left(i\right)}\right)\right)}_{+}\right)\le F\left({\mathbf{z}}^{\left(i\right)}\right)-\\ \mu \nabla F{\left({\mathbf{z}}^{\left(i\right)}\right)}^{T}\left({\mathbf{z}}^{\left(i\right)}-{\left({\mathbf{z}}^{\left(i\right)}-{\alpha}^{\left(i\right)}\nabla F\left({\mathbf{z}}^{\left(i\right)}\right)\right)}_{+}\right)\end{array}$ |
3. Set z^{(i + 1)}=(z^{(i)}−α^{(i)}∇F(z^{(i)}))_{+}. |
4. Check the termination criterion. If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, increase i and return to the 2nd step. |
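The four steps above can be sketched as a projected-gradient loop with backtracking. The sketch assumes the standard bound-constrained formulation from the GPSR literature, F(z) = τ·1^{T}z + ½‖x − [A, −A]z‖², with z = [u; v] ≥ 0 and s = u − v; the names `gpsr_basic` and `tau`, the default parameter values, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def gpsr_basic(A, x, tau, alpha0=1.0, beta=0.5, mu=0.1, n_iter=200):
    """Projected gradient with backtracking on F(z), z = [u; v] >= 0 (assumed form of F)."""
    n = A.shape[1]
    B = np.hstack([A, -A])                       # s = u - v, so A s = B z

    def F(z):
        return tau * z.sum() + 0.5 * np.sum((x - B @ z) ** 2)

    def grad(z):
        return tau - B.T @ (x - B @ z)

    z = np.zeros(2 * n)                          # step 1: z^(0)
    for _ in range(n_iter):
        g = grad(z)
        alpha = alpha0
        while True:                              # step 2: largest alpha0 * beta^j that passes
            z_new = np.maximum(z - alpha * g, 0.0)   # (.)_+ projection
            if F(z_new) <= F(z) - mu * g @ (z - z_new) or alpha < 1e-12:
                break
            alpha *= beta
        z = z_new                                # step 3
    return z[:n] - z[n:]

# With A = I the minimizer is soft thresholding of x at level tau.
s_demo = gpsr_basic(np.eye(4), np.array([3.0, 0.2, -2.0, 0.0]), tau=1.0)
```

The identity-matrix call is only a sanity check: in that case the problem decouples per coordinate and the minimizer shrinks each entry of x toward zero by τ.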
2.4 Iterative shrinkage-threshold algorithm (ISTA)
ISTA algorithm
1. Choose the scalar β larger than all the singular values of A and set i=0. Also initialize s^{(0)}, e.g., s^{(0)}=A^{+}x. |
2. Set ${\mathbf{z}}^{\left(i\right)}={\mathbf{s}}^{\left(i\right)}+\frac{1}{\beta}{\mathbf{A}}^{H}(\mathbf{x}-\mathbf{A}{\mathbf{s}}^{\left(i\right)})$. |
3. Apply the shrinkage-threshold operator defined in (11): |
${s}_{j}^{(i+1)}={\mathcal{S}}_{[\beta ,\tau ]}\left({z}_{j}^{\left(i\right)}\right),\phantom{\rule{2.36043pt}{0ex}}\phantom{\rule{2.36043pt}{0ex}}1\le j\le n$ |
4. Check the termination criterion. If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, increase i and return to the 2nd step. |
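A compact NumPy sketch of these steps follows. Two points are assumptions of the sketch rather than statements from the text: the shrinkage-threshold operator is taken to be the usual soft threshold at level τ/(2β), which corresponds to minimizing ‖x − As‖² + τ‖s‖_{ℓ_1}, and β is set slightly above the largest squared singular value of A so that each step cannot increase that cost. The names `soft`, `ista`, and `tau` are illustrative.

```python
import numpy as np

def soft(z, t):
    """Soft threshold: shrink every entry of z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, x, tau, n_iter=500):
    """Iterative shrinkage for ||x - A s||^2 + tau * ||s||_1 (assumed cost)."""
    beta = 1.01 * np.linalg.norm(A, 2) ** 2      # step 1 (assumed: above sigma_max^2)
    s = np.linalg.pinv(A) @ x                    # s^(0) = A^+ x
    for _ in range(n_iter):
        z = s + A.T @ (x - A @ s) / beta         # step 2
        s = soft(z, tau / (2.0 * beta))          # step 3
    return s

# With A = I the fixed point is soft(x, tau / 2).
x_demo = np.array([3.0, 0.2, -2.0, 0.0])
s_demo = ista(np.eye(4), x_demo, tau=1.0)
```

For real-valued A the Hermitian transpose reduces to `A.T`; a complex-valued dictionary would need `A.conj().T` and a complex-aware threshold.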
2.5 FOCal underdetermined system solver (FOCUSS)
FOCUSS (Basic)
● Step 1: $\mathbf{W}_{p_{i}}=\text{diag}\left(\mathbf{s}_{i-1}\right)$ |
● Step 2: $\mathbf{q}_{i}=\left(\mathbf{A}\mathbf{W}_{p_{i}}\right)^{+}\mathbf{x}$ |
● Step 3: $\mathbf{s}_{i}=\mathbf{W}_{p_{i}}\cdot \mathbf{q}_{i}$ |
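The three steps translate almost line by line into NumPy. In the sketch below, the initialization with the minimum ℓ_{2}-norm solution, the fixed iteration count, and the 2×3 toy system are assumptions chosen for illustration.

```python
import numpy as np

def focuss(A, x, n_iter=30):
    """Basic FOCUSS: reweighted minimum-norm iterations."""
    s = np.linalg.pinv(A) @ x                 # assumed start: minimum l2-norm solution
    for _ in range(n_iter):
        W = np.diag(s)                        # step 1
        q = np.linalg.pinv(A @ W) @ x         # step 2
        s = W @ q                             # step 3
    return s

# Toy system: x equals the third column, so the sparsest solution is (0, 0, 1).
A_demo = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
x_demo = np.array([1.0, 1.0])
s_demo = focuss(A_demo, x_demo)
```

Each pass solves a weighted minimum-norm problem in which entries that were small in the previous iterate are penalized heavily, so they shrink further while the constraint A s = x is maintained. The limit depends on the starting point and is not guaranteed to be the sparsest solution in general.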
2.6 Iterative detection and estimation (IDE)
IDE steps
● Detection Step: Find indices of inactive sources: |
$I^{l}=\left\{1\le i\le m:\ \left|\mathbf{a}_{i}^{T}\cdot \mathbf{x}-\sum_{j\ne i}^{m}\widehat{s}_{j}^{\,l}\,\mathbf{a}_{i}^{T}\cdot \mathbf{a}_{j}\right|<\epsilon^{l}\right\}$ |
● Estimation Step: Find the following projection as the new estimate: |
$\mathbf{s}^{l+1}=\text{argmin}_{\mathbf{s}}\sum_{i\in I^{l}}s_{i}^{2}\quad \text{s.t.}\quad \mathbf{x}\left(t\right)=\mathbf{A}\cdot \mathbf{s}\left(t\right)$ |
The solution is derived from the Karush-Kuhn-Tucker system of equations. At the (l + 1)^{th} iteration |
$\begin{array}{cc}\mathbf{s}_{i}& =\mathbf{A}_{i}^{T}\cdot \mathbf{P}\left(\mathbf{x}-\mathbf{A}_{a}\cdot \mathbf{s}_{a}\right)\hfill \\ \mathbf{s}_{a}& =\left(\mathbf{A}_{a}^{T}\mathbf{P}\mathbf{A}_{a}\right)^{-1}\mathbf{A}_{a}^{T}\mathbf{P}\cdot \mathbf{x}\hfill \end{array}$ |
where the matrices and vectors are partitioned into inactive/active parts as A_{i}, A_{a}, s_{i}, s_{a} and $\mathbf{P}=\left(\mathbf{A}_{i}\mathbf{A}_{i}^{T}\right)^{-1}$ |
● Stop after a fixed number of iterations. |
2.7 Smoothed ℓ_{0}-norm (SL0) method
SL0 steps
● Initialization: |
1. Set ${\widehat{\mathbf{s}}}_{0}$ equal to the minimum ℓ_{2}-norm solution of A s=x, obtained by pseudo-inverse of A. |
2. Choose a suitable decreasing sequence for σ, $[{\sigma}_{1},\dots ,{\sigma}_{K}]$. |
● For $i=1,\dots ,K$: |
1. Set σ=σ_{ i }, |
2. Maximize the function F_{σ} on the feasible set $\mathcal{S}=\{\mathbf{s}\mid \mathbf{A}\mathbf{s}=\mathbf{x}\}$ using L iterations of the steepest ascent algorithm (followed by projection onto the feasible set): |
−Initialization: $\mathbf{s}={\widehat{\mathbf{s}}}_{i-1}$. |
−for $j=1,\dots ,L$ (loop L times): |
(a) Let: $\Delta \mathbf{s}={[{s}_{1}{e}^{-\frac{{s}_{1}^{2}}{2{\sigma}^{2}}},\dots ,{s}_{n}{e}^{-\frac{{s}_{n}^{2}}{2{\sigma}^{2}}}]}^{T}$. |
(b) Set s ← s − μΔ s (where μ is a small positive constant). |
(c) Project s back onto the feasible set $\mathcal{S}$: $\begin{array}{c}\mathbf{s}\leftarrow \mathbf{s}-{\mathbf{A}}^{T}{\left(\mathbf{A}{\mathbf{A}}^{T}\right)}^{-1}\left(\mathbf{A}\mathbf{s}-\mathbf{x}\right)\end{array}$ |
3. Set ${\widehat{\mathbf{s}}}_{i}=\mathbf{s}$. |
● Final answer is $\widehat{\mathbf{s}}={\widehat{\mathbf{s}}}_{K}$ |
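The SL0 steps above can be sketched as follows. The schedule for σ (starting at twice the largest initial magnitude and halving down to `sigma_min`), the step size `mu=1.0`, and `L=3` inner iterations are illustrative assumptions, as are the function name `sl0` and the toy system.

```python
import numpy as np

def sl0(A, x, sigma_min=1e-4, sigma_decrease=0.5, mu=1.0, L=3):
    """Smoothed-l0: graduated sigma with ascent steps and projection onto A s = x."""
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ x                               # minimum l2-norm initial solution
    sigma = 2.0 * np.max(np.abs(s))              # assumed first value of the sigma sequence
    while sigma > sigma_min:
        for _ in range(L):
            delta = s * np.exp(-s ** 2 / (2.0 * sigma ** 2))   # step (a)
            s = s - mu * delta                                 # step (b)
            s = s - A_pinv @ (A @ s - x)                       # step (c): back to A s = x
        sigma *= sigma_decrease
    return s

# Toy system: x equals the third column, so the sparsest solution is (0, 0, 1).
A_demo = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
x_demo = np.array([1.0, 1.0])
s_demo = sl0(A_demo, x_demo)
```

For a full-row-rank A, the expression A^{T}(AA^{T})^{-1} in step (c) equals the pseudo-inverse, which is why the sketch computes `A_pinv` once and reuses it. Every iterate therefore satisfies A s = x up to rounding.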
2.8 Comparison of different techniques
In Figure 5, the greedy algorithms, CoSaMP and OMP, demonstrate better performance than ISTA and GPSR, especially at lower input signal SNRs. IMAT shows better performance than all the other algorithms; however, at higher input signal SNRs its performance is close to that of OMP and CoSaMP. In Figure 6, OMP and CoSaMP perform better than the others, while ISTA, SL0, and GPSR have more or less the same performance. For sparse DFT signals, the complexity of the IMAT algorithm is lower than that of the others, while ISTA is the most complex algorithm. Similarly, in Figure 6, SL0 has the least complexity.
3 Sampling: uniform, nonuniform, missing, random, compressed sensing, rate of innovation
Analog signals can be represented by finite rate discrete samples (uniform, nonuniform, or random) if the signal has some sort of redundancies such as band-limitedness, finite polynomial representation (e.g., periodic signals that are represented by a finite number of trigonometric polynomials), and nonlinear functions of such redundant functions [48, 49]. The minimum sampling rate is the Nyquist rate for uniform sampling and its generalizations for nonuniform [1] and multiband signals [50]. When a signal is discrete, the equivalent discrete representation in the “frequency” domain (DFT, DCT, DWT, Discrete Hartley Transform (DHT), Discrete Sine Transform (DST)) may be sparse, which is the discrete version of bandlimited or multiband analog signals where the locations of the bands are unknown.
For discrete signals, if the nonzero coefficients (“frequency” sparsity) are consecutive, depending on the location of the zeros, they are called lowpass, bandpass, or multiband discrete signals; if the locations of the nonzero coefficients do not follow any of these patterns, the “frequency” sparsity is random. The number of discrete time samples needed to represent a frequency-sparse signal with known sparsity pattern follows the law of algebra, i.e., the number of time samples should be equal to the number of coefficients in the “frequency” domain; since the two domains are related by a full rank transform matrix, recovery from the time samples is equivalent to solving an invertible k×k system of linear equations where k is the number of sparse coefficients. For band-limited real signals, the Fourier transform (sparsity domain) consists of similar nonzero patterns in both negative and positive frequencies where only the positive part is counted as the bandwidth; thus, the law of algebra is equivalent to the Nyquist rate, i.e., twice the bandwidth (for discrete signals with DC components it is twice the bandwidth minus one). The dual of frequency-sparsity is time-sparsity, which can happen in a burst or a random fashion. The number of “frequency” coefficients needed follows the Nyquist criterion. This will be further discussed in Section 4 for sparse additive impulsive noise channels.
3.1 Sampling of sparse signals
If the sparsity locations of a signal are known in a transform domain, then the number of samples needed in the time (space) domain should be at least equal to the number of sparse coefficients, i.e., the so-called Nyquist rate. However, depending on the type of sparsity (lowpass, bandpass, or random) and the type of sampling (uniform, periodic nonuniform, or random), the reconstruction may be unstable and the corresponding reconstruction matrix may be ill-conditioned [51, 52]. Thus in many applications discussed in Table 1, the sampling rate in column 6 is higher than the minimum (Nyquist) rate.
When the location of sparsity is not known, by the law of algebra, the number of samples needed to specify the sparsity is at least twice the number of sparse coefficients. Again for stability reasons, the actual sampling rate is higher than this minimum figure [1, 50]. To guarantee stability, instead of direct sampling of the signal, a combination of the samples can be used. Donoho has recently shown that if we take linear combinations of the samples, the minimum stable sampling rate is of the order $O\left(k\log\left(\frac{n}{k}\right)\right)$, where n and k are the frame size and the sparsity order, respectively [29].
3.1.1 Reconstruction algorithms
There are many reconstruction algorithms that can be used depending on the sparsity pattern, uniform or random sampling, complexity issues, and sensitivity to quantization and additive noise [53, 54]. Among these methods are LP, Lagrange interpolation [55], time varying method [56], spline interpolation [57], matrix inversion [58], error locator polynomial (ELP) [59], iterative techniques [52, 60–65], and IMAT [25, 31, 66, 67]. In the following, we will concentrate only on the last three methods as well as the first (LP), which have been proven to be effective and practical.
Iterative methods when the location of sparsity is known
The iterative algorithm based on the block diagram of Figure 7
1. Take the transform (e.g. the Fourier transform) of the input to the i^{ th }iteration (x^{(i)}) and denote it as X^{(i)}; x^{(0)} is normally the initial received signal. |
2. Multiply X^{(i)} by a mask (for instance a band-limiting filter). |
3. Take the inverse transform of the result in step 2 to get r^{(i)}. |
4. Set the new result as: x^{(i + 1)}=x^{(0)} + x^{(i)}−r^{(i)}. |
5. Repeat for a given number of iterations. |
6. Stop when $\parallel {\mathbf{x}}^{(i+1)}-{\mathbf{x}}^{\left(i\right)}{\parallel}_{{\ell}_{2}}<\epsilon $. |
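A minimal sketch of this iteration for recovering missing samples of a lowpass signal is given below. It assumes that the distortion applied in every pass consists of zeroing the missing samples and then applying the band-limiting mask, and that x^{(0)} is the received signal after the same operation; this reading, together with the names `iterative_recovery`, `known`, and `band`, is an assumption of the example.

```python
import numpy as np

def iterative_recovery(samples, known, band, n_iter=60, eps=1e-12):
    """Iterate x <- x0 + x - G(x), where G zeroes missing samples and band-limits."""
    def G(v):
        V = np.fft.fft(np.where(known, v, 0.0))   # steps 1-2: transform, then mask
        return np.real(np.fft.ifft(V * band))     # step 3: inverse transform

    x0 = G(samples)                               # assumed x^(0): distorted received signal
    x = x0.copy()
    for _ in range(n_iter):                       # step 5: bounded number of passes
        x_next = x0 + x - G(x)                    # step 4
        done = np.linalg.norm(x_next - x) < eps   # step 6: stopping rule
        x = x_next
        if done:
            break
    return x

# Lowpass test signal (9 nonzero DFT coefficients) with every 8th sample missing.
n = 64
t = np.arange(n)
x_true = 1.0 + np.cos(2 * np.pi * 2 * t / n) + 0.5 * np.sin(2 * np.pi * 4 * t / n)
band = np.zeros(n, dtype=bool)
band[:5] = True                                   # frequencies 0..4
band[-4:] = True                                  # frequencies -4..-1
known = np.ones(n, dtype=bool)
known[::8] = False
x_rec = iterative_recovery(np.where(known, x_true, 0.0), known, band)
```

In this toy setting the error contracts by a factor of at most 0.25 per pass, so a few dozen passes reach machine precision. Less favorable sampling patterns converge more slowly and may need a relaxation factor.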
Iterative methods are quite robust against quantization and additive noise. In fact, we can prove that the iterative methods approach the pseudo-inverse (least squares) solution in a noisy environment, especially when the matrix is ill-conditioned [50].
Iterative method with adaptive threshold (IMAT) for unknown location of sparsity
Generic IMAT of Figure 9 for any sparsity in the DT, which is typically DFT
1. Use the all-zero block as the initial value of the sparse domain signal (0^{ th } iteration) |
2. Convert the current estimate of the signal in the sparse domain into the information domain (for instance the time domain into the Fourier domain) |
3. Where possible, replace the values with the known samples of the signal in the information domain. |
4. Convert the signal back to the sparse domain. |
5. Use adaptive hard thresholding to distinguish the original nonzero samples. |
6. If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, return to the 2nd step. |
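The generic steps can be sketched for a signal that is sparse in the DFT domain and observed through a subset of its time samples. The exponentially decaying threshold t0·exp(−αi), the choice of t0 as the largest DFT magnitude of the zero-filled observations, and the names `imat`, `t0`, and `alpha` are assumptions made for this sketch; other threshold schedules are possible.

```python
import numpy as np

def imat(samples, known, n_iter=100, alpha=0.1):
    """Alternate between enforcing known time samples and hard thresholding the DFT."""
    n = len(samples)
    S = np.zeros(n, dtype=complex)                        # step 1: all-zero sparse block
    t0 = np.max(np.abs(np.fft.fft(np.where(known, samples, 0.0))))
    for i in range(n_iter):
        x = np.fft.ifft(S)                                # step 2: to the information domain
        x = np.where(known, samples, x)                   # step 3: insert the known samples
        Z = np.fft.fft(x)                                 # step 4: back to the sparse domain
        thr = t0 * np.exp(-alpha * i)                     # assumed adaptive threshold
        S = np.where(np.abs(Z) >= thr, Z, 0.0)            # step 5: hard thresholding
    return S

# One real tone (two DFT coefficients) with every 4th time sample missing.
n = 64
t = np.arange(n)
x_true = np.cos(2 * np.pi * 5 * t / n)
known = np.ones(n, dtype=bool)
known[::4] = False
S_hat = imat(np.where(known, x_true, 0.0), known)
x_rec = np.fft.ifft(S_hat).real
```

Starting with a high threshold lets only the strongest coefficients through, which suppresses the aliases created by the missing samples. As the estimate improves, the aliases shrink faster than the threshold, so weaker coefficients can be admitted safely in later passes.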
Matrix solutions
When the sparse nonzero locations are known, matrix approaches can be utilized to determine the values of sparse coefficients [58]. Although these methods are rather straightforward, they may not be robust against quantization or additive noise when the matrices are ill conditioned.
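As a small illustration of the matrix approach, the sketch below recovers k DFT coefficients with known locations from exactly k time samples by solving a k×k linear system. The function name `solve_known_support` and the toy values are hypothetical; the matrix is the inverse-DFT matrix restricted to the sampled rows and the known support columns.

```python
import numpy as np

def solve_known_support(samples, t_idx, support, n):
    """Solve the k x k system linking k time samples to k known-location DFT coefficients."""
    # Entry (t, f) of the inverse DFT: exp(2j*pi*f*t/n) / n
    M = np.exp(2j * np.pi * np.outer(t_idx, support) / n) / n
    return np.linalg.solve(M, samples)

n = 16
support = np.array([0, 1, 2])                    # known nonzero locations (lowpass case)
X_true = np.array([1.0 + 0.0j, 2.0 - 1.0j, 0.5j])
X_full = np.zeros(n, dtype=complex)
X_full[support] = X_true
x = np.fft.ifft(X_full)                          # time-domain signal
t_idx = np.array([0, 5, 11])                     # any 3 distinct sample times
X_rec = solve_known_support(x[t_idx], t_idx, support, n)
```

For consecutive support locations the matrix is a scaled Vandermonde matrix with distinct nodes, hence invertible for any choice of distinct sample times. Its condition number can still be large when the samples cluster, which is the robustness issue noted above.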
There are other approaches such as Spline interpolation [57], nonlinear/time varying methods [58], Lagrange interpolation [55] and error locator polynomial (ELP) [74] that will not be discussed here. However, the ELP approach will be discussed in Section 4.1; variations of this method are called the annihilating filter in sampling with finite rate of innovation (Section 3.3) and Prony’s method in spectral and DOA estimation (Section 5.1). These methods work quite well in the absence of additive noise but they may not be robust in the presence of noise. In the case of additive noise, the extensions of the Prony method (ELP) such as Pisarenko harmonic decomposition (PHD), MUSIC and Estimation of signal parameters via rotational invariance techniques (ESPRIT) will be discussed in Sections 5.2, 5.3, and 6.
3.2 Compressed sensing (CS)
The relatively new topic of CS (also known as compressive sampling) for sparse signals was originally introduced in [29, 75] and further extended in [30, 76, 77]. The idea is to introduce sampling schemes with a low number of required samples which uniquely represent the original sparse signal; these methods have lower computational complexities than the traditional techniques that employ oversampling and then apply compression. In other words, compression is achieved exactly at the time of sampling. Unlike the classical sampling theorem [78] based on the Fourier transform, the signals are assumed to be sparse in an arbitrary transform domain. Furthermore, there is no restricting assumption on the locations of nonzero coefficients in the sparsity domain; i.e., the locations need not follow a specific pattern such as a lowpass or multiband structure. Clearly, this assumption includes a more general class of signals than the ones previously studied.
Since the concept of sparsity in a transform domain is more convenient to study for discrete signals, most of the research in this field is focused on discrete signals [79]; however, recent results [80] show that most of the work can be generalized to continuous signals in shift-invariant subspaces (a subclass of the signals which are represented by a Riesz basis).^{c} We first study discrete signals and then briefly discuss the extension to the continuous case.
3.2.1 CS mathematical modeling
where s is an n×1 vector which has at most k nonzero elements (a k-sparse vector). In practical cases, s has at most k significant elements and the insignificant elements are set to zero, which means s is an almost k-sparse vector. For example, x can be the pixels of an image and Ψ can be the corresponding IDCT matrix. In this case, most of the DCT coefficients are insignificant and if they are set to zero, the quality of the image will not degrade significantly. In fact, this is the main concept behind some of the lossy compression methods such as JPEG. Since the inverse transform on x yields s, the vector s can be used instead of x, and it can be succinctly represented by the locations and values of its nonzero elements. Although this method efficiently compresses x, it initially requires all the samples of x to produce s, which undermines the whole purpose of CS.
The question is how the matrix Φ and the size m should be chosen to ensure that these samples uniquely represent the original signal x. Obviously, the case of Φ = I_{n×n}, where I_{n×n} is the n×n identity matrix, yields a trivial solution (keeping all the samples of x) that does not employ the sparsity condition. We look for Φ matrices with as few rows as possible which can guarantee the invertibility, stability, and robustness of the sampling process for the class of sparse inputs.
To solve this problem, we introduce probabilistic measures; i.e., instead of exact recovery of signals, we focus on the probability that a random sparse signal (according to a given probability density function) fails to be reconstructed using its generalized samples. If the probability δ of failure can be made arbitrarily small, then the sampling scheme (the joint pair of Ψ,Φ) is successful in recovering x with probability 1−δ, i.e., with high probability.
The above derivation implies that the smaller the maximum coherence between the two matrices, the lower the number of required samples. Thus, to decrease the number of samples, we should look for matrices Φ with low coherence with Ψ. For this purpose, we use a random Φ. It is shown that the coherence of a random matrix with i.i.d. Gaussian entries with any unitary Ψ is considerably small [29], which makes it a proper candidate for the sampling matrix. Investigation of the probability distribution has shown that the Gaussian PDF is not the only solution (for example, the binary Bernoulli distribution and other types are considered in [83]) but may be the simplest to analyze.
Notice that the required number of samples given in (20) is for random sampling of an orthonormal basis, while (21) represents the required number of samples with an i.i.d. Gaussian sampling matrix. Typically, the number in (21) is less than that of (20).
3.2.2 Reconstruction from compressed measurements
In this section, we consider reconstruction algorithms and the issues of stability and robustness. We briefly discuss the following three methods: (a) geometric, (b) combinatorial, and (c) information theoretic. The first two methods are standard while the last one is more recent.
Geometric methods
The interesting part is that the number of required samples to replace ℓ_{0} with ℓ_{1}-minimization has the same order of magnitude as the one for the invertibility of the sampling scheme. Hence, s can be derived from (22) using ℓ_{1}-minimization. It is worthwhile to mention that replacement of ℓ_{1}-norm with ℓ_{2}-norm, which is faster to implement, does not necessarily produce reasonable solutions. However, there are greedy methods (Matching Pursuit as discussed in Section 7 on SCA [40, 88]) which iteratively approach the best solution and compete with the ℓ_{1}-norm optimization (equivalent to Basis Pursuit methods as discussed in Section 7 on SCA).
A sufficient condition for these methods to work is that the matrix Φ·Ψ must satisfy the so-called restricted isometric property (RIP) [75, 83, 90], which will be discussed in the following section.
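As an illustration of the greedy alternative mentioned above, the following is a minimal sketch of orthogonal matching pursuit (OMP), one common member of the matching pursuit family. It is not taken from the cited references: the function name `omp`, the problem sizes, and the assumption that the sparsity level k is known in advance are all illustrative choices, and Ψ = I is assumed so that the measurement matrix is simply a column-normalized Gaussian Φ.

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse s from y = A s (orthogonal matching pursuit sketch)."""
    n = A.shape[1]
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(A.T @ residual)         # correlate every column with the residual
        if support:
            corr[support] = 0                 # never pick the same column twice
        support.append(int(np.argmax(corr)))
        # least-squares fit on the columns selected so far
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s = np.zeros(n)
    s[support] = coef
    return s

rng = np.random.default_rng(1)
n, m, k = 256, 128, 3
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)                # unit-norm columns
s_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
s_true[idx] = rng.choice([-1.0, 1.0], size=k) * (1 + rng.random(k))
y = A @ s_true
s_hat = omp(A, y, k)
```

Each iteration costs one matrix-vector product and one small least-squares solve, which is why such greedy methods are often faster than solving the full ℓ_{1} program.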
Restricted isometric property
where 0 ≤ δ_{k} < 1 (isometry constant). The RIP is a sufficient condition that provides us with the maximum and minimum power of the samples with respect to the input power and ensures that none of the k-sparse inputs fall in the null space of the sampling matrix. The RIP essentially states that every k columns of the matrix Φ_{m×n} must be almost orthonormal (these submatrices preserve the norm within the constants 1±δ_{k}). The explicit construction of a matrix with such a property is difficult for any given n, k and m ≈ k log n; however, the problem has been studied in some cases [37, 92]. Moreover, given such a matrix Φ, the evaluation of s (or alternatively x) via the minimization problem involves numerical methods (e.g., linear programming, GPSR, SPGL1, FPC [44, 93]) for n variables and m constraints which can be computationally expensive.
However, probabilistic methods can be used to construct m×n matrices satisfying the RIP for a given n, k and m ≈ k log n. This can be achieved using Gaussian random matrices. If Φ is a sample of a Gaussian random matrix with the number of rows satisfying (20), Φ·Ψ is also a sample of a Gaussian random matrix with the same number of rows and thus it satisfies the RIP with high probability. Using matrices with the appropriate RIP in the ℓ_{1}-minimization guarantees exact recovery of k-sparse signals, and the recovery is stable and robust against additive noise.
This shows that small perturbations in the measurements cause small perturbations in the output of the ℓ_{1}-minimization method (robustness).
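The near-orthonormality of small column subsets of a Gaussian matrix can be probed numerically. The sketch below is only a sanity check, not a certificate: it samples random k-column submatrices and records how far their squared singular values deviate from 1, which yields a lower bound on δ_{k} rather than its exact value (computing δ_{k} exactly would require examining every subset). The function name, the 1/√m scaling of the entries, and the sizes are assumptions made for illustration.

```python
import numpy as np

def empirical_isometry_constant(Phi, k, trials=200, seed=0):
    """Monte Carlo lower bound on delta_k from randomly chosen k-column submatrices."""
    rng = np.random.default_rng(seed)
    n = Phi.shape[1]
    delta = 0.0
    for _ in range(trials):
        cols = rng.choice(n, size=k, replace=False)
        sv = np.linalg.svd(Phi[:, cols], compute_uv=False)   # singular values, descending
        # (1 - delta)||s||^2 <= ||Phi s||^2 <= (1 + delta)||s||^2 on this support
        delta = max(delta, abs(sv[0] ** 2 - 1), abs(sv[-1] ** 2 - 1))
    return delta

rng = np.random.default_rng(0)
m, n, k = 512, 1024, 4
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # i.i.d. Gaussian entries with variance 1/m
delta_hat = empirical_isometry_constant(Phi, k)
```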
Combinatorial
Another standard approach for reconstruction from compressed samples is combinatorial. As before, without loss of generality, we assume Ψ = I. The sampling matrix Φ is found using a bipartite graph which consists of binary entries, i.e., entries that are either 1 or 0. Binary search methods are then used to find an unknown k-sparse vector $\mathbf{s}\in {\mathbb{R}}^{n}$, see, e.g., [84, 94–100] and the references therein. Typically, the binary matrix Φ has m = O(k log n) rows, and there exist fast algorithms for finding the solution x from the m measurements (typically a linear combination). However, the construction of Φ is also difficult.
Information theoretic
A more recent approach is adaptive and information theoretic [101]. In this method, the signal $\mathbf{s}\in {\mathbb{R}}^{n}$ is assumed to be an instance of a vector random variable $\mathbf{s}={({s}_{1},\dots ,{s}_{n})}^{t}$, where (.)^{t} denotes the transpose operator, and the ith row of Φ is constructed using the value of the previous sample y_{i−1}. Tools from the theory of Huffman coding are used to develop a deterministic construction of a sequence of binary sampling vectors (i.e., their components consist of 0 or 1) in such a way as to minimize the average number of samples (rows of Φ) needed to determine a signal. In this method, the construction of the sampling vectors can always be obtained. Moreover, it is proved that the expected total cost (number of measurements and reconstruction combined) needed to sample and reconstruct a k-sparse vector in ${\mathbb{R}}^{n}$ is no more than k log n + 2k.
3.3 Sampling with finite rate of innovation
where Φ(Ω) denotes the Fourier transform of φ(t), and the superscript (r) represents the r^{th} derivative. It is also shown that such functions are of the form φ(t) = f(t)∗β_{2k}(t), where β_{2k}(t) is the B-spline of order 2k and f(t) is an arbitrary function with a nonzero DC component [102]. Therefore, the function β_{2k}(t) is itself among the possible options for the choice of φ(t).
In other words, we have filtered the discrete samples (y[j]) in order to obtain the values τ_{r}; (35) shows that these values are only a function of the innovation parameters (amplitudes c_{i} and time instants t_{i}). However, the values τ_{r} are nonlinearly related to the time instants and therefore the innovations cannot be extracted from τ_{r} using linear algebra.^{d} Nevertheless, these nonlinear equations form a well-known system which was studied by Prony in the field of spectral estimation (see Section 5.1), and its discrete version is also employed in both real and Galois field versions of Reed-Solomon codes (see Section 4.1). This method, which is called the annihilating filter, is as follows:
By solving the above linear system of equations, we obtain coefficients h_{ i } (for a discussion on invertibility of the left side matrix see [102, 104]) and consequently, by finding the roots of H(z), the time instants will be revealed. It should be mentioned that the choice of ${\tau}_{1},\dots ,{\tau}_{2k}$ in (37) can be replaced with any 2k consecutive terms of {τ_{ i }}. After determining {t_{ i }}, (35) becomes a linear system of equations with respect to the values {c_{ i }} which could be easily solved.
This reconstruction method can be used for other types of signals satisfying (30), such as the signals represented by piecewise polynomials [102] (for large enough n, the n^{th} derivative of such a signal becomes a sum of delta functions). An important issue in nonlinear reconstruction is the noise analysis; for denoising and performance under additive noise, the reader is encouraged to see [27].
A nice application of sampling theory and the concept of sparsity is error correction codes for real and complex numbers [105]. In the next section, we shall see that similar methods can be employed for decoding block and convolutional codes.
4 Error correction codes: Galois and real/complex fields
The relation between sampling and channel coding is the result of the fact that over-sampling creates redundancy [105]. This redundancy can be used to correct for “sparse” impulsive noise. Normally, the channel encoding is performed in finite Galois fields as opposed to real/complex fields; the reason is the simplicity of logic circuit implementation and insensitivity to the pattern of errors. On the other hand, the real/complex field implementation of error correction codes has stability problems with respect to the pattern of impulsive, quantization and additive noise [52, 59, 74, 106–109]. Nevertheless, such implementation has found applications in fault tolerant computer systems [110–114] and impulsive noise removal from 1-D and 2-D signals [31, 32]. Similar to finite Galois fields, real/complex field codes can be implemented in both block and convolutional fashions.
A discrete real-field block code is an oversampled signal with n samples such that, in the transform domain (e.g., DFT), a number of contiguous high-frequency components are zero. In general, the zeros do not have to be the high-frequency components or contiguous. However, if they are contiguous, the resultant m equations (from the syndrome information domain) and m unknown erasures form a Vandermonde matrix, which ensures invertibility and consequently erasure recovery. The DFT block codes are thus a special case of Reed-Solomon (RS) codes in the field of real/complex numbers [105].
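The Vandermonde structure described above can be made concrete with a small numerical sketch. The code word is a complex signal whose DFT vanishes on a contiguous band; erased samples are received as zeros, and the DFT of the received block at the zero (syndrome) positions gives a linear system in the unknown erasure values. The block length, the band `zero_bins`, the erasure positions, and the function name are illustrative assumptions, and a direct least-squares solve is used here rather than the ELP recursion of Section 4.1.

```python
import numpy as np

def recover_erasures(r, erased, zero_bins):
    """Fill erased samples of a DFT block code using the syndrome equations."""
    n = len(r)
    syndrome = np.fft.fft(r)[zero_bins]       # the code word contributes nothing at these bins
    # Vandermonde-type system: sum_t e[t] * exp(-2*pi*i*j*t/n) = syndrome[j]
    A = np.exp(-2j * np.pi * np.outer(zero_bins, erased) / n)
    e, *_ = np.linalg.lstsq(A, syndrome, rcond=None)
    x_hat = r.copy()
    x_hat[erased] = -e                        # r = x + e with r = 0 at the erased positions
    return x_hat

n = 32
zero_bins = np.arange(12, 21)                 # 9 contiguous high-frequency bins forced to zero
rng = np.random.default_rng(0)
X = rng.standard_normal(n) + 1j * rng.standard_normal(n)
X[zero_bins] = 0
x = np.fft.ifft(X)                            # oversampled code word

erased = np.array([2, 7, 11, 19, 24, 29])     # 6 erasures, fewer than the 9 syndrome equations
r = x.copy()
r[erased] = 0
x_hat = recover_erasures(r, erased, zero_bins)
```

With well-separated erasure positions the system is well conditioned; for bursty erasures the same matrix becomes ill-conditioned, which is the instability discussed in Section 4.1.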
4.1 Decoding of block codes—ELP method
where the h_{k}'s are the ELP coefficients as defined in (36) and Appendix 1, r is a member of the complement of Λ, and the index additions are in mod(n). After finding the E[j] values, the spectrum of the recovered oversampled signal X[j] can be found by removing E[j] from the received signal (see (99) in Appendix 1). Hence the original signal can be recovered by removing the inserted zeros at the syndrome positions of X[j]. The above algorithm, called the ELP algorithm, is capable of correcting any combination of erasures. However, if the erasures are bursty, the above algorithm may become unstable. To combat bursty erasures, we can use the Sorted DFT (SDFT^{f}) [1, 59, 116, 117] instead of the conventional DFT. The simulation results for block codes with erasure and impulsive noise channels are given in the following two subsections.
4.1.1 Simulation results for erasure channels
Since consecutive sample losses represent the worst case [59, 116], the proposed method works better for random samples. In practice, the error recovery capability of this technique degrades with the increase of the block and/or burst size due to the accumulation of round-off errors. In order to reduce the round-off error, instead of the DFT, a transform based on the SDFT, or Sorted DCT (SDCT) can be used [1, 59, 116]. These types of transformations act as an interleaver to break down the bursty erasures.
4.1.2 Simulation results for random impulsive noise channel
There are several methods to determine the number, locations, and values of the impulsive noise samples, namely Modified Berlekamp-Massey for real fields [118, 119], ELP, IMAT, and constant false alarm rate with recursive detection estimation (CFAR-RDE). The Berlekamp-Massey method for real numbers is sensitive to noise and will not be discussed here [118]. The other methods are discussed below.
ELP method [104]
When the number and positions of the impulsive noise samples are not known, h_{t} in (38) is not known for any t; therefore, we assume the maximum possible number of impulsive noise samples per block, i.e., $k=\lfloor \frac{n-l}{2}\rfloor $ as given in (96) in Appendix 1. To solve for h_{t}, we need to know only n−l samples of E in the positions where zeros are added in the encoding procedure. Once the values of h_{t} are determined from the pseudo-inverse [104], the number and positions of impulsive noise can be found from (98) in Appendix 1. The actual values of the impulsive noise can be determined from (38) as in the erasure channel case. For the actual algorithm, please refer to Appendix 2. As we are using the above method in the field of real numbers, exact zeros of {h_{k}}, which are the DFT of {h_{i}}, are rarely observed; consequently, the zeros can be found by thresholding the magnitudes of h_{k}. Alternatively, the magnitudes of h_{k} can be used as a mask for soft-decision; in this case, thresholding is not needed.
CFAR-RDE and IMAT methods [31]
4.2 Decoding for convolutional codes
The input signal is taken from a uniform random distribution of size 50 and the simulations are run 1,000 times and then averaged. The following subsections describe the simulation results for erasure and impulsive noise channels.
4.2.1 Decoding for erasure channels
An iterative decoding scheme for this matrix representation is similar to that of Figure 7 except that the operator G consists of the generator matrix, a mask (erasure operation), and the transpose of the generator matrix. If the rate of erasure does not exceed the encoder full capacity, the matrix form of the operator G can be shown to be a nonnegative definite square matrix and therefore its inverse exists [51, 60].
Figure 14 shows that the SNR values gradually decrease as the rate of erasure reaches its maximum (capacity).
4.2.2 Decoding for impulsive noise channels
Let us consider x and y as the input and the output streams of the encoder, respectively, related to each other through the generator matrix G as y = G x.
For simulation results, we use the generator matrix shown in (40), which can be calculated from [4].
In our simulations, the locations of the impulsive noise samples are generated randomly and their amplitudes have Gaussian distributions with zero mean and variance equal to 1, 2, 5, and 10 times the variance of the encoder output. The results are shown in Figure 15 after 300 iterations. This figure shows that the performance is better for high-variance impulsive noise.
5 Spectral estimation
In this section, we review some of the methods which are used to evaluate the frequency content of data [7–10]. In the field of signal spectrum estimation, there are several methods which are appropriate for different types of signals. Some methods are more suitable to estimate the spectrum of wideband signals, whereas some others are better for the extraction of narrow-band components. Since our focus is on sparse signals, it would be reasonable to assume sparsity in the frequency domain, i.e., we assume the signal to be a combination of several sinusoids plus white noise.
where m is the number of observations, T_{s} is the sampling interval (usually assumed as unity), and x_{r} is the signal. Although non-parametric methods are robust with low computational complexity, they suffer from fundamental limitations. The most important limitation is their resolution; closely spaced harmonics cannot be distinguished if their spacing is smaller than the inverse of the observation period.
To overcome this resolution problem, parametric methods are devised. Assuming a statistical model with some unknown parameters, we can increase resolution by estimating the parameters from the data at the cost of more computational complexity. Theoretically, in parametric methods, we can resolve closely spaced harmonics with limited data length if the SNR goes to infinity.^{h}
In this section, we shall discuss three parametric approaches for spectral estimation: the Pisarenko, the Prony, and the MUSIC algorithms. The first two are mainly used in spectral estimation, while the MUSIC algorithm was first developed for array processing and was later extended to spectral estimation. It should be noted that the parametric methods, unlike the non-parametric approaches, require prior knowledge of the model order (the number of tones). This can be decided from the data using the minimum description length (MDL) method discussed in the next section.
5.1 Prony method
Basic Prony algorithm
1. Solve the recursive equation in (47) to evaluate the h_{i}'s.
2. Find the roots of the polynomial represented in (46); these roots are the complex exponentials defined as z_{i} in (44).
3. Solve (44) to obtain the amplitudes of the exponentials (the b_{i}'s).
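The three steps can be sketched as follows. Since the equations (44), (46), and (47) are not reproduced in this section, the sketch assumes the standard noiseless model x_{r} = Σ_{i} b_{i} z_{i}^{r} together with the linear-prediction recursion x_{r} + h_{1}x_{r−1} + ⋯ + h_{k}x_{r−k} = 0; the function name and test values are illustrative.

```python
import numpy as np

def prony(x, k):
    """Basic Prony sketch: recover k exponentials z_i and amplitudes b_i from samples x."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    # Step 1: linear prediction, x[r] + h1*x[r-1] + ... + hk*x[r-k] = 0 for r = k..N-1
    A = np.array([[x[r - i] for i in range(1, k + 1)] for r in range(k, N)])
    h, *_ = np.linalg.lstsq(A, -x[k:], rcond=None)
    # Step 2: the roots of z^k + h1*z^(k-1) + ... + hk are the exponentials
    z = np.roots(np.concatenate(([1.0], h)))
    # Step 3: amplitudes from the Vandermonde system x[r] = sum_i b_i * z_i^r
    V = z[None, :] ** np.arange(N)[:, None]
    b, *_ = np.linalg.lstsq(V, x, rcond=None)
    return z, b

omega_true = np.array([0.5, 1.3, 2.1])        # radians per sample
b_true = np.array([1.0, 2.0, 0.5])
r = np.arange(8)
x = (b_true[None, :] * np.exp(1j * np.outer(r, omega_true))).sum(axis=1)

z_hat, b_hat = prony(x, k=3)
order = np.argsort(np.angle(z_hat))
omega_hat = np.angle(z_hat)[order]
b_sorted = b_hat[order]
```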
The Prony method is sensitive to noise, which was also observed in the ELP and the annihilating filter methods discussed in Sections 3.3 and 4.1. There are extended Prony methods that are better suited for noisy measurements [10].
5.2 Pisarenko harmonic decomposition (PHD)
which is the key equation of the Pisarenko method. The eigen-equation of (54) states that the elements of the eigenvector of the covariance matrix, corresponding to the smallest eigenvalue (σ^{2}), are the same as the coefficients in the recursive equation of x_{r} (coefficients of the ARMA model in (49)). Therefore, by evaluating the roots of the polynomial represented in (46) with coefficients that are the elements of this vector, we can find the tones in the spectrum.
PHD algorithm
1. Given the model order k (number of sinusoids), find the autocorrelation matrix of the noisy observations with dimension k + 1 (R_{yy}).
2. Find the smallest eigenvalue (σ^{2}) of R_{yy} and the corresponding eigenvector (H).
3. Set the elements of the obtained vector as the coefficients of the polynomial in (46). The roots of this polynomial are the estimated frequencies.
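A minimal numerical sketch of these three steps is given below. It makes several assumptions that are not in the text: the tones are modeled as complex exponentials (so that k tones need a (k + 1)-dimensional matrix), the autocorrelation matrix is estimated by averaging outer products of sliding windows, and the signal parameters are arbitrary test values.

```python
import numpy as np

def pisarenko(y, k):
    """PHD sketch: frequencies of k complex exponentials from noisy samples y."""
    p = k + 1
    # Step 1: sample autocorrelation matrix from sliding windows of length k + 1
    snaps = np.array([y[t:t + p] for t in range(len(y) - p + 1)])
    R = snaps.T @ snaps.conj() / len(snaps)   # R[a, b] = mean of y[t+a] * conj(y[t+b])
    # Step 2: eigenvector of the smallest eigenvalue (eigh sorts eigenvalues ascending)
    eigvals, eigvecs = np.linalg.eigh(R)
    v = eigvecs[:, 0]
    # Step 3: roots of v[0]*z^k + ... + v[k] lie near exp(j*omega_i)
    roots = np.roots(v)
    return np.sort(np.angle(roots)), eigvals[0]

rng = np.random.default_rng(0)
N = 4000
t = np.arange(N)
noise = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = np.exp(1j * (0.7 * t + 0.3)) + np.exp(1j * (2.0 * t + 1.1)) + noise

omega_hat, sigma2_hat = pisarenko(y, k=2)
```

The smallest eigenvalue returned alongside the frequencies serves as an estimate of the noise variance σ^{2}.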
A different formulation of the PHD method with linear programming approach (refer to Section 2.2 for description of linear programming) for array processing is studied in [121]. The PHD method is shown to be equivalent to a geometrical projection problem which can be solved using ℓ_{1}-norm optimization.
5.3 MUSIC
MUltiple SIgnal Classification (MUSIC) is a method originally devised for high-resolution source direction estimation in the context of array processing, which will be discussed in the next section [122]. The inherent equivalence of array processing and time series analysis paves the way for the employment of this method in spectral estimation. MUSIC can be understood as a generalization and improvement of the Pisarenko method. It is known that in the context of array processing, MUSIC can attain the statistical efficiency^{i} in the limit of an asymptotically large number of observations [11].
In the PHD method, we construct an autocorrelation matrix of dimension k + 1 under the assumption that its smallest eigenvalue (σ^{2}) belongs to the noise subspace. Then we use the Hermitian property of the covariance matrix to conclude that the noise eigenvector should be orthogonal to the signal eigenvectors. In MUSIC, we extend this method using a noise subspace of dimension greater than one to improve the performance. We also use some kind of averaging over noise eigenvectors to obtain a more reliable signal estimator.
where v_{ i }s are eigenvectors of R corresponding to the noise subspace.
MUSIC algorithm
1. Find the autocorrelation matrix of the noisy observations (R_{yy}) with the available size as shown in (57).
2. Using a given value of k or a method to determine k (such as MDL), separate the m − k smallest eigenvalues of R_{yy} and the corresponding eigenvectors (${\mathbf{v}}_{k+1},\dots ,{\mathbf{v}}_{m}$).
3. Use (58) to estimate the spectral content at frequency ω.
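The steps above can be sketched numerically as follows. Because (57) and (58) are not reproduced in this section, the sketch assumes the commonly used pseudo-spectrum 1/Σ_{i}|a(ω)^{H}v_{i}|^{2} with a(ω) = [1, e^{jω}, …, e^{jω(m−1)}]^{T}, a sliding-window estimate of R_{yy}, complex exponential tones, and a simple local-maximum peak picker; all names and parameter values are illustrative.

```python
import numpy as np

def music(y, k, m, n_grid=4096):
    """MUSIC sketch: pseudo-spectrum on a frequency grid and its k highest peaks."""
    # Step 1: m x m sample autocorrelation matrix from sliding windows
    snaps = np.array([y[t:t + m] for t in range(len(y) - m + 1)])
    R = snaps.T @ snaps.conj() / len(snaps)
    # Step 2: eigenvectors of the m - k smallest eigenvalues span the noise subspace
    _, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    noise = eigvecs[:, :m - k]
    # Step 3: pseudo-spectrum over a grid covering [-pi, pi)
    omega = 2 * np.pi * np.arange(n_grid) / n_grid - np.pi
    steering = np.exp(1j * np.outer(np.arange(m), omega))
    pseudo = 1.0 / np.sum(np.abs(noise.conj().T @ steering) ** 2, axis=0)
    # keep the k highest local maxima (the grid is treated as circular)
    peaks = np.flatnonzero((pseudo > np.roll(pseudo, 1)) & (pseudo >= np.roll(pseudo, -1)))
    top = peaks[np.argsort(pseudo[peaks])[-k:]]
    return np.sort(omega[top]), omega, pseudo

rng = np.random.default_rng(0)
N = 2000
t = np.arange(N)
noise_samples = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = np.exp(1j * (0.7 * t + 0.3)) + np.exp(1j * (2.0 * t + 1.1)) + noise_samples

omega_hat, omega_grid, pseudo = music(y, k=2, m=8)
```

Averaging over the m − k noise eigenvectors, instead of using the single eigenvector of the PHD method, is what makes the estimate less sensitive to errors in the estimated autocorrelation matrix.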
6 Sparse array processing
There are three types of array processing: 1—estimation of multi-source location (MSL) and Direction of Arrival (DOA), 2—sparse array beam-forming and design, and 3—sparse sensor networks. The first topic is related to estimating the directions and/or the locations of multiple targets; this problem is very similar to the problem of spectral estimation dealt with in the previous section; the relations among sparsity, spectral estimation, and array processing were discussed in [123, 124]. The second topic is related to the design of sparse arrays with some missing and/or random array sensors. The last topic, depending on the type of sparsity, is either similar to the second topic or related to CS of sparse signal fields in a network. In the following, we will only consider the first kind.
6.1 Array processing for MSL and DOA estimation
Among the important fields of active research in array processing are MSL and DOA estimation [122, 125, 126]. In such schemes, a passive or active array of sensors is used to locate the sources of narrow-band signals. Some applications may assume far-field sources (e.g., radar signal processing) where the array is only capable of DOA estimation, while other applications (e.g. biomedical imaging systems) assume near-field sources where the array is capable of locating the sources of radiation. A closely related field of study is spectral estimation due to similar linear statistical models. The stochastic sparse signals pass through a partially known linear transform (e.g., array response or inverse Fourier transform) and are observed in a noisy environment.
Two main fields in array processing are MSL and DOA for estimating the source locations and directions, respectively; for both purposes, the angle of arrival (azimuth and elevation) should be estimated while for MSL an extra parameter of range is also needed. The simplest case is the 1-D ULA (azimuth-only) for DOA estimation.
In the field of DOA estimation, extensive research has been accomplished in (1) source enumeration, and (2) DOA estimation methods. Both of the subjects correspond to the determination of parameters k and φ.
Although some methods are proposed for simultaneous detection and estimation of the model statistical characteristics [127], most of the literature is devoted to two-stage approaches; first, the number of active sources is detected and then their directions are estimated by techniques such as estimation of signal parameters via rotational invariance techniques (ESPRIT)^{j} [128–132]. Usually, the joint detection-estimation methods outperform the two-stage approaches at the cost of higher computational complexity. In the following, we will describe Minimum Description Length (MDL) as a powerful tool to detect the number of active sources.
6.1.1 Minimum description length
One of the most successful methods in array processing for source enumeration is the use of the MDL criterion [133]. This technique is very powerful and outperforms its older versions including AIC [134–136]. Hence, we confine our discussion to MDL algorithms.
6.1.2 Preliminaries
where $P\left(t\right)={a}_{0}+{a}_{1}t+\cdots +{a}_{k}{t}^{k}$, ν(t) is the observed Gaussian noise and k is the unknown model order (degree of the polynomial P(t)) which determines the complexity. Clearly, m−1 is the maximum required order for unique description of the data (m observed samples), and the ML criterion always selects this maximum value (${\widehat{k}}_{\mathit{\text{ML}}}=m-1$); i.e., the ML method forces the polynomial P(t) to pass through all the points. MDL, on the other hand, yields a sparser solution (${\widehat{k}}_{\mathit{\text{MDL}}}<m-1$).
The first term is the ML term for data encoding, and the second term is a penalty function that prevents the number of free parameters of the model from becoming very large.
Example of using MDL in spectral estimation
here ${\theta}_{k}={\{{a}_{j},{\omega}_{j},{\varphi}_{j}\}}_{j=1}^{k}$ are the unknown sinusoidal parameters to be estimated to compute the likelihood term in (65), which in this case is computed from (66). The 3k unidentified parameters are estimated by a grid search, i.e., all possible values of frequency and phase are tested (the amplitude can be estimated from the assumed frequency and phase using the relation $\widehat{{a}_{j}}=\frac{\sum_{t}x\left(t\right)\sin\left(\widehat{{\omega}_{j}}t+\widehat{\varphi}\right)}{\sum_{t}{\left(x\left(t\right)\sin\left(\widehat{{\omega}_{j}}t+\widehat{\varphi}\right)\right)}^{2}}$ [140]), and the set maximizing the likelihood function (66) is selected as the best estimate.
To find the number of embedded sinusoids in the noisy observed data, it is initially assumed that k = 0 and (65) is calculated; then k is increased and, by using the grid search, the maximum value of the likelihood for the assumed k is calculated from (66), and this calculated value is then used to compute (65). This procedure should be followed as long as (65) decreases and aborted when it starts to rise. The k minimizing (65) is the k selected by the MDL method and hopefully reveals the true number of the sinusoids in the noisy observed data. It is obvious that the sparsity condition, i.e., k ≪ n, is necessary for the efficient operation of MDL. In addition to the number of sinusoids, MDL has also estimated the frequency, amplitude, and phase of the embedded sinusoids. This should make it clear why such methods are called detection–estimation algorithms.
The very same method can be used to find the number, position, and amplitude of an impulsive noise added to a low-pass signal in additive noise. If the samples of the added impulsive noise are statistically independent from each other, the high-pass samples of the discrete Fourier transform (DFT) of the noisy observed data with impulsive noise should be taken and the same method applied.
MDL source enumeration
In fact, since we know that R has a spherical subspace of dimension n−k, we correct the observed $\widehat{\mathbf{R}}$ to obtain R_{ ML }.
where the last integer 1 can be omitted since it is independent of k.
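For illustration, the two-part MDL criterion for source enumeration can be evaluated directly from the eigenvalues of the sample covariance matrix. The displayed equations are not reproduced in this section, so the sketch below assumes the widely used form in which the likelihood term is the log ratio of the geometric to the arithmetic mean of the n − k smallest eigenvalues and the penalty is ½·k(2n − k)·log m (with the constant term dropped); the function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def mdl_source_count(eigvals, m):
    """Two-part MDL sketch: number of sources from covariance eigenvalues and m snapshots."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    n = len(lam)
    scores = []
    for k in range(n):
        tail = lam[k:]                                       # the n - k smallest eigenvalues
        # log(geometric mean / arithmetic mean) <= 0, with equality for a spherical noise subspace
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        likelihood_term = -(n - k) * m * log_ratio
        penalty = 0.5 * k * (2 * n - k) * np.log(m)
        scores.append(likelihood_term + penalty)
    return int(np.argmin(scores)), np.array(scores)

# Two dominant eigenvalues followed by four nearly equal noise eigenvalues (n = 6 sensors).
k_hat, scores = mdl_source_count([10.0, 5.0, 1.02, 1.0, 0.99, 0.98], m=100)
```

For k below the true value the likelihood term dominates because the remaining eigenvalues are far from equal; for k above it the likelihood term is nearly zero and the penalty grows, so the minimum falls at the number of dominant eigenvalues.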
The two-part MDL, despite its very low computational complexity, is among the most successful methods for source enumeration in array processing. Nonetheless, this method does not reach the best attainable performance for a finite number of measurements [142]. A newer version of MDL, called one-part or refined MDL, improves the performance for finite numbers of measurements, but it has not yet been applied to the array processing problem [33].
6.2 Sparse sensor networks
(1) There exists a central node known as the fusion center (FC) that retrieves relevant field information from the sensor nodes and communication from the sensor nodes to FC generally takes place over a power- and bandwidth-constrained wireless channel.
(2) Such a central node does not exist and the nodes take specific decisions based on the information they obtain and exchange among themselves. Issues such as distributed computing and processing are of high importance in such scenarios.
In general, there are three main tasks that should be implemented efficiently in a wireless sensor network: sensing, communication, and processing. The main challenge in the design of practical sensor networks is to find an efficient way of jointly performing these tasks, while using the minimum amount of system resources (computation, power, bandwidth) and satisfying the required system design parameters (such as distortion levels). For example, one such metric is the so-called energy-distortion tradeoff, which determines how much energy the sensor network consumes in extracting and delivering relevant information up to a given distortion level. Although many theoretical results are already available in the case of point-to-point links in which separation between source and channel coding can be assumed, the problem of efficiently transmitting or sharing information among a vast number of distributed nodes remains a great challenge. This is due to the fact that theories and tools for distributed signal processing, communications, and information theory in large-scale networked systems are still under development. However, recent results on distributed estimation or detection indicate that joint optimization through some form of source-channel matching and local node cooperation can result in significant system performance improvement [143–147].
6.2.1 How sparsity can be exploited in a sensor network
(1) Sparsity of node distribution in spatial terms
(2) Sparsity of the field to be estimated
Although nodes in a sensor network can be assumed to be regularly deployed in a given environment, such an assumption is not valid in many practical scenarios. Therefore, the non-uniform distribution of nodes can lead to some type of sparsity in spatial domain that can be exploited to reduce the amount of sensing, processing, and/or communication. This issue is subsequently related to extensions of the nonuniform sampling techniques to two-dimensional domains through proper interpolation and data recovery when samples are spatially sparse [34, 150]. The second scenario that provides a proper basis for exploiting the sparsity concepts arises when the field to be estimated is a sparse multi-dimensional signal. From this point of view, ideas such as those presented earlier in the context of compressed sensing (Section 3.2) provide the proper framework to address the sparsity in such fields.
Spatial sparsity and interpolation in sensor networks
Although general 2-D interpolation techniques are well-known in various branches of statistics and signal processing, the main issue in a sensor network is exploring proper spatio/temporal interpolation such that communication and processing are also efficiently accomplished. While there is a wide range of interpolation schemes (polynomial, Fourier, and least squares [151]), many of these schemes are not directly applicable for spatial interpolation in sensor networks due to their communication complexity.
Another characteristic of many sensor networks is the non-uniformity of node distribution in the measurement field. Although non-uniformity has been dealt with extensively in contexts such as signal processing, geo-spatial data processing, and computational geometry [1], the combination of irregular sensor data sampling and intra-network processing is a main challenge in sensor networks. For example, reference [152] addresses the issue of spatio-temporal non-uniformity in sensor networks and how it impacts performance aspects of a sensor network such as compression efficiency and routing overhead. In order to reduce the impact of non-uniformity, the authors in [152] propose using a combination of spatial data interpolation and temporal signal segmentation. A simple interpolation wavelet transform for irregular sampling which is an extension of the 2-D irregular grid transform to 3-D spatio-temporal transform grids is also proposed in [153]. Such a multi-scale transform extends the approach in [154] and removes the dependence on building a distributed mesh within the network. It should be noted that although wavelet compression allows the network to trade reconstruction quality for communication energy and bandwidth usage, such energy savings are naturally offset by the overhead cost of computing the wavelet coefficients.
Distributed wavelet processing within sensor networks is yet another approach to reduce communication energy and wireless bandwidth usage. Use of such distributed processing makes it possible to trade long-haul transmission of raw data to the FC for less costly local communication and processing among neighboring nodes [153]. In addition, local collaboration among nodes decorrelates measurements and results in a sparser data set.
Compressive sensing in sensor networks
Most natural phenomena in SNs are compressible through representation in a natural basis [86]. Some examples of these applications are imaging in a scattering medium [148], MIMO radar [149], and geo-exploration via underground seismic data. In such cases, it is possible to construct a highly compressed version of a given field in a decentralized fashion. If the correlations between data at different nodes are known a priori, it is possible to use schemes that have very favorable power-distortion-latency tradeoffs [143, 155, 156]. In such cases, distributed source coding techniques, such as Slepian-Wolf coding, can be used to design compression schemes without collaboration between nodes (see [155] and the references therein). Since prior knowledge of such correlations is not available in many applications, collaborative, intra-network processing and compression are used to determine unknown correlations and dependencies through information exchange between network nodes. In this regard, the concept of compressive wireless sensing has been introduced in [147] for energy-efficient estimation at the FC of sensor data, based on ideas from wireless communications [143, 145, 156–158] and compressive sampling theory [29, 75, 159]. The main objective in such an approach is to combine processing and communications in a single distributed operation [160–162].
Methods to obtain the required sparsity in a SN
While transform-based compression is well-developed in traditional signal and image processing domains, the understanding of sparse transforms for networked data is not as trivial [163]. There are methods such as associating a graph with a given network, where the vertices of the graph represent the nodes of the network, and edges between vertices represent relationships among data at adjacent nodes. The structure of the connectivity is the key to obtaining effective sparse transformations for networked data [163]. For example, in the case of uniformly distributed nodes, tools such as DFT or DCT can be adopted to exploit the sparsity in the frequency domain. In more general settings, wavelet techniques can be extended to handle the irregular distribution of sampling locations [153]. There are also scenarios in which standard signal transforms may not be directly applicable. For example, network monitoring applications rely on the analysis of communication traffic levels at the network nodes where network topology affects the nature of node relationships in complex ways. Graph wavelets [164] and diffusion wavelets [165] are two classes of transforms that have been proposed to address such complexities. In the former case, the wavelet coefficients are obtained by computing the digital differences of the data at different scales: the coefficients at the first scale are differences between neighboring data points, and those at subsequent spatial scales are computed by first aggregating data in neighborhoods and then computing differences between neighboring aggregations [164].
In the latter scheme, diffusion wavelets are based on construction of an orthonormal basis for functions supported on a graph and obtaining a custom-designed basis by analyzing eigenvectors of a diffusion matrix derived from the graph adjacency matrix. The resulting basis vectors are generally localized to neighborhoods of varying size and may also lead to sparse representations of data on a graph [165]. One example of such an approach is where the node data correspond to traffic rates of routers in a computer network.
Implementation of CS in a wireless SN
In the second approach, the projections can be computed and delivered to every subset of nodes in the network using gossip/consensus techniques, or be delivered to a single point using clustering and aggregation. This approach is typically used for networked data storage and retrieval applications. In this method, computation and distribution of each CS sample is accomplished through two simple steps [163]. In the first step, each of the sensors multiplies its data with the corresponding element of the compressing matrix. Then, in the second step, the resulting local terms are simultaneously aggregated and distributed across the network using randomized gossip [167], which is a simple iterative decentralized algorithm for computing linear functions. Because each node only exchanges information with its immediate neighbors in the network, gossip algorithms are more robust to failures or changes in the network topology and cannot be easily compromised by eliminating a single server or fusion center [168].
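The two steps described above can be illustrated with a minimal sketch. It assumes a simple pairwise-averaging variant of randomized gossip on a ring topology; the function name `gossip_cs_sample`, the ring of eight sensors, and the ±1 entries of the compressing-matrix row are illustrative assumptions and are not taken from [163] or [167].

```python
import numpy as np

def gossip_cs_sample(x, phi_row, edges, n_rounds=2000, seed=0):
    """Spread one CS sample y = sum_j phi_row[j] * x[j] to all nodes by gossip."""
    rng = np.random.default_rng(seed)
    # Step 1: each sensor multiplies its reading by its own matrix element.
    state = phi_row * x
    n = len(x)
    # Step 2: a random link wakes up and its two endpoints average their values.
    # Each exchange preserves the network-wide sum of the states.
    for _ in range(n_rounds):
        a, b = edges[rng.integers(len(edges))]
        state[a] = state[b] = 0.5 * (state[a] + state[b])
    # All nodes converge to the network average; n * average is the projection.
    return n * state

# Hypothetical ring of 8 sensors, each talking only to its two neighbors.
n = 8
edges = [(k, (k + 1) % n) for k in range(n)]
rng = np.random.default_rng(1)
x = rng.normal(size=n)                      # sensor readings
phi_row = rng.choice([-1.0, 1.0], size=n)   # one row of the compressing matrix
estimates = gossip_cs_sample(x, phi_row, edges)
exact = float(phi_row @ x)
```

After enough exchanges every entry of `estimates` is close to `exact`, so any node (or any subset of nodes) can be queried for the CS sample without a fusion center.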
Finally, it should be noted that in addition to the encoding process, the overall system performance is significantly affected by the decoding process [44, 88, 169]; this study and its extensions to sparse SNs remain as challenging tasks.
6.2.2 Sensing capacity
Despite widespread development of SN ideas in recent years, understanding of fundamental performance limits of sensing and communication between sensors is still under development. One of the issues that has recently attracted attention in theoretical analysis of sensor networks is the concept of sensing capacity. The sensing capacity was initially introduced for discrete alphabets in applications such as target detection [170] and later extended in [14, 171, 172] to the continuous case. The questions in this area are related to the problem of sampling of sparse signals [29, 76, 159] and sampling with finite rate of innovation [3, 103]. In the context of CS, sensing capacity provides bounds on the maximum signal dimension or complexity per sensor measurement that can be recovered to a pre-defined degree of accuracy. Alternatively, it can be interpreted as the minimum number of sensors necessary to monitor a given region to a desired degree of fidelity based on noisy sensor measurements. The inverse of sensing capacity is the compression rate, i.e., the ratio of the number of measurements to the number of signal dimensions, which characterizes the minimum rate to which the source can be compressed. As shown in [14], sensing capacity is a function of SNR, the inherent dimensionality of the information space, sensing diversity, and the desired distortion level.
Another issue to be noted with respect to the sensing capacity is the inherent difference between sensor network and CS scenarios in the way in which the SNR is handled [14, 172]. In sensor networks composed of many sensors, fixed SNR can be imposed for each individual sensor. Thus, the sensed SNR per location is spread across the field of view leading to a row-wise normalization of the observation matrix. On the other hand, in CS, the vector-valued observation corresponding to each signal component is normalized by each column. This difference has led to different regimes of compression rate [172]. In SN, in contrast to the CS setting, sensing capacity is generally small and correspondingly the number of sensors required does not scale linearly with the target sparsity. Specifically, the number of measurements is generally proportional to the signal dimension and is weakly dependent on target density sparsity. This issue has raised questions on compressive gains in power-limited SN applications based on sparsity of the underlying source domain.
7 Sparse component analysis: BSS and SDR
7.1 Introduction
Recovery of the original source signals from their mixtures, without having a priori information about the sources and the way they are mixed, is called blind source separation (BSS). This process is impossible if no assumption about the sources can be made. Such an assumption on the sources may be uncorrelatedness, statistical independence, lack of mutual information, or disjointness in some space [18, 19, 49].
The signal mixtures are often decomposed into their constituent principal components, independent components, or are separated based on their disjoint characteristics described in a suitable domain. In the latter case, the original sources should be sparse in that domain. Independent component analysis (ICA) is often used for separation of the sources in the former case, whereas SCA is employed for the latter case. These two mathematical tools are described in the following sections followed by some results and illustrations of their applications.
7.2 Independent component analysis (ICA)
The main assumption in ICA is the statistical independence of the constituent sources. Based on this assumption, ICA can play a crucial role in the separation and denoising of signals (BSS).
where y[i] is the estimate for the source signal s[i]. The early approaches in instantaneous BSS started from the work by Herault and Jutten [174] in 1986. In their approach, they considered non-Gaussian sources with an equal number of independent sources and mixtures. They proposed a solution based on a recurrent artificial neural network for separation of the sources.
In the cases where the number of sources is known, any ambiguity caused by false estimation of the number of sources can be avoided. If the number of sources is unknown, a criterion may be established to estimate the number of sources beforehand. In the context of model identification, this is referred to as Model Order Selection and methods such as the final prediction error (FPE), AIC, residual variance (RV), MDL and Hannan and Quinn (HNQ) methods [175] may be considered to solve this problem.
where L denotes the maximum number of paths for the sources, ν_{r}[i] is the accumulated noise at sensor r, and (.)^{l} refers to the l^{th} path. The unmixing process will be formulated similarly to the anechoic one. For a known number of sources, an accurate result may be expected if the number of paths is known; otherwise, the overall number of observations in an echoic case is infinite.
The aim of BSS using ICA is to estimate an unmixing matrix W such that Y=WX best approximates the independent sources S, where Y and X are, respectively, matrices with columns $\mathbf{y}[i]={\left[{y}_{1}[i],\ {y}_{2}[i],\ \dots ,\ {y}_{n}[i]\right]}^{T}$ and $\mathbf{x}[i]={\left[{x}_{1}[i],\ {x}_{2}[i],\ \dots ,\ {x}_{m}[i]\right]}^{T}$. The ICA separation algorithms are subject to permutation and scaling ambiguities in the output components, i.e., W=PDH^{−1}, where P and D are the permutation and scaling (diagonal) matrices, respectively. Permutation of the outputs is troublesome in places where either the separated segments of the signals are to be joined together or when a frequency-domain BSS is performed.
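The permutation and scaling ambiguity can be checked numerically with a short sketch. It assumes a noiseless, square, instantaneous mixture X=HS; the particular matrices H, P, and D below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 5
S = rng.laplace(size=(n, T))                 # hypothetical sparse-like sources
H = np.array([[1.0, 0.5, 0.2],               # hypothetical invertible mixing matrix
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
X = H @ S                                    # noiseless instantaneous mixtures

P = np.eye(n)[[2, 0, 1]]                     # permutation matrix
D = np.diag([2.0, -0.5, 3.0])                # scaling (diagonal) matrix
W = P @ D @ np.linalg.inv(H)                 # one of many valid unmixing matrices
Y = W @ X                                    # equals P D S

# Each output row is one source, but reordered and rescaled:
# Y[0] = 3.0 * S[2], Y[1] = 2.0 * S[0], Y[2] = -0.5 * S[1]
```

The outputs are still mutually independent, so an independence-based criterion cannot distinguish W from H^{−1}; this is why the order and scale of the recovered components are not determined.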
Mutual information is a measure of independence and maximizing the non-Gaussianity of the source signals is equivalent to minimizing the mutual information between them [177].
In those cases where the number of sources is more than the number of mixtures (underdetermined systems), the above BSS schemes cannot be applied simply because the mixing matrix is not invertible, and generally the original sources cannot be extracted. However, when the signals are sparse, the methods based on disjointness of the sources in some domain may be utilized. Separation of the mixtures of sparse signals is potentially possible in the situation where, at each sample instant, the number of nonzero sources is not more than a fraction of the number of sensors (see Table 1, row and column 6). The mixtures of sparse signals can also be instantaneous or convolutive.
7.3 Sparse component analysis (SCA)
where ν[i] is an m×1 vector. A and C can be determined by optimization of a cost function based on an exponential distribution for c_{i,j} [178]. In places where the sources are sparse and, at each time instant, at most one of the sources has a significant nonzero value, the columns of the mixing matrix may be calculated individually, which makes the solution to the underdetermined case possible.
The SCA problem can be stated as a clustering problem since the lines in the scatter plot can be separated based on their directionalities by means of clustering. A number of works on this method have been reported [18, 179, 180]. In the work by Li et al. [180], the separation has been performed in two different stages. First, the unknown mixing matrix is estimated using the k-means clustering method. Then, the source matrix is estimated using a standard linear programming algorithm. The line orientation of a data set may be thought of as the direction of its greatest variance. One way to find it is to perform eigenvector decomposition on the covariance matrix of the data; the resultant principal eigenvector, i.e., the eigenvector with the largest eigenvalue, indicates the direction of the data, since it has the maximum variance. In [179], GAP statistics, a metric which measures the distance between the total variance and cluster variances, has been used to estimate the number of sources, followed by a method similar to Li’s algorithm explained above. In line with this approach, Bofill and Zibulevsky [15] developed a potential function method for estimating the mixing matrix followed by ℓ_{1}-norm decomposition for the source estimation. Local maxima of the potential function correspond to the estimated directions of the basis vectors. After the mixing matrix is identified, the sources have to be estimated. Even when A is known, the solution is not unique in the underdetermined case, so a solution is found for which the ℓ_{1}-norm is minimized. Therefore, for $\mathbf{x}[i]=\sum_{j}{\mathbf{a}}_{j}{s}_{j}[i]$, $\sum_{j}|{s}_{j}[i]|$ is minimized using linear programming.
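The last step, ℓ_{1}-norm minimization for a known mixing matrix, can be sketched for small problems without a general linear programming solver. The sketch below relies on the fact that the linear program attains its optimum at a solution supported on at most m linearly independent columns of A, so it simply enumerates all such column subsets. This enumeration is a substitute for the linear programming routines used in the cited works, not their implementation; the function name `l1_decompose` and the example matrix are assumptions made for illustration.

```python
import itertools
import numpy as np

def l1_decompose(A, x, tol=1e-12):
    """Minimum l1-norm solution of A s = x for a small m x n matrix A (m < n)."""
    m, n = A.shape
    best, best_cost = None, np.inf
    # Try every set of m columns; keep the exact solution with the smallest l1 norm.
    for cols in itertools.combinations(range(n), m):
        sub = A[:, list(cols)]
        if abs(np.linalg.det(sub)) < tol:    # skip linearly dependent column sets
            continue
        coef = np.linalg.solve(sub, x)
        cost = np.abs(coef).sum()
        if cost < best_cost:
            best_cost = cost
            best = np.zeros(n)
            best[list(cols)] = coef
    return best

# Two mixtures, three sources; columns of A are unit-norm directions in the plane.
angles = np.array([0.3, 1.2, 2.4])
A = np.vstack([np.cos(angles), np.sin(angles)])
s_true = np.array([0.0, 1.5, 0.0])           # only the second source is active
x = A @ s_true
s_hat = l1_decompose(A, x)
```

In this example only one source is active at the given instant, so the observation lies along one column of A and the minimum ℓ_{1}-norm solution coincides with the true source vector. The number of subsets grows combinatorially, so for larger problems a linear programming solver is the practical choice.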
There are many cases in which the sources are disjoint in domains other than the time domain, or in which they can be represented as a sum of members of a dictionary consisting, for example, of wavelets or wavelet packets. In these cases, SCA can be performed more efficiently in those domains. Such methods often include transformation to the time-frequency domain followed by binary masking [181] or BSS followed by binary masking [176]. One such approach, called degenerate unmixing estimation technique (DUET) [181], transforms the anechoic convolutive observations into the time-frequency domain using a short-time Fourier transform; the relative attenuation and delay values between the two observations are then calculated from the ratio of corresponding time-frequency points. The regions of significant amplitudes (atoms) are then considered to be the source components in the time-frequency domain. In this method, only two mixtures are considered and, as a major limitation, only one source is assumed to be active at each time instant.
A detailed discussion of signal recovery using ℓ_{1}-norm minimization is presented by Takigawa et al. [183] and described below. As mentioned above, it is important to choose a domain that sparsely represents the signals.
On the other hand, in the method developed by Pedersen et al. [176], as applied to stereo signals, the binary masks are estimated after BSS of the mixtures and then applied to the microphone signals. The same technique has been used for convolutive sparse mixtures after the signals are transformed to the frequency domain.
In another approach [184], the effect of outlier noise has been reduced using median filtering; then hybrid fast ICA filtering and ℓ_{1}-norm minimization have been used for separation of temporomandibular joint sounds. It has been shown that for such sources, this method outperforms both DUET and Li’s algorithms. The authors of [185] have recently extended the DUET algorithm to separation of more than two sources in an echoic mixing scenario in the time-frequency domain.
In a very recent approach, it has been considered that brain signal sources are disjoint in the space-time-frequency domain. Therefore, clustering the observation points in the space-time-frequency domain can be effectively used for separation of brain sources [186].
As can be seen, BSS generally exploits independence of the source signals, whereas SCA benefits from the disjointness property of the source signals in some domain. While the BSS algorithms mostly rely on ICA with statistical properties of the signals, SCA uses their geometrical and behavioral properties. Therefore, in SCA, either a clustering approach or a masking procedure can result in estimation of the mixing matrix. Often, ℓ_{1}-norm minimization is used to recover the source signals. Generally, where the source signals are sparse, the SCA methods often result in more accurate estimation of the signals with fewer ambiguities.
7.4 SCA algorithms
SCA steps
1. Consider the model x=A·s; we need a linear transformation that applies to both sides of the equation to yield a new sparse source vector. |
2. Estimate the mixing matrix A. Several approaches are presented for this step, such as natural gradient ICA approaches, and clustering techniques with variants of k-means algorithm [18, 187]. |
3. Estimate the source representation based on the sparsity assumption. A majority of proposed methods are primarily based on minimizing some norm or pseudo-norm of the source representation vector. The most effective approaches are Matching Pursuit [38, 187], Basis Pursuit [85, 178, 188, 189], FOCUSS [46], IDE [73] and Smoothed ℓ_{0}-norm [47]. |
A brief review of major approaches that are suggested for the third step was given in Section 2.
7.5 Sparse dictionary representation (SDR) and signal modeling
A signal $\mathbf{x}\in {\mathbb{R}}^{n}$ may be sparse in a given basis but not sparse in a different basis. For example, an image may be sparse in a wavelet basis (i.e., most of the wavelet coefficients are small) even though the image itself may not be sparse (i.e., many of the gray values of the image are relatively large). Thus, given a class $\mathcal{S}\subset {\mathbb{R}}^{n}$, an important problem is to find a basis or a frame in which all signals in $\mathcal{S}$ can be represented sparsely. More specifically, given a class of signals $\mathcal{S}\subset {\mathbb{R}}^{n}$, it is important to find a basis (or a frame) $D={\left\{{w}_{j}\right\}}_{j=1}^{d}$ (if it exists) for ${\mathbb{R}}^{n}$ such that every data vector $\mathbf{x}\in \mathcal{S}$ can be represented as a linear combination of at most k≪n elements of D. The dictionary design problem has been addressed in [18–20, 40, 75, 190]. A related problem is the signal modeling problem in which the class $\mathcal{S}$ is to be modeled by a union of subspaces $\mathcal{M}=\bigcup _{i=1}^{l}{V}_{i}$ where each V_{i} is a subspace of ${\mathbb{R}}^{n}$ with the dimension of V_{i} at most k, where k≪n [49]. If the subspaces V_{i} are known, then it is possible to pick a basis ${E}^{i}={\left\{{e}_{j}^{i}\right\}}_{j}$ for each V_{i} and construct a dictionary $D=\bigcup _{i=1}^{l}{E}^{i}$ in which every signal of $\mathcal{S}$ has sparsity k (or is almost k sparse). The model $\mathcal{M}=\bigcup _{i=1}^{l}{V}_{i}$ can be found from an observed set of data $F=\{{f}_{1},\dots ,{f}_{m}\}\subset \mathcal{S}$ by solving (if possible) the following non-linear least squares problem:
Search algorithm
● Input: |
−initial partition $\{{F}_{1}^{1},\dots ,{F}_{l}^{1}\}$ |
−Data set $\mathcal{F}$ |
● Iterations: |
1. Use the SVD to find $\{{V}_{1}^{1},\dots ,{V}_{l}^{1}\}$ by minimizing $e\left({F}_{i}^{1},{V}_{i}^{1}\right)$ for each i, and compute ${\Gamma}_{1}=\sum _{i}e\left({F}_{i}^{1},{V}_{i}^{1}\right)$; |
2. Set j=1; |
3. While ${\Gamma}_{j}=\sum _{i}e\left({F}_{i}^{j},{V}_{i}^{j}\right)>e\left(\mathcal{F},\{{V}_{1}^{j},\dots ,{V}_{l}^{j}\}\right)$ |
4. Choose a new partition $\left\{{F}_{1}^{j+1},\dots ,{F}_{l}^{j+1}\right\}$ that satisfies, $f\in {F}_{k}^{j+1}$ implies that $d\left(f,{V}_{k}^{j}\right)\le d\left(f,{V}_{h}^{j}\right)$, $h=1,\dots ,l$; |
5. Use the SVD to find $\{{V}_{1}^{j+1},\dots ,{V}_{l}^{j+1}\}$ by minimizing $e\left({F}_{i}^{j+1},{V}_{i}^{j+1}\right)$ for each i, and compute ${\Gamma}_{j+1}=\sum _{i}e\left({F}_{i}^{j+1},{V}_{i}^{j+1}\right)$; |
6. Increment j by 1, i.e., j→j + 1; |
7. End while |
● Output: |
−$\{{F}_{1}^{j},\dots ,{F}_{l}^{j}\}$ and $\{{V}_{1}^{j},\dots ,{V}_{l}^{j}\}$. |
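The search algorithm above can be sketched in a few lines. Since the definition of the error e is not reproduced in this excerpt, the sketch assumes that e(F_i, V_i) is the sum of squared Euclidean distances of the points in F_i to the subspace V_i, and that all subspaces pass through the origin and have the same dimension k. The function names, the tolerance, and the iteration cap are illustrative choices.

```python
import numpy as np

def fit_subspace(F_i, k):
    """Orthonormal basis (n x k) of the k-dim subspace closest to the columns of F_i."""
    if F_i.shape[1] == 0:                        # empty cell: fall back to {0}
        return np.zeros((F_i.shape[0], k))
    U, _, _ = np.linalg.svd(F_i, full_matrices=False)
    return U[:, :k]                              # leading left singular vectors

def sq_dist(F, B):
    """Squared distance of every column of F to span(B)."""
    R = F - B @ (B.T @ F)                        # residual after projection
    return np.sum(R ** 2, axis=0)

def subspace_search(F, labels, l, k, max_iter=100, tol=1e-12):
    labels = np.asarray(labels)
    cols = np.arange(F.shape[1])
    for _ in range(max_iter):
        bases = [fit_subspace(F[:, labels == i], k) for i in range(l)]
        dists = np.stack([sq_dist(F, B) for B in bases])    # shape (l, m)
        gamma = dists[labels, cols].sum()                   # error of current partition
        if gamma <= dists.min(axis=0).sum() + tol:          # while-condition fails
            break
        labels = np.argmin(dists, axis=0)                   # move points to nearest subspace
    # Refit once so the returned bases match the returned partition.
    bases = [fit_subspace(F[:, labels == i], k) for i in range(l)]
    gamma = sum(sq_dist(F[:, labels == i], B).sum() for i, B in enumerate(bases))
    return labels, bases, gamma

# Hypothetical data: 20 points on each of two lines through the origin in R^3.
rng = np.random.default_rng(0)
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
coef = rng.uniform(1.0, 2.0, size=40) * rng.choice([-1.0, 1.0], size=40)
F = np.hstack([np.outer(u1, coef[:20]), np.outer(u2, coef[20:])])
true_labels = np.repeat([0, 1], 20)
init = true_labels.copy()
init[[0, 1, 2]] = 1                              # a few deliberately wrong assignments
init[[20, 21, 22]] = 0
labels, bases, gamma = subspace_search(F, init, l=2, k=1)
```

As with k-means, the result depends on the initial partition: each iteration cannot increase the error, but the procedure may stop at a local minimum when the starting partition is poor.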
In some recent attempts, sparse representation and the compressive sensing concept have been extended to multichannel source separation [191–194]. In [191, 192], separation of sparse sources with different morphologies is presented by developing a multichannel morphological component analysis approach. In this scheme, the signals are considered as combinations of features from different dictionaries; therefore, different dictionaries are assumed for different sources. In [193], inversion of a random field from pointwise measurements collected by a sensor network is presented, under the assumption that the field has a sparse representation in a known basis. To illustrate the approach, the inversion of an acoustic field created by the superposition of a discrete number of propagating noisy acoustic sources is considered. The method combines compressed sensing (sparse reconstruction by ℓ_{1}-constrained optimization) with distributed average consensus (mixing the pointwise sensor measurements by local communication among the sensors). Reference [194] addresses source separation from a linear mixture under the assumptions of source sparsity and orthogonality of the mixing matrix. A two-stage separation process is proposed. In the first stage, the sparsity pattern of the sources is recovered by exploiting the orthogonality prior. In the second stage, the support is used to reformulate the recovery task as an optimization problem. A solution based on alternating minimization is then suggested for solving these problems.
8 Multipath channel estimation
The estimation of the multipath channel impulse response is very similar to the determination of analog epochs and amplitudes of discontinuities for finite rate of innovation as shown in (31). Essentially, if a known train of impulses is transmitted and the received signal from the multipath channel is filtered and sampled (information domain as discussed in Section 3.3), the channel impulse response can, in principle, be estimated from these samples using an annihilating filter (the Prony or ELP method) [27] defined with the $\mathcal{Z}$-transform and a pseudo-inverse matrix inversion.^{m} Once the channel impulse response is estimated, its effect is compensated; this process can be repeated according to the dynamics of the time-varying channel.
A special case of multipath channel is an OFDM channel, which is widely used in ADSL, DAB, DVB, WLAN, WMAN, and WIMAX.^{n} OFDM is a digital multi-carrier transmission technique where a single data stream is transmitted over several sub-carrier frequencies to achieve robustness against multipath channels as well as spectral efficiency [195]. Channel estimation for OFDM is relatively simple; the time instances of the channel impulse response are now quantized and, instead of an annihilating filter defined in the $\mathcal{Z}$-transform, we can use the DFT and ELP of Section 4.1. Also, instead of a known train of impulses, some of the available sub-carriers in each transmitted symbol are assigned to predetermined patterns, which are usually called comb-type pilots. These pilot tones help the receiver to extract some of the DFT samples of the discrete time-varying channel (84) at the respective frequencies in each transmitted symbol. These characteristics make the OFDM channel estimation similar to unknown sparse signal recovery of Section 3.1.1 and the impulsive noise removal of Section 4.1.2. Because of these advantages, our main example and simulations are related to OFDM channel estimation.
8.1 OFDM channel estimation
where T_{f} and n are the symbol length (including cyclic prefix) and number of sub-carriers in each OFDM symbol, respectively. Δf is the sub-carrier spacing, and ${T}_{s}=\frac{1}{\mathrm{\Delta f}}$ is the sample interval. The above equation shows that for the r^{th} OFDM symbol, H[r,i] is the DFT of h[r,l].
Two major methods are used in the equalization process [196]: (1) zero forcing and (2) minimum mean squared error (MMSE). In the zero forcing method, regardless of the noise variance, equalization is obtained by dividing the received OFDM symbol by the estimated channel frequency response, while in the MMSE method, the approximation is chosen such that the MSE of the transmitted data vector $\left(E\left[\parallel \mathbf{X}-\widehat{\mathbf{X}}{\parallel}^{2}\right]\right)$ is minimized, which introduces the noise variance in the equations.
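The difference between the two equalizers can be sketched per sub-carrier. The MMSE expression below is the standard one-tap form and assumes uncorrelated data symbols of variance `signal_var`, additive noise of known variance `noise_var`, and a perfectly known channel; the function names and the numerical values are illustrative assumptions and are not taken from [196].

```python
import numpy as np

def zero_forcing(Y, H_hat):
    """Divide the received symbol by the estimated channel frequency response."""
    return Y / H_hat

def mmse(Y, H_hat, noise_var, signal_var=1.0):
    """One-tap MMSE equalizer; the noise variance regularizes weak sub-carriers."""
    return np.conj(H_hat) * Y / (np.abs(H_hat) ** 2 + noise_var / signal_var)

# Hypothetical three sub-carriers; the second one is in a deep fade.
H_hat = np.array([1.0 + 0.0j, 0.05j, -0.8 + 0.6j])
X = np.array([1.0, -1.0, 1.0j])                  # transmitted symbols
noise = 0.05 * np.array([1.0, 1.0, -1.0j])       # fixed noise sample for illustration
Y = H_hat * X + noise
noise_var = 0.05 ** 2

X_zf = zero_forcing(Y, H_hat)
X_mmse = mmse(Y, H_hat, noise_var)
```

On the faded sub-carrier, zero forcing divides the noise by a very small channel gain, whereas the MMSE equalizer shrinks its output there. When `noise_var` is set to zero, the two equalizers coincide.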
8.1.1 Statement of the problem
where i_{p} is an index vector denoting the pilot positions in the frequency spectrum, ${\stackrel{~}{\mathbf{H}}}_{{i}_{p}}$ is a vector containing the noisy value of the channel frequency spectrum in these pilot positions and ${\mathbf{F}}_{{i}_{p}}$ denotes the matrix obtained from taking the rows of the DFT matrix pertaining to the pilot positions. ${\mathit{\nu}}_{{i}_{p}}$ is the additive noise on the pilot points in the frequency domain. Thus, the channel estimation problem is equivalent to finding the sparse vector H from the above set of equations for a set of pilots. Various channel estimation methods [197] have been used with the usual tradeoffs of optimality and complexity. The least squares (LS) [197], ML [198], MMSE [199–201], and Linear Minimum Mean Squared Error (LMMSE) [198, 199, 202] techniques are among some of these methods. However, none of these techniques use the inherent sparsity of the multipath channel H, and thus, they are not as accurate as methods that exploit it.
8.1.2 Sparse OFDM channel estimation
In the following, we present two methods that utilize this sparsity to enhance the channel estimation process.
CS-based channel estimation
- (1) Decrease in the MSE: By applying the sparsity constraint, the energy of the estimated channel impulse response will be concentrated into a few coefficients while in the conventional methods, we usually observe a leakage of the energy to the neighboring coefficients of the nonzero taps. Thus, if the sparsity-based methods succeed in estimating the support of the channel impulse response, the MSE will be improved by prevention of the leakage effect.
- (2) Reduction in the overhead: The number of pilot sub-carriers is in fact, the number of (noisy) samples that we obtain from the channel frequency response. Since the pilot sub-carriers do not convey any data, they are considered as the overhead imposed to enhance the estimation process. The theoretical results in [203] indicate that by means of sparsity-based methods, the perfect estimation can be achieved with an overhead proportional to the number of non-zero channel taps (which is considerably less than that of the current standards).
In the sequel, we present two iterative methods which exploit the inherent sparsity of the channel impulse response to improve the channel estimation task in OFDM systems.
8.1.3 Iterative method with adaptive thresholding (IMAT) for OFDM channel estimation [206]
8.1.4 Modified IMAT (MIMAT) for OFDM channel estimation [23]
In this method, the spectrum of the channel is initially estimated using a simple interpolation method such as linear interpolation between pilot sub-carriers. This initial estimate is further improved in a series of iterations between time (sparse) and frequency (information) domains to find the sparsest channel impulse response by using an adaptive thresholding scheme; in each iteration, after finding the locations of the taps (locations with previously estimated amplitudes higher than the threshold), their respective amplitudes are again found using the MMSE criterion. In each iteration, due to thresholding, some of the false taps that are noise samples with amplitudes above the threshold are discarded. Thus, the new iteration starts with a lower number of false taps. Moreover, because of the MMSE estimator, the valid taps approach their actual values in each new iteration. In the last iteration, the actual taps are detected and the MMSE estimator gives their respective values. This method is similar to RDE and IDE methods discussed in Sections 2.6 and 4.1.2. The main advantage of this method is its robustness against side-band zero-padding.^{o}
MIMAT algorithm for OFDM channel estimation
● Initialization: |
−Find an initial estimate of the time domain channel using linear interpolation: ${\hat{h}}^{\left(0\right)}={\hat{h}}_{\text{linear}}$ |
● Iterations: |
1. Set Threshold=β e^{αi}. |
2. Using the threshold from the previous step, find the locations of the taps t by thresholding the time domain channel from the previous iteration (${\hat{h}}^{(i-1)}$). |
3. Solve for the values of the non-zero impulses using MMSE: ${\hat{h}}_{t}=\text{SNR}\cdot {\stackrel{~}{\mathbf{F}}}^{H}{(\stackrel{~}{\mathbf{F}}\cdot \text{SNR}\cdot {\stackrel{~}{\mathbf{F}}}^{H}+\mathbf{I})}^{-1}{\stackrel{~}{\mathbf{H}}}_{{i}_{p}}$ |
4. Find the new estimate of the channel (${\hat{h}}^{\left(i\right)}$) by substituting the taps in their detected positions. |
5. Stop if the estimated channel is close enough to the previous estimation or when a maximum number of iterations is reached. |
The equation that has to be solved in (93) is usually over-determined, which helps suppress the noise in each iteration step. Note that the solution presented in (94) represents a variant of the MMSE solution when the locations of the discrete impulses are known. If further statistical knowledge is available, this solution can be modified to obtain a better estimate; however, this makes the approximation process more complex. This algorithm does not need many iterations; the positions of the non-zero impulses are perfectly detected in three or four iterations for most types of channels.
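The MIMAT steps listed above can be sketched as follows. The sketch makes several assumptions that are not stated in the text: equally spaced comb pilots, a periodic linear interpolation for the initial estimate, channel delays shorter than `max_delay` samples (so that taps aliased by the pilot spacing can be excluded), a scalar SNR, and an MMSE matrix in step 3 that is applied to the noisy pilot observations. The function name `mimat` and the values of β, α, and the stopping tolerance are illustrative only.

```python
import numpy as np

def mimat(pilot_idx, H_pilots, n, max_delay, snr, beta=0.1, alpha=0.2,
          max_iter=10, tol=1e-6):
    k = np.arange(n)
    # Initialization: linear interpolation between pilots, then IDFT to time domain.
    H_lin = (np.interp(k, pilot_idx, H_pilots.real, period=n)
             + 1j * np.interp(k, pilot_idx, H_pilots.imag, period=n))
    h = np.fft.ifft(H_lin)
    h[max_delay:] = 0.0                                   # assumed maximum delay spread
    for i in range(1, max_iter + 1):
        threshold = beta * np.exp(alpha * i)              # step 1
        taps = np.flatnonzero(np.abs(h) > threshold)      # step 2: detected tap locations
        if taps.size == 0:
            break
        # Rows of the DFT matrix at the pilots, columns at the detected taps.
        F = np.exp(-2j * np.pi * np.outer(pilot_idx, taps) / n)
        G = snr * (F @ F.conj().T) + np.eye(len(pilot_idx))
        h_t = snr * F.conj().T @ np.linalg.solve(G, H_pilots)   # step 3: MMSE amplitudes
        h_new = np.zeros(n, dtype=complex)                # step 4: place taps
        h_new[taps] = h_t
        converged = np.linalg.norm(h_new - h) <= tol * np.linalg.norm(h)
        h = h_new
        if converged:                                     # step 5
            break
    return h

# Hypothetical setup: 64 sub-carriers, every 4th one is a pilot, 3-tap channel.
n, L = 64, 16
pilot_idx = np.arange(0, n, 4)
h_true = np.zeros(n, dtype=complex)
h_true[[0, 2, 5]] = [1.0, 0.7j, -0.5]
noise_var = 1e-3
rng = np.random.default_rng(0)
noise = np.sqrt(noise_var / 2) * (rng.normal(size=pilot_idx.size)
                                  + 1j * rng.normal(size=pilot_idx.size))
H_pilots = np.fft.fft(h_true)[pilot_idx] + noise
h_est = mimat(pilot_idx, H_pilots, n, L, snr=1 / noise_var)
```

The initial interpolation-based estimate attenuates the later taps, and the MMSE step restores their amplitudes once the support has been found. In practice β and α have to be tuned to the noise level and the expected tap amplitudes.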
8.2 Simulation results and discussions
9 Conclusion
A unified view of sparse signal processing has been presented in tutorial form. The sparsity in the key areas of sampling, coding, spectral estimation, array processing, component analysis, and channel estimation has been carefully exploited. Some form of uniform or random sampling has been shown to underpin the associated sparse processing methods used in each of these fields. The reconstruction methods used in each application domain have been introduced and the interconnections among them have been highlighted.
This development has revealed, for example, that the iterative methods developed for random sampling can be applied to real-field block and convolutional channel coding for the removal of impulsive noise (salt-and-pepper noise in the case of images), to SCA, and to channel estimation for orthogonal frequency division multiplexing systems. These iterative reconstruction methods have been shown to extend naturally to spectral estimation and sparse array processing, with significant improvements, because the mathematical models in these areas are similar to those of channel coding. Conversely, the minimum description length method developed for spectral estimation and array processing has potential for application in other areas. The error locator polynomial method developed for channel coding has, moreover, been shown to be a discrete version of the annihilating filter used in sampling with a finite rate of innovation and of the Prony method in spectral estimation; the Pisarenko and MUSIC methods are further improvements of the Prony method when additive noise is also considered.
Linkages with emergent areas such as compressive sensing and channel estimation have also been considered. In addition, it has been suggested that the linear programming methods developed for compressive sensing and SCA can be applied to other applications with possible reduction of sampling rate. As such, this tutorial has provided a route for new applications of sparse signal processing to emerge, which can potentially reduce computational complexity and improve performance quality. Other potential applications of sparsity are in the areas of sensor networks and sparse array design.