Sampling a signal depends heavily on prior information about the signal structure. For example, if the signal of interest is known to be band-limited, sampling at the Nyquist rate suffices for exact recovery. Signals with most of their coefficients equal to zero are called sparse. Sparsity has proven to be a powerful assumption that significantly reduces the required number of measurements. The process of recovering a sparse signal from a small number of measurements is called compressed sensing (CS). In CS, each measurement is a linear combination of the entries of the ground-truth signal, i.e.,
$$\begin{aligned} \varvec{y} = \varvec{A}\varvec{x}, \end{aligned}$$
(1)
where \(\varvec{A}\in {\mathbb {R}}^{m\times N}\) is called the measurement matrix, and \(\varvec{x}\in {\mathbb {R}}^{N}\) is an unknown s-sparse signal, i.e., it has at most s nonzero entries, or \(\Vert \varvec{x}\Vert _{0} \le s\). Here, \(\left\| \cdot \right\| _{0}\) is the \(\ell _{0}\) norm, which counts the number of nonzero elements. It has been shown that \({\mathcal {O}}(s\log (\tfrac{N}{s}))\) measurements are sufficient to guarantee exact recovery of the signal by solving the convex program:
$$\begin{aligned} {\text {P}}_{1}: \quad \min _{{\varvec{z}}\in {\mathbb {R}}^{N} } \quad \left\| {\varvec{z}}\right\| _{1} \quad {\text {s.t.}} \quad \varvec{y}= \varvec{A}\varvec{z}, \end{aligned}$$
(2)
with high probability (see [1, 2]).
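As a concrete illustration of the pipeline in (1)–(2), the following minimal sketch generates an s-sparse signal, takes Gaussian linear measurements, and recovers the signal with basis pursuit. It is only an illustrative example, not the method proposed in this paper; the dimensions, the Gaussian measurement matrix, and the use of the cvxpy package are our own choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, m, s = 200, 80, 5                       # ambient dimension, measurements, sparsity

# s-sparse ground-truth signal and Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, N)) / np.sqrt(m)
y = A @ x                                  # linear measurements, Eq. (1)

# Basis pursuit P1: minimize ||z||_1 subject to y = A z, Eq. (2)
z = cp.Variable(N)
problem = cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == y])
problem.solve()
print("relative recovery error:", np.linalg.norm(z.value - x) / np.linalg.norm(x))
```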
Practical limitations force us to quantize the measurements in (1) as \(\varvec{y}={\mathcal {Q}}(\varvec{A}\varvec{x})\), where \({\mathcal {Q}}:{\mathbb {R}}^m\rightarrow {\mathcal {A}}^m\) is a nonlinear operator that maps the measurements into a finite symbol alphabet \({\mathcal {A}}\). A natural question is what happens under extreme quantization. This question was addressed in [3]: signal reconstruction remains feasible using only one-bit quantized measurements. In one-bit compressed sensing, samples are taken as the sign of a linear transform of the signal, \(\varvec{y}= {\text {sign}}\left( \varvec{A}\varvec{x}\right)\). This sampling scheme discards magnitude information, so only the direction of the signal can be recovered. Fortunately, the amplitude information can be preserved by using nonzero thresholds. The sampling scheme then becomes \(\varvec{y}= {\text {sign}}\left( \varvec{A}\varvec{x}-\varvec{\tau }\right)\), where \(\varvec{\tau }\) is the threshold vector. In our work, each element of \(\varvec{\tau }\) is generated via \(\tau _i \sim {\mathcal {N}} (0,1)\). While a great part of the CS literature deals with sparse signals, most natural signals are dictionary-sparse, i.e., sparse in a transform domain. For instance, sinusoidal signals and natural images are sparse in the Fourier and wavelet domains, respectively [4,5,6,7]. This means that the signal of interest \(\varvec{f}\in {\mathbb {R}}^n\) can be expressed as \(\varvec{f}=\varvec{D x}\), where \(\varvec{D}\in {\mathbb {R}}^{n\times N}\) is a redundant dictionary satisfying the constraint \(\varvec{DD}^{\mathrm{H}} = \varvec{I}\) and \(\varvec{x}\in {\mathbb {R}}^N\) is a sparse vector. With this assumption, the measurement vector is \(\varvec{y}=\varvec{Af}= \varvec{A D x}\). A common approach for recovering such signals is the optimization problem
$$\begin{aligned} {\text {P}}_{1,\varvec{D}}: \quad \min _{\varvec{z}\in {\mathbb {R}}^{N}} \quad \left\| \varvec{D}^{\mathrm{H}}\varvec{z}\right\| _{1} \quad {\text {s.t.}} \quad \varvec{y}= \varvec{A}\varvec{z}, \end{aligned}$$
(3)
which is called the \(\ell _1\)-analysis problem [5, 6].
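The one-bit, dictionary-sparse measurement model described above can be simulated as follows. This is a small sketch under our own assumptions: the tight frame \(\varvec{D}\) (two orthonormal bases stacked side by side, so that \(\varvec{D}\varvec{D}^{\mathrm{H}}=\varvec{I}\)) and all dimensions are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 64, 256, 4                       # signal dimension, measurements, sparsity
N = 2 * n                                  # number of dictionary atoms

# Example tight frame D in R^{n x N} with D D^H = I:
# the identity and a random orthonormal basis Q, each scaled by 1/sqrt(2)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.hstack([np.eye(n), Q]) / np.sqrt(2)

# dictionary-sparse signal f = D x with an s-sparse coefficient vector x
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
f = D @ x

# one-bit measurements with Gaussian thresholds tau_i ~ N(0, 1)
A = rng.standard_normal((m, n))
tau = rng.standard_normal(m)
y = np.sign(A @ f - tau)                   # y lives in {-1, +1}^m

# effective analysis sparsity (||D^H f||_1 / ||D^H f||_2)^2, cf. the next paragraph
alpha = D.T @ f
print("effective analysis sparsity:", np.linalg.norm(alpha, 1) ** 2 / np.linalg.norm(alpha) ** 2)
```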
In this work, we investigate a more practical situation where the signal of interest \(\varvec{f}\) is effective s-analysis-sparse, meaning that \(\varvec{f}\) satisfies \(\Vert \varvec{D}^{\mathrm{H}}\varvec{f}\Vert _1\le \sqrt{s}\Vert \varvec{D}^{\mathrm{H}}\varvec{f}\Vert _2\). Indeed, perfect dictionary sparsity is rarely encountered in practice, since real-world signals are typically only compressible, rather than exactly sparse, in a transform domain. Our approach is adaptive, in the sense that previous signal estimates are incorporated into the current sampling procedure. More explicitly, we solve the optimization problem
$$\begin{aligned} \min _{\varvec{z}\in {\mathbb {R}}^{N}} ~ \left\| \varvec{D}^{\mathrm{H}}\varvec{z}\right\| _{1} ~ {\text {s.t.}} ~ \varvec{y}:=\mathrm{sign}(\varvec{A}\varvec{f}-\varvec{\varphi })=\mathrm{sign}(\varvec{A}\varvec{z}-\varvec{\varphi }), \end{aligned}$$
(4)
where \(\varvec{\varphi }\in {\mathbb {R}}^m\) is a vector of thresholds chosen adaptively, based on previous estimates, via Algorithm 1. We propose a strategy to find a best effective s-analysis-sparse approximation to a signal in \({\mathbb {R}}^n\).
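For reference, one stage of the recovery step (4), with a fixed threshold vector, can be written as a convex program: the sign constraint \(\mathrm{sign}(\varvec{A}\varvec{z}-\varvec{\varphi })=\varvec{y}\) is equivalent (up to ties on the boundary) to the linear inequalities \(y_i(\varvec{a}_i^T\varvec{z}-\varphi _i)\ge 0\). The sketch below is a minimal single-stage solver under these assumptions; the full adaptive procedure, including the threshold updates of Algorithm 1, is described later in the paper and is not reproduced here.

```python
import numpy as np
import cvxpy as cp

def one_bit_analysis_recovery(A, D, y, phi):
    """One stage of sign-consistent l1-analysis recovery, cf. Eq. (4).

    Solves   min ||D^H z||_1   s.t.   sign(A z - phi) = y,
    with the sign constraint relaxed to y_i * (a_i^T z - phi_i) >= 0.
    """
    n = A.shape[1]
    z = cp.Variable(n)
    sign_consistency = [cp.multiply(y, A @ z - phi) >= 0]
    problem = cp.Problem(cp.Minimize(cp.norm1(D.T @ z)), sign_consistency)
    problem.solve()
    return z.value
```

With the quantities from the previous snippet, `one_bit_analysis_recovery(A, D, y, tau)` returns an estimate that is consistent with the observed sign measurements.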
1.1 Contributions
In this section, we summarize our contributions relative to previous works. They are listed below.
1. Proposing a novel algorithm for dictionary-sparse signals: We introduce an adaptive thresholding algorithm for reconstructing dictionary-sparse signals from binary measurements. The proposed algorithm provides accurate signal estimates even for redundant and coherent dictionaries. The required number of one-bit measurements is considerably smaller than that of the non-adaptive approach used in [8].
2. Exponential decay of the reconstruction error: The reconstruction error of our algorithm decays exponentially as the number of adaptive stages T increases. More precisely, we obtain a near-optimal relation between the reconstruction error and the required number of adaptive batches. In mathematical form, if we denote the output of our reconstruction algorithm by \(\varvec{f}_{T}\), then we show that \(\Vert \varvec{f}_{T}-\varvec{f}\Vert _2\approx {\mathcal {O}}(\tfrac{1}{2^T})\), where \(\varvec{f}\) is the ground-truth signal and T is the number of stages in our adaptive algorithm (see Theorem 1 for more details).
3. High-dimensional threshold selection: We propose an adaptive high-dimensional threshold that extracts the most information from each sample, which substantially improves performance and reduces the reconstruction error (see Algorithm 1 for more details).
1.2 Prior works and key differences
In this section, we review prior works on applying quantized measurements within the CS framework [3, 8,9,10,11,12,13]. In what follows, we discuss some of them.
The authors of [3] propose a heuristic algorithm to reconstruct the ground-truth sparse signal from extremely quantized, i.e., one-bit, measurements. In [9], it is shown that conventional CS algorithms also work well when the measurements are quantized. In [10], an algorithm with a simple implementation is proposed, which achieves a smaller Hamming-distance error than existing methods. Taking a geometric viewpoint, the authors of [11] exploit functional-analysis tools to provide an almost optimal solution to the one-bit CS problem. They show that the required number of one-bit measurements is \({\mathcal {O}}(s\log ^2(\tfrac{n}{s}))\).
The work of [14] presents two algorithms for full (i.e., direction and norm) reconstruction with provable guarantees. The former approach takes advantage of random thresholds, while the latter estimates the direction and the magnitude separately. The authors of [12] introduce an adaptive thresholding scheme that utilizes a generalized approximate message passing (GAMP) algorithm [12] for recovery and for updating the thresholds throughout sampling. In a different approach, the work [13] proposes an adaptive quantization and recovery scheme that achieves exponential error decay in one-bit CS frameworks. The authors of [15] use an adaptive process to take measurements around the estimated signal in each iteration. While [15] only adapts the bias (mean) of the thresholds toward the estimated signal, our algorithm also generates random dithers with adaptive variance. The threshold variance is initialized from an overestimate of the desired signal and is divided by a factor of 2 in each iteration. This adaptive thresholding scheme forces the algorithm's estimate to concentrate around the optimal solution and shrinks the feasible set in each iteration (the random part has reduced variance).
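To make the contrast with [15] concrete, the following schematic (with hypothetical variable names) implements the threshold-update rule just described: the dithers are centered at the measurements of the current estimate, and their variance is halved at every stage.

```python
import numpy as np

def adaptive_thresholds(A, f_est, var0, t, rng):
    """Stage-t dither vector (schematic): centered at A @ f_est, i.e., at the
    measurements of the current estimate, with a random part whose variance
    var0 / 2**t is divided by 2 at every stage; var0 is initialized from an
    overestimate of the desired signal."""
    m = A.shape[0]
    return A @ f_est + np.sqrt(var0 / 2 ** t) * rng.standard_normal(m)
```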
Many of the techniques mentioned for adaptive sparse signal recovery do not generalize (at least not in an obvious way) to dictionary-sparse signals. For example, determining a surrogate of \(\varvec{f}\) that is of lower complexity with respect to \(\varvec{D}^{\mathrm{H}}\) is non-trivial and challenging. We emphasize that while the proofs and main components of [13] rely on the hard thresholding operator, this operator cannot be used for either effective or exact dictionary-sparse signals. The reason is that, given a vector \(\varvec{x}\) in the analysis domain \({\mathbb {R}}^N\), one cannot guarantee the existence of a signal \(\varvec{f}\) in \({\mathbb {R}}^n\) such that \(\varvec{D}^{\mathrm{H}}\varvec{f}=\varvec{x}\). Recently, the work [8] showed that both the direction and the magnitude of a dictionary-sparse signal can be recovered by a convex program with strong guarantees. The work [8] inspired our approach to recovering dictionary-sparse signals in an adaptive manner. In contrast to the existing method [8] for binary dictionary-sparse signal recovery, which takes all of the measurements in one step with fixed settings, we solve the problem in an adaptive, multistage way. In each stage, using the estimate from the previous stage, our algorithm is steered toward the desired signal. In the non-adaptive work [8], the reconstruction error is comparatively large, whereas in our work the error decays exponentially as the number of adaptive stages increases.
Notation. Here, we introduce the notation used in the paper. Vectors and matrices are denoted by boldface lowercase and capital letters, respectively. \(\varvec{A}^T\) and \(\varvec{A}^{\mathrm{H}}\) denote the transpose and the Hermitian (conjugate transpose) of \(\varvec{A}\), respectively. C and c denote positive absolute constants whose values may change from line to line. We use \(\left\| \varvec{v}\right\| _2= \sqrt{\sum _i |v_i|^2}\) for the \(\ell _2\)-norm of a vector \(\varvec{v}\) in \({\mathbb {R}}^n\), \(\left\| \varvec{v}\right\| _1= \sum _{i} |v_i|\) for the \(\ell _1\)-norm, and \(\left\| \varvec{v}\right\| _{\infty }= \max _{i} |v_{i}|\) for the \(\ell _{\infty }\)-norm. We write \({\mathbb {S}}^{n-1}:= \{\varvec{v}\in {\mathbb {R}}^n : \left\| \varvec{v}\right\| _{2}= 1 \}\) for the unit Euclidean sphere in \({\mathbb {R}}^n\). For \(\varvec{x}\in {\mathbb {R}}^{n}\), \(\varvec{x}_S\) denotes the sub-vector in \({\mathbb {R}}^{|S|}\) consisting of the entries of \(\varvec{x}\) indexed by the set S.