One-bit compressive sampling via ℓ₀ minimization

Shen, Lixin; Suter, Bruce W.

doi:10.1186/s13634-016-0369-4

Research
Open access
Published: 14 June 2016

Lixin Shen^1,2 &
Bruce W. Suter²

EURASIP Journal on Advances in Signal Processing volume 2016, Article number: 71 (2016)

Abstract

The problem of 1-bit compressive sampling is addressed in this paper. We introduce an optimization model for reconstruction of sparse signals from 1-bit measurements. The model targets a solution that has the least ℓ ₀-norm among all signals satisfying consistency constraints stemming from the 1-bit measurements. An algorithm for solving the model is developed. Convergence analysis of the algorithm is presented. Our approach is to obtain a sequence of optimization problems by successively approximating the ℓ ₀-norm and to solve resulting problems by exploiting the proximity operator. We examine the performance of our proposed algorithm and compare it with the renormalized fixed point iteration (RFPI) (Boufounos and Baraniuk, 1-bit compressive sensing, 2008; Movahed et al., A robust RFPI-based 1-bit compressive sensing reconstruction algorithm, 2012), the generalized approximate message passing (GAMP) (Kamilov et al., IEEE Signal Process. Lett. 19(10):607–610, 2012), the linear programming (LP) (Plan and Vershynin, Commun. Pure Appl. Math. 66:1275–1297, 2013), and the binary iterative hard thresholding (BIHT) (Jacques et al., IEEE Trans. Inf. Theory 59:2082–2102, 2013) state-of-the-art algorithms for 1-bit compressive sampling reconstruction.

1 Introduction

Compressive sampling is a recent advance in signal acquisition [1, 2]. It provides a method to reconstruct a sparse signal $x \in \mathbb {R}^{n}$ from linear measurements

$$ y = \Phi x, $$

(1)

where Φ is a given m×n measurement matrix with m<n and $y \in \mathbb {R}^{m}$ is the measurement vector acquired. The objective of compressive sampling is to deliver an approximation to x from y and Φ. It has been demonstrated that the sparse signal x can be recovered exactly from y if Φ has Gaussian i.i.d. entries and satisfies the restricted isometry property [2]. Moreover, this sparse signal can be identified as a vector that has the smallest ℓ ₀-norm among all vectors yielding the same measurement vector y under the measurement matrix Φ.

However, the success of the reconstruction of this sparse signal is based on the assumption that the measurements have infinite bit precision. In realistic settings, the measurements are never exact and must be discretized prior to further signal analysis. In practice, these measurements are quantized, that is, mapped from a continuous real value to a discrete value over some finite range. Quantization inevitably introduces errors in the measurements. The problem of estimating a sparse signal from a set of quantized measurements has been addressed in recent literature. Surprisingly, it has been demonstrated theoretically and numerically that 1 bit per measurement is enough to retain information for sparse signal reconstruction. As pointed out in [3, 4], quantization to 1-bit measurements is appealing in practical applications. First, 1-bit quantizers are extremely inexpensive hardware devices that test whether values are above or below zero, enabling simple, efficient, and fast quantization. Second, 1-bit quantizers are robust to a number of non-linear distortions applied to measurements. Third, 1-bit quantizers do not suffer from dynamic range issues. Due to these attractive properties of 1-bit quantizers, in this paper, we will develop efficient algorithms for reconstruction of sparse signals from 1-bit measurements.

The 1-bit compressive sampling framework originally introduced in [3] is briefly described as follows. Formally, it can be written as

$$ y = A(x):=\text{sign}(\Phi x), $$

(2)

where the function sign(·) denotes the sign of the variable, element-wise, and zero values are assigned to be +1. Thus, the measurement operator A, called a 1-bit scalar quantizer, is a mapping from $\mathbb {R}^{n}$ to the Boolean cube {−1,1}^m. Note that the scale of the signal has been lost during the quantization process. We search for a sparse signal x^⋆ in the unit ball of $\mathbb {R}^{n}$ such that x^⋆ is consistent with our knowledge about the signal and measurement process, i.e., A(x^⋆)=A(x).
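The measurement operator in Eq. (2) can be illustrated with a minimal NumPy sketch. The helper name `one_bit_measure`, the problem sizes, and the random seed are our own assumptions for illustration and do not come from the paper.

```python
import numpy as np

def one_bit_measure(Phi, x):
    """Return y = sign(Phi @ x), with zero values mapped to +1 as in Eq. (2)."""
    z = Phi @ x
    return np.where(z >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)        # seed fixed only for reproducibility
m, n, s = 20, 50, 3                   # illustrative sizes (assumed)
Phi = rng.standard_normal((m, n))     # Gaussian measurement matrix
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)

y = one_bit_measure(Phi, x)
# A positive rescaling of x gives the same bits: the scale of x is lost.
y_scaled = one_bit_measure(Phi, 7.5 * x)
```

Because `y` and `y_scaled` coincide, any reconstruction can only hope to recover the direction of x, which is why the models below fix a normalization.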

The problem of reconstructing a sparse signal from its 1-bit measurements is generally non-convex, and therefore it is a challenge to develop an algorithm that can find a desired solution. Nevertheless, since this problem was introduced in [3] in 2008, several algorithms have been developed for attacking it [3, 5–7]. Among the existing 1-bit compressive sampling algorithms, the binary iterative hard thresholding (BIHT) [4] exhibits superior performance in both reconstruction error and consistency in numerical simulations compared with the algorithms in [3, 5]. To handle measurements with many sign flips, a method based on adaptive outlier pursuit for 1-bit compressive sampling was proposed in [7–9]. By formulating the 1-bit compressive sampling problem in Bayesian terms, a generalized approximate message passing (GAMP) algorithm [10] was developed for the problem of reconstruction from 1-bit measurements. The algorithms in [4, 7] require the sparsity of the desired signal to be given in advance. This requirement, however, is hardly satisfied in practice. By keeping only the sign of the measurements, the magnitude of the signal is lost. The models associated with the aforementioned algorithms seek sparse vectors x satisfying consistency constraints (2) in the unit sphere. As a result, these models are essentially non-convex and non-smooth. In [6], a convex minimization problem is formulated for reconstruction of sparse signals from 1-bit measurements and is solved by linear programming. The details of the above algorithms will be briefly reviewed in the next section.

In this paper, we introduce a new ℓ ₀ minimization model over a convex set determined by consistency constraints for 1-bit compressive sampling recovery and develop an algorithm for solving the proposed model. Our model does not require prior knowledge on the sparsity of the signal. Our approach for dealing with our proposed model is to obtain a sequence of optimization problems by successively approximating the ℓ ₀-norm and to solve resulting problems by exploiting the proximity operator [11]. Convergence analysis of our algorithm is presented.

This paper is organized as follows. In Section 2, we review and comment on current 1-bit compressive sampling models and then introduce our own model by assimilating advantages of existing models. Heuristics for solving the proposed model are discussed in Section 3. Convergence analysis of the algorithm for the model is studied in Section 4. A numerically implementable algorithm for the model is presented in Section 5. The performance of our algorithm is demonstrated and compared with the BIHT, RFPI, LP, and GAMP in Section 6. We present our conclusion in Section 7.

2 Models for one-bit compressive sampling

In this section, we begin with reviewing existing models for reconstruction of sparse signals from 1-bit measurements. After analyzing these models, we propose our own model that assimilates the advantages of the existing ones.

Using matrix notation, the 1-bit measurements in (2) can be equivalently expressed as

$$ Y\Phi x \ge 0, $$

(3)

where Y:=diag(y) is an m×m diagonal matrix whose ith diagonal element is the ith entry of y. The expression Y Φ x≥0 in (3) means that all entries of the vector Y Φ x are no less than 0. Hence, we can treat the 1-bit measurements as sign constraints that should be enforced in the construction of the signal x of interest. In what follows, Eq. (3) is referred to as sign constraint or consistency condition, interchangeably.
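The sign constraint (3) is straightforward to test numerically. The following sketch is an illustration under our own assumptions (the function name `is_consistent` and the tolerance argument are hypothetical); it uses the fact that multiplying by diag(y) is the same as an element-wise product with y.

```python
import numpy as np

def is_consistent(y, Phi, x, tol=0.0):
    """Check the sign constraint Y Phi x >= 0 of Eq. (3), with Y = diag(y)."""
    # diag(y) @ (Phi @ x) equals the element-wise product y * (Phi @ x).
    return bool(np.all(y * (Phi @ x) >= -tol))

rng = np.random.default_rng(1)
Phi = rng.standard_normal((15, 30))
x = rng.standard_normal(30)
y = np.where(Phi @ x >= 0, 1.0, -1.0)   # 1-bit measurements of x

ok_true = is_consistent(y, Phi, x)      # the generating signal is consistent
ok_flip = is_consistent(y, Phi, -x)     # its negation violates every sign
```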

The optimization model for reconstruction of a sparse signal from 1-bit measurements in [3] is

$$ \min \|x\|_{1} \quad \text{s.t.} \quad Y\Phi x \ge 0 \quad \text{and} \quad \|x\|_{2}=1, $$

(4)

where ∥·∥₁ and ∥·∥₂ denote the ℓ ₁-norm and the ℓ ₂-norm of a vector, respectively. In model (4), the ℓ ₁-norm objective function is used to favor sparse solutions, the sign constraint Y Φ x≥0 is used to impose the consistency between the 1-bit measurements and the solution, and the constraint ∥x∥₂=1 ensures a nontrivial solution lying on the unit ℓ ₂ sphere.

Instead of solving model (4) directly, a relaxed version of model (4)

$$ \min \left\{\lambda \|x\|_{1} + \sum\limits_{i=1}^{m} h((Y\Phi x)_{i})\right\} \quad \text{s.t.} \quad \|x\|_{2}=1 $$

(5)

was proposed in [3]. By employing a variation of the fixed point continuation algorithm in [12], an algorithm, which is called renormalized fixed point iteration (RFPI), was developed for solving model (5) efficiently. Here, λ is a regularization parameter and h is chosen to be the one-sided ℓ ₁ (or ℓ ₂) function, defined at $z \in \mathbb {R}$ as follows

$$ h(z): = \left\{ \begin{array}{ll} |z|~\left(\text{or}~ \frac{1}{2}z^{2}\right), & \text{if}~ z<0; \\ 0, & \text{otherwise.} \end{array} \right. $$

(6)

We remark that the one-sided ℓ ₂ function was adopted in [3] due to its convexity and smoothness properties that are required by a fixed point continuation algorithm.
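For concreteness, the one-sided functions in Eq. (6) and the resulting consistency penalty used in models (5) and (7) can be sketched as follows. The names `one_sided` and `consistency_penalty` are ours and are only illustrative.

```python
import numpy as np

def one_sided(z, kind="l1"):
    """One-sided l1 or l2 function of Eq. (6), applied element-wise."""
    z = np.asarray(z, dtype=float)
    neg = np.minimum(z, 0.0)            # keeps only the negative part of z
    if kind == "l1":
        return np.abs(neg)              # |z| if z < 0, else 0
    if kind == "l2":
        return 0.5 * neg**2             # z^2 / 2 if z < 0, else 0
    raise ValueError("kind must be 'l1' or 'l2'")

def consistency_penalty(y, Phi, x, kind="l1"):
    """Sum over i of h((Y Phi x)_i); it vanishes exactly when Y Phi x >= 0."""
    return float(np.sum(one_sided(y * (Phi @ x), kind)))
```

The penalty is zero for any x satisfying the sign constraint (3) and grows with the size of the violations, linearly for the one-sided ℓ₁ choice and quadratically for the one-sided ℓ₂ choice.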

In [5], a restricted-step-shrinkage algorithm was proposed for solving model (4). This algorithm is similar in spirit to trust-region methods for nonconvex optimization on the unit sphere and has provable convergence guarantees.

Binary iterative hard thresholding (BIHT) algorithms were recently introduced for reconstruction of sparse signals from 1-bit measurements in [4]. The BIHT algorithms are developed for solving the following constrained optimization model

$$ \min \sum\limits_{i=1}^{m} h((Y\Phi x)_{i}) \quad \text{s.t.} \quad \|x\|_{0} \le s \quad \text{and} \quad \|x\|_{2}=1, $$

(7)

where h is defined by Eq. (6), s is a positive integer, and the ℓ₀-norm ∥x∥₀ counts the number of non-zero entries in x. Minimizing the objective function of model (7) enforces the consistency condition (3). The BIHT algorithms for model (7) are a simple modification of the iterative thresholding algorithm proposed in [13]. It was shown numerically that the BIHT algorithms perform significantly better than the other aforementioned algorithms in [3, 5] in terms of both reconstruction error and consistency. Numerical experiments in [4] further show that the BIHT algorithm with h being the one-sided ℓ₁ function performs better in low noise scenarios while the BIHT algorithm with h being the one-sided ℓ₂ function performs better in high noise scenarios. For measurements corrupted by noise (i.e., sign flips), a robust method for recovering signals from 1-bit measurements using adaptive outlier pursuit was proposed in [7], the noise-adaptive renormalized fixed point iteration (NARFPI) was introduced in [8], and the noise-adaptive restricted step shrinkage (NARSS) was developed in [9].

The algorithms reviewed above for 1-bit compressive sampling are developed for optimization problems having convex objective functions and non-convex constraints. In [6], a convex optimization program for reconstruction of sparse signals from 1-bit measurements was introduced as follows:

$$ \min \|x\|_{1} \quad \text{s.t.} \quad Y\Phi x \ge 0 \quad \text{and} \quad \|\Phi x\|_{1}=p, $$

(8)

where p is any fixed positive number. The first constraint YΦx≥0 requires that a solution to model (8) should be consistent with the 1-bit measurements. If a vector x satisfies the first constraint, so does ax for all 0<a<1. Hence, an algorithm for minimizing the ℓ₁-norm by only requiring consistency with the measurements will yield the zero solution. The second constraint ∥Φx∥₁=p is then used to prevent model (8) from returning a zero solution, thus resolving the amplitude ambiguity. By taking the first constraint into consideration, we know that ∥Φx∥₁=〈y,Φx〉; therefore, the second constraint becomes 〈Φ^⊤y,x〉=p. This confirms that both the objective function and the constraints of model (8) are convex. It was further pointed out in [6] that model (8) can be cast as a linear program. The corresponding algorithm is referred to as LP. Comparing model (8) with model (4), we see that the constraint ∥x∥₂=1 in model (4) and the constraint ∥Φx∥₁=p in model (8), which are the only difference between the two models, both enforce a non-trivial solution. However, as we have already seen, model (8) with the constraint ∥Φx∥₁=p can be solved by a computationally tractable algorithm.
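To make the linear-programming reformulation of model (8) concrete, here is a minimal sketch using SciPy's `linprog`. It relies on the standard split x = u − v with u, v ≥ 0, which is our choice of formulation and not necessarily the one used in [6]; the function name `lp_one_bit` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def lp_one_bit(y, Phi, p=1.0):
    """Sketch of model (8): min ||x||_1 s.t. Y Phi x >= 0, <Phi^T y, x> = p."""
    m, n = Phi.shape
    YPhi = y[:, None] * Phi                    # diag(y) @ Phi
    c = np.ones(2 * n)                         # sum(u) + sum(v) equals ||x||_1 at an optimum
    A_ub = np.hstack([-YPhi, YPhi])            # -(Y Phi)(u - v) <= 0
    b_ub = np.zeros(m)
    g = Phi.T @ y
    A_eq = np.concatenate([g, -g])[None, :]    # <Phi^T y, u - v> = p
    b_eq = np.array([p])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]               # recover x = u - v
```

When y is generated from some nonzero signal via Eq. (2), a positive multiple of that signal is feasible, so the program has a solution; the returned vector is consistent up to solver tolerance but, as discussed below, need not be sparse.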

Let us further comment on models (7) and (8). First, the sparsity constraint in model (7) is impractical since the sparsity of the underlying signal is unknown in general. Therefore, instead of imposing this sparsity constraint, we consider an optimization model having the ℓ₀-norm as its objective function. Second, although model (8) can be tackled by efficient linear programming solvers and the solution of model (8) preserves the effective sparsity of the underlying signal (see [6]), the solution is not necessarily sparse in general, as shown in our numerical experiments (see Section 6). Motivated by the aforementioned models and the associated algorithms, we plan in this paper to reconstruct sparse signals from 1-bit measurements by solving the following constrained optimization model

$$ \min \|x\|_{0} \quad \text{s.t.} \quad Y\Phi x \ge 0 \quad \text{and} \quad \|\Phi x\|_{1}=p, $$

(9)

where p is again an arbitrary positive number. This model has the ℓ ₀-norm as its objective function and inequality Y Φ x≥0 and equality ∥Φ x∥₁=p as its convex constraints.

We remark that the actual value of p is not important as long as it is positive. More precisely, suppose that $\mathcal {S}$ and $\mathcal {S}^{\diamond }$ are the sets of all solutions of model (9) with p=1 and p=p^◇>0, respectively. If $x \in \mathcal {S}$, that is, YΦx≥0 and ∥Φx∥₁=1, then, by denoting x^◇:=p^◇x, it can be verified that ∥x^◇∥₀=∥x∥₀, YΦx^◇≥0, and ∥Φx^◇∥₁=p^◇. That indicates $x^{\diamond } \in \mathcal {S}^{\diamond }$. Therefore, we have that ${p^{\diamond }}\mathcal {S} \subset \mathcal {S}^{\diamond }$. Conversely, we can show that $\mathcal {S}^{\diamond } \subset {p^{\diamond }}\mathcal {S}$ by reversing the above steps. Hence, ${p^{\diamond }}\mathcal {S} = \mathcal {S}^{\diamond }$. Without loss of generality, the positive number p is always assumed to be 1 in the rest of the paper.

We compare model (7) and our proposed model (9) in the following result.

Proposition 1.

Let $y \in \mathbb {R}^{m}$ be the 1-bit measurements from an m×n measurement matrix Φ via Eq. (2) and let s be a positive integer. Assume that the vector $x \in \mathbb {R}^{n}$ is a solution to model (9). If ∥x∥₀≤s, then model (7) has the unit vector $\frac {x}{\|x\|_{2}}$ as a solution; if ∥x∥₀>s, then model (7) cannot have a solution satisfying the consistency constraint.

Proof.

Since the vector x is a solution to model (9), then x satisfies the consistency constraint Y Φ x≥0. Hence, it, together with definition of h in (6), implies that

$$\sum\limits_{i=1}^{m} h\left(\left(Y\Phi \frac{x}{\|x\|_{2}} \right)_{i}\right)=0. $$

We further note that $\left \|\frac {x}{\|x\|_{2}}\right \|_{0} = \|x\|_{0}$ and $\left \|\frac {x}{\|x\|_{2}}\right \|_{2} =1$. Hence, the vector $\frac {x}{\|x\|_{2}}$ is a solution of model (7) if ∥x∥₀≤s.

On the other hand, suppose that ∥x∥₀>s; we show that no solution to model (7) satisfies the consistency constraint. Suppose this statement is false. That is, there exists a solution of model (7), say x^♯, such that YΦx^♯≥0, ∥x^♯∥₀≤s, and ∥x^♯∥₂=1 hold. Set $x^{\diamond }: = \frac {x^{\sharp }}{\|\Phi x^{\sharp }\|_{1}}$. Then ∥x^◇∥₀=∥x^♯∥₀≤s, YΦx^◇≥0, and ∥Φx^◇∥₁=1. Since ∥x^◇∥₀≤s<∥x∥₀, it turns out that x is not a solution of model (9). This contradicts our assumption on the vector x. This completes the proof of the result.

From Proposition 1, we can see that the sparsity s for model (7) is critical. If s is set too large, a solution to model (7) may not be the sparsest solution satisfying the consistency constraint; if s is set too small, solutions to model (7) cannot satisfy the consistency constraint. In contrast, our model (9) does not require the sparsity constraint used in model (7) and delivers the sparsest solution satisfying the consistency constraint. Therefore, these properties make our model more attractive for 1-bit compressive sampling than the BIHT.
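The normalization step used in the proof of Proposition 1 can be checked numerically. In the sketch below, which is only an illustration with assumed sizes and entries, `x` is merely a feasible point of model (9) with p=1 (not necessarily a minimizer); the point is that rescaling to the unit ℓ₂ sphere preserves both the support and the consistency.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 40                                  # illustrative sizes (assumed)
Phi = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17, 28]] = [1.0, -2.0, 0.5]
y = np.where(Phi @ x_true >= 0, 1.0, -1.0)

# A feasible point of model (9) with p = 1: consistent and ||Phi x||_1 = 1.
x = x_true / np.sum(np.abs(Phi @ x_true))

# Rescaling to the unit l2 sphere keeps the support and the sign pattern.
x_unit = x / np.linalg.norm(x)
same_support = bool(np.array_equal(x_unit != 0, x != 0))

# Objective of model (7) with the one-sided l1 function; zero means consistent.
penalty = float(np.sum(np.abs(np.minimum(y * (Phi @ x_unit), 0.0))))
```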

To close this section, we recall an algorithm in [10] for the recovery of signals based on generalized approximate message passing (GAMP). This algorithm exploits the prior statistical information on the signal for estimating the minimum-mean-squared error solution from 1-bit measurements. The performance of GAMP will be included in our numerical section.

3 An algorithm for 1-bit compressive sampling

In this section, we will develop algorithms for the proposed model (9). We first reformulate model (9) as an unconstrained optimization problem via the indicator function of a closed convex set in $\mathbb {R}^{m+1}$. It turns out that the objective function of this unconstrained optimization problem is the sum of the ℓ₀-norm and the indicator function composed with a matrix associated with the 1-bit measurements. Instead of directly solving the unconstrained optimization problem, we use some smooth concave functions to approximate the ℓ₀-norm and then linearize the concave functions. The resulting model can be viewed as an optimization problem of minimizing a weighted ℓ₁-norm over the closed convex set. The solution of this resulting model serves as a new point at which the concave functions will be linearized. This process is repeated until a certain stopping criterion is met. Several concrete examples for approximating the ℓ₀-norm are provided at the end of this section.

We begin with introducing our notation and recalling some background from convex analysis. For the d-dimensional Euclidean space $\mathbb {R}^{d}$, the class of all lower semicontinuous convex functions $f: \mathbb {R}^{d} \rightarrow (-\infty, +\infty ]$ such that $\text {dom} f:=\{x \in \mathbb {R}^{d}: f(x) <+\infty \} \neq \emptyset $ is denoted by $\Gamma _{0}(\mathbb {R}^{d})$. The indicator function of a closed convex set C in $\mathbb {R}^{d}$ is defined, at $u \in \mathbb {R}^{d}$, as

$$\iota_{C}(u): =\left\{ \begin{array}{ll} 0, & \text{if}~ u\in C; \\ +\infty, & \text{otherwise.} \end{array} \right. $$

Clearly, ι _C is in $\Gamma _{0}(\mathbb {R}^{d})$ for any closed nonempty convex set C.

Next, we reformulate model (9) as an unconstrained optimization problem. To this end, from the m×n matrix Φ and the m-dimensional vector y in Eq. (2), we define an (m+1)×n matrix

$$ B :=\left[ \begin{array}{c} \text{diag}(y) \\ y^{\top} \end{array} \right] \Phi $$

(10)

and a subset of $\mathbb {R}^{m+1}$

$$ \mathcal{C}:=\{z: z_{m+1}=1 \; \text{and}\; z_{i}\ge 0, \; i=1,2,\ldots, m\}, $$

(11)

respectively. Then, a vector x satisfies the two constraints of model (9) if and only if the vector Bx lies in the set $\mathcal {C}$. Hence, model (9) can be rewritten as

$$ \min\{\|x\|_{0} + \iota_{\mathcal{C}}(Bx): x \in \mathbb{R}^{n} \}. $$

(12)
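As a small illustration of the reformulation, the matrix B of Eq. (10) and the membership test for the set in Eq. (11) can be coded directly. The helper names `build_B` and `in_C` and the tolerance are our own assumptions.

```python
import numpy as np

def build_B(y, Phi):
    """Stack diag(y) @ Phi on top of the row y^T @ Phi, as in Eq. (10)."""
    return np.vstack([y[:, None] * Phi, (y @ Phi)[None, :]])

def in_C(z, tol=1e-9):
    """Membership in the set of Eq. (11): z_i >= 0 for i <= m and z_{m+1} = 1."""
    return bool(np.all(z[:-1] >= -tol) and abs(z[-1] - 1.0) <= tol)

rng = np.random.default_rng(2)
Phi = rng.standard_normal((12, 25))
x = rng.standard_normal(25)
y = np.where(Phi @ x >= 0, 1.0, -1.0)

x1 = x / np.sum(np.abs(Phi @ x))    # rescaled so that ||Phi x1||_1 = 1
B = build_B(y, Phi)
feasible = in_C(B @ x1)             # both constraints of model (9) hold
```

The last row of B computes 〈y,Φx〉, which equals ∥Φx∥₁ whenever the sign constraint holds, so the single condition Bx∈𝒞 encodes both constraints of model (9).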

Problem (12) is NP-hard in general because of the combinatorial nature of the ℓ₀-norm. Thus, there is a need for an algorithm that can pick the sparsest vector x satisfying the relation $Bx \in \mathcal {C}$. To attack this ℓ₀-norm optimization problem, a common approach that has appeared in recent literature is to replace the ℓ₀-norm by computationally feasible approximations. In the context of compressed sensing, we review several popular choices for defining the ℓ₀-norm as the limit of a sequence. More precisely, for a positive number ε∈(0,1), we consider separable concave functions of the form

$$ F_{\epsilon} (x) : = \sum\limits_{i=1}^{n} f_{\epsilon} (|x_{i}|), \quad x \in \mathbb{R}^{n}, $$

(13)

where $f_{\epsilon }: \mathbb {R}_{+} \rightarrow \mathbb {R}$ is strictly increasing, concave, and twice continuously differentiable such that

$$ \lim_{\epsilon \rightarrow 0^{+}} F_{\epsilon} (x) = \|x\|_{0}, \quad \text{for all} \quad x \in \mathbb{R}^{n}. $$

(14)

The parameter ε determines the quality of the approximation F_ε(x) to ∥x∥₀. Since the function f_ε is concave and smooth on $\mathbb {R}_{+}:=[0, \infty)$, it can be majorized by a simple function formed by its first-order Taylor series expansion at an arbitrary point. Write $\mathcal {F}_{\epsilon }(x, v):=F_{\epsilon } (v) + \langle \nabla F_{\epsilon } (|v|), |x|-|v| \rangle $. Therefore, at any point $v \in \mathbb {R}^{n}$, the following inequality holds

$$ F_{\epsilon} (x) < \mathcal{F}_{\epsilon}(x, v) $$

(15)

for all $x \in \mathbb {R}^{n}$ with |x|≠|v|. Here, for a vector u, we use |u| to denote a vector such that each element of |u| is the absolute value of the corresponding element of u. Clearly, when v is close enough to x, the expression $\mathcal {F}_{\epsilon }(x, v)$ on the right-hand side of (15) provides a reasonable approximation to the one on its left-hand side. Therefore, it is considered as a computationally feasible approximation to the ℓ₀-norm of x. With such an approximation, a simplified problem is solved and its solution is used to formulate another simplified problem which is closer to the ideal problem (12). This process is then repeated until the solutions to the simplified problems become stationary or meet a termination criterion. This procedure is summarized in Algorithm 1.
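The majorization inequality (15) can be sanity-checked numerically. The sketch below is an illustration only: it uses the concave choice f_ε(t)=log(t+ε), draws random pairs (x, v), and records the gap between the linearization and F_ε(x); the function names, the value of ε, and the sampling scheme are our own assumptions.

```python
import numpy as np

def F_logdet(x, eps):
    """F_eps(x) = sum_i log(|x_i| + eps), a concave function of |x|."""
    return float(np.sum(np.log(np.abs(x) + eps)))

def grad_logdet(t, eps):
    """Derivative of f_eps(t) = log(t + eps) for t >= 0, element-wise."""
    return 1.0 / (t + eps)

def majorizer(x, v, eps):
    """First-order expansion of F_eps at |v|, evaluated at |x|."""
    av = np.abs(v)
    return F_logdet(v, eps) + float(grad_logdet(av, eps) @ (np.abs(x) - av))

rng = np.random.default_rng(3)
eps = 0.1
gaps = []
for _ in range(1000):
    x = rng.standard_normal(8)
    v = rng.standard_normal(8)
    gaps.append(majorizer(x, v, eps) - F_logdet(x, eps))
min_gap = min(gaps)    # positive whenever |x| != |v|, by strict concavity
```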

The terms F _ε(|x ^(k)|) and 〈∇F _ε(|x ^(k)|),|x ^(k)|〉 that appear in the optimization problem in Algorithm 1 can be ignored because they are irrelevant to the optimization problem. Hence, the expression for x ^(k+1) in Algorithm 1 can be simplified as

$$ x^{(k+1)} \in \text{argmin}\left\{ \langle \nabla F_{\epsilon} (|x^{(k)}|), |x| \rangle + \iota_{\mathcal{C}}(Bx): x \in \mathbb{R}^{n} \right\}. $$

(16)

Since f _ε is strictly concave and increasing on $\mathbb {R}_{+}$, $f^{\prime }_{\epsilon }$ is positive on $\mathbb {R}_{+}$. Hence, $\langle \nabla F_{\epsilon } (|x^{(k)}|), |x| \rangle = \sum _{i=1}^{n} f'_{\epsilon }(|x^{(k)}_{i}|) |x_{i}|$ can be viewed as the weighted ℓ ₁-norm of x having $f^{\prime }_{\epsilon }(|x^{(k)}_{i}|)$ as its ith weight. Thus, the objective function of the above optimization problem is convex. Details for finding a solution to the problem will be presented in the next section.
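The iteration built on subproblem (16) can be sketched as a reweighted ℓ₁ loop. The following is a hedged illustration rather than the algorithm developed in this paper: each weighted ℓ₁ subproblem is solved here with a generic LP solver (SciPy's `linprog`) instead of a proximity-operator method, the weights come from f_ε(t)=log(t+ε), and the starting point x^(0)=0, the fixed iteration count, and the function name are our assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def reweighted_l1_one_bit(y, Phi, eps=0.1, iters=10):
    """Repeat subproblem (16): min sum_i w_i |x_i| subject to Bx in C."""
    m, n = Phi.shape
    YPhi = y[:, None] * Phi                      # diag(y) @ Phi
    g = Phi.T @ y
    A_ub = np.hstack([-YPhi, YPhi])              # Y Phi (u - v) >= 0
    A_eq = np.concatenate([g, -g])[None, :]      # <Phi^T y, u - v> = 1
    x = np.zeros(n)                              # assumed start: equal weights 1/eps
    for _ in range(iters):
        w = 1.0 / (np.abs(x) + eps)              # weights f'_eps(|x_i^(k)|)
        res = linprog(np.concatenate([w, w]), A_ub=A_ub, b_ub=np.zeros(m),
                      A_eq=A_eq, b_eq=[1.0], bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        x = res.x[:n] - res.x[n:]                # split x = u - v with u, v >= 0
    return x
```

Entries that are small at iteration k receive large weights at iteration k+1 and are pushed further toward zero, which is the mechanism by which the linearized concave surrogate promotes sparsity. A practical implementation would replace the fixed iteration count by a stopping test on successive iterates.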

In the rest of this section, we list two possible choices of the functions in (13), namely, the Mangasarian function in [14] and the Log-Det function in [15]. Many other choices can be found from [16–21] and the references therein.

The Mangasarian function is given as follows:

$$ F_{\epsilon} (x) = \sum\limits_{i=1}^{n} \left(1- e^{-|x_{i}|/\epsilon}\right), $$

(17)

where $x \in \mathbb {R}^{n}$. This function is used to approximate the ℓ ₀-norm to obtain minimum-support solutions (that is, solutions with as many components equal to zero as possible). The usefulness of the Mangasarian function was demonstrated in finding sparse solutions of underdetermined linear systems (see [22]).

The Log-Det function is defined as

$$ F_{\epsilon} (x) = \sum\limits_{i=1}^{n} \frac{\log(|x_{i}|/\epsilon+1)}{\log(1/\epsilon)}, $$

(18)

where $x \in \mathbb {R}^{n}$. Notice that ∥x∥₀ is equal to the rank of the diagonal matrix diag(x). Since log(|x_i|/ε+1)=log(|x_i|+ε)+log(1/ε), the function F_ε(x) in (18) is equal to (log(1/ε))⁻¹ log(det(diag(|x|)+εI))+n, that is, a scaled and shifted logarithm of the determinant of the matrix diag(|x|)+εI. Hence, it was named the Log-Det heuristic and used for minimizing the rank of a positive semidefinite matrix over a convex set in [15]. Constant terms and positive scaling factors can be ignored since they do not affect the solution of the optimization problem (16). Hence, the Log-Det function in (18) can be replaced by

$$ F_{\epsilon} (x) = \sum\limits_{i=1}^{n} \log(|x_{i}|+\epsilon). $$

(19)

We point out that each term of the Mangasarian function is bounded by 1, so the function is bounded above by n and is therefore non-coercive, while the Log-Det function is coercive. This makes a difference in the convergence analysis of the associated Algorithm 1 that will be presented in the next section. In what follows, the function F_ε is either the Mangasarian function or the Log-Det function; we specify which one only when necessary.
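The two surrogates and the weights they induce in subproblem (16) can be written down in a few lines. The sketch below is illustrative; the function names and the test vector are our own, and the weights are simply the derivatives f′_ε evaluated at |x_i| for Eqs. (17) and (19).

```python
import numpy as np

def mangasarian(x, eps):
    """Eq. (17): sum_i (1 - exp(-|x_i| / eps))."""
    return float(np.sum(1.0 - np.exp(-np.abs(x) / eps)))

def log_det(x, eps):
    """Eq. (18): sum_i log(|x_i| / eps + 1) / log(1 / eps)."""
    return float(np.sum(np.log(np.abs(x) / eps + 1.0)) / np.log(1.0 / eps))

def mangasarian_weights(x, eps):
    """f'_eps(|x_i|) for Eq. (17)."""
    return np.exp(-np.abs(x) / eps) / eps

def log_det_weights(x, eps):
    """f'_eps(|x_i|) for the simplified form in Eq. (19)."""
    return 1.0 / (np.abs(x) + eps)

x = np.array([0.0, 0.5, -3.0, 0.0, 1.0])                    # ||x||_0 = 3
approx_m = [mangasarian(x, e) for e in (1e-1, 1e-2, 1e-3)]
approx_l = [log_det(x, e) for e in (1e-1, 1e-3, 1e-9)]
```

Both sequences approach ∥x∥₀=3 as ε decreases. In this example the Mangasarian values approach the limit much faster than the Log-Det values, whose error shrinks only like 1/log(1/ε).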

4 Convergence analysis

In this section, we shall give convergence analysis for Algorithm 1. We begin with presenting the following result.

Theorem 2.

Given ε∈(0,1), $x^{(0)}\in \mathbb {R}^{n}$, and the set $\mathcal {C}$ defined by (11), let the sequence $\{x^{(k)}: k\in \mathbb {N}\}$ be generated by Algorithm 1, where $\mathbb {N}$ is the set of all natural numbers. Then the following three statements hold:

(i) The sequence $\{F_{\epsilon }(x^{(k)}): k\in \mathbb {N}\}$ converges when F_ε is the Mangasarian function (17) or the Log-Det function (19);
(ii) The sequence $\{x^{(k)}: k\in \mathbb {N}\}$ is bounded when F_ε is the Log-Det function;
(iii) $\sum _{k=1}^{+\infty }\left \||x^{(k+1)}|-|x^{(k)}|\right \|_{2}^{2}$ is convergent when the sequence $\{x^{(k)}: k\in \mathbb {N}\}$ is bounded.

Proof.

We first prove item (i). The key step for proving it is to show that the sequence $\{F_{\epsilon }(x^{(k)}): k\in \mathbb {N}\}$ is decreasing and bounded below. The boundedness from below is due to the fact that F_ε(0)≤F_ε(x^(k)). From Step 1 of Algorithm 1 or Eq. (16), one immediately has that

$$\iota_{\mathcal{C}}(Bx^{(k+1)})=0 $$

and

$$ \langle \nabla F_{\epsilon} (|x^{(k)}|), |x^{(k+1)}| \rangle \le \langle \nabla F_{\epsilon} (|x^{(k)}|), |x^{(k)}| \rangle. $$

(20)

By identifying x ^(k) and x ^(k+1), respectively, as v and x in (15) and using the inequality in (20), we get F _ε(x ^(k+1))≤F _ε(x ^(k)). Hence, the sequence $\{F_{\epsilon }(x^{(k)}): k\in \mathbb {N}\}$ is decreasing and bounded below. Item (i) follows immediately.

When F _ε is chosen as the Log-Det function, the coerciveness of F _ε together with item (i) implies that the sequence $\{x^{(k)}: k\in \mathbb {N}\}$ must be bounded, that is, item (ii) holds.

Finally, we prove item (iii). Denote w ^(k):=|x ^(k+1)|−|x ^(k)|. From the second-order Taylor expansion of the function F _ε at x ^(k), we have that

$$ F_{\epsilon}(x^{(k+1)})=\mathcal{F}_{\epsilon}(x^{(k+1)}, x^{(k)})+\frac{1}{2} (w^{(k)})^{\top} \nabla^{2} F_{\epsilon}(v) w^{(k)}, $$

(21)

where v is some point in the line segment linking the points |x ^(k+1)| and |x ^(k)| and ∇² F _ε(v) is the Hessian matrix of F _ε at the point v.

By (20), the first term on the right-hand side of Eq. (21) is no greater than F_ε(x^(k)). By Eqs. (17) and (19), ∇²F_ε(v) for v lying in the nonnegative orthant of $\mathbb {R}^{n}$ is a diagonal matrix and is equal to $-\frac {1}{\epsilon ^{2}}\text {diag}(e^{-\frac {v_{1}}{\epsilon }}, e^{-\frac {v_{2}}{\epsilon }}, \ldots, e^{-\frac {v_{n}}{\epsilon }})$ when F_ε is the Mangasarian function, or to −diag((v₁+ε)⁻²,(v₂+ε)⁻²,…,(v_n+ε)⁻²) when F_ε is the Log-Det function. Hence, the matrix ∇²F_ε(v) is negative definite. Since the sequence $\{x^{(k)}: k\in \mathbb {N}\}$ is bounded, there exists a constant ρ>0 such that

$$(w^{(k)})^{\top} \nabla^{2} F_{\epsilon}(v) w^{(k)} \le -\rho \|w^{(k)}\|_{2}^{2}. $$

Putting all above results together into (21), we have that

$$F_{\epsilon}(x^{(k+1)})\le F_{\epsilon}(x^{(k)}) -\frac{\rho}{2} \left\||x^{(k+1)}|-|x^{(k)}|\right\|_{2}^{2}. $$

Summing the above inequality from k=1 to +∞ and using item (i) we get the proof of item (iii).

From item (iii) of Theorem 2, we have ∥|x ^(k+1)|−|x ^(k)|∥₂→0 as k→∞.

To further study properties of the sequence $\{x^{(k)}: k\in \mathbb {N}\}$ generated by Algorithm 1, the matrix B^⊤ is required to have the range space property (RSP), which was originally introduced in [23]. With this property and motivated by the work in [23], we prove that Algorithm 1 can yield a sparse solution for model (12).

Prior to presenting the definition of the RSP, we introduce the notation to be used throughout the rest of this paper. Given a set S⊂{1,2,…,n}, the symbol |S| denotes the cardinality of S, and S ^c:={1,2,…,n}∖S is the complement of S. Recall that for a vector u, by abuse of notation, we also use |u| to denote the vector whose elements are the absolute values of the corresponding elements of u. For a given matrix A having n columns, a vector u in $\mathbb {R}^{n}$, and a set S⊂{1,2,…,n}, we use the notation A _S to denote the submatrix extracted from A with column indices in S and u _S the subvector extracted from u with component indices in S.

Definition 3 (Range space property (RSP)).

Let A be an m×n matrix. Its transpose A ^⊤ is said to satisfy the range space property (RSP) of order K with a constant ρ>0 if for all sets S⊆{1,…,n} with |S|≥K and for all ξ in the range space of A ^⊤ the following inequality holds

$$\|\xi_{S^{c}}\|_{1}\le \rho\|\xi_{S}\|_{1}. $$

The range space property states that the range of the matrix A ^⊤ contains no vectors where some entries have a significantly larger magnitude with respect to the others. We remark that if the transpose of an m×n matrix A has the RSP of order K with a constant ρ>0, then for every non-empty set S⊆{1,…,n}, the transpose of the matrix A _S, denoted by $A_{S}^{\top }$, has the RSP of order K with constant ρ as well. We further remark that there is a relationship (see Proposition 3.6 in [23]) between the RSP and the restricted isometry property (RIP) and null space property (NSP) of A which have been widely used in the compressive sensing literature. For example, if we have a matrix satisfying the NSP or RIP, we may construct a matrix satisfying the RSP. Unfortunately, similar to the RIP and the NSP, the RSP is hard to verify in practice.

The next result shows that if the transpose of the matrix B in Algorithm 1 possesses the RSP, then Algorithm 1 can lead to a sparse solution for model (12). To this end, we define a mapping $\sigma : \mathbb {R}^{d} \rightarrow \mathbb {R}^{d}$ such that the ith component of the vector σ(u) is the ith largest component of |u|.

Proposition 4.

Let B be the (m+1)×n matrix defined by (10) and let $\{x^{(k)}: k \in \mathbb {N}\}$ be the sequence generated by Algorithm 1. Assume that the matrix B^⊤ has the RSP of order K with ρ>0 satisfying (1+ρ)K<n. Suppose that the sequence $\{x^{(k)}: k \in \mathbb {N}\}$ is bounded. Then, (σ(x^(k)))_n, the nth largest component of |x^(k)|, converges to 0.

Proof.

Suppose this proposition is false. Then there exist a constant γ>0 and a subsequence $\{x^{(k_{j})}:j\in \mathbb {N}\}$ such that $(\sigma (x^{(k_{j})}))_{n}\geq 2\gamma >0$ for all $j \in \mathbb {N}$. From item (iii) of Theorem 2, we have that

$$ (\sigma(x^{(k_{j}+1)}))_{n}\geq \gamma $$

(22)

for all sufficiently large j. For simplicity, we set $y^{(k_{j})}:=\nabla F_{\epsilon }(|x^{(k_{j})}|)$. Hence, by inequality (22) and the definition of F_ε, we know that

$$ |x^{(k_{j})}| >0, \quad |x^{(k_{j}+1)}|>0, \quad \text{and} \quad y^{(k_{j})}>0 $$

(23)

for all sufficiently large j. In what follows, we assume that the integer j is large enough such that the above inequalities in (23) hold.

Since the vector $x^{(k_{j}+1)}$ is obtained through Step 1 of Algorithm 1, i.e., Eq. (16), then by Fermat’s rule and the chain rule of the subdifferential, we have that

$$0\in\text{diag}(y^{(k_{j})})\partial {\|\cdot\|_{1}}(\text{diag}(y^{(k_{j})})x^{(k_{j}+1)})+B^{\top} b^{(k_{j}+1)}, $$

where $b^{(k_{j}+1)}\in \partial \iota _{\mathcal {C}}(B x^{(k_{j}+1)})$. By (23), we get

$$\partial {\|\cdot\|_{1}}(\text{diag}(y^{(k_{j})})x^{(k_{j}+1)})=\{ \text{sgn}(x^{(k_{j}+1)})\}, $$

where sgn(·) denotes the sign of the variable element-wise. Thus

$$y^{(k_{j})}=|\xi^{(k_{j}+1)}|, $$

where $\xi ^{(k_{j}+1)}=B^{\top } b^{(k_{j}+1)}$ is in the range of B ^⊤.

Let S be the set of indices corresponding to the K smallest components of $|\xi ^{(k_{j}+1)}|$. Hence,

$$\sum\limits_{i=1}^{n-K}(\sigma(y^{(k_{j})}))_{i} = \|\xi^{(k_{j}+1)}_{S^{c}}\|_{1} $$

and

$$\sum\limits_{i=n-K+1}^{n}(\sigma(y^{(k_{j})}))_{i} = \|\xi^{(k_{j}+1)}_{S}\|_{1}. $$

Since B ^⊤ has the RSP of order K with the constant ρ, we have that $\|\xi ^{(k_{j}+1)}_{S^{c}}\|_{1}\le \rho \|\xi ^{(k_{j}+1)}_{S}\|_{1}$. Therefore,

$$ \sum\limits_{i=1}^{n-K}(\sigma(y^{(k_{j})}))_{i} \le \rho \sum\limits_{i=n-K+1}^{n}(\sigma(y^{(k_{j})}))_{i}. $$

(24)

However, by the definition of σ, we have that

$$\sum\limits_{i=1}^{n-K}(\sigma(y^{(k_{j})}))_{i} \ge (n-K)(\sigma(y^{(k_{j})}))_{n-K+1} $$

and

$$\sum\limits_{i=n-K+1}^{n}(\sigma(y^{(k_{j})}))_{i} \le K (\sigma(y^{(k_{j})}))_{n-K+1}. $$

These inequalities together with the condition (1+ρ)K<n lead to

$$\sum\limits_{i=1}^{n-K}(\sigma(y^{(k_{j})}))_{i} > \rho \sum\limits_{i=n-K+1}^{n}(\sigma(y^{(k_{j})}))_{i}, $$

which contradicts (24). This completes the proof of the proposition.
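As a worked restatement of the last step (not an additional result): by (23) every component of $y^{(k_{j})}$ is strictly positive, so $(\sigma(y^{(k_{j})}))_{n-K+1}>0$, and the condition (1+ρ)K<n is equivalent to n−K>ρK. Chaining the two bounds obtained from the definition of σ then gives

$$ \sum\limits_{i=1}^{n-K}(\sigma(y^{(k_{j})}))_{i} \ge (n-K)(\sigma(y^{(k_{j})}))_{n-K+1} > \rho K (\sigma(y^{(k_{j})}))_{n-K+1} \ge \rho \sum\limits_{i=n-K+1}^{n}(\sigma(y^{(k_{j})}))_{i}, $$

which is the strict inequality that is incompatible with (24).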

From Proposition 4, we conclude that a sparse solution is guaranteed via Algorithm 1 if the transpose of B satisfies the RSP. Next, we answer how sparse this solution will be. To this end, we introduce some notation and develop a technical lemma. For a vector $x \in \mathbb {R}^{d}$, we denote by τ(x) the set of the indices of non-zero elements of x, i.e., τ(x):={i:x _i≠0}. For a sequence $\{x^{(k)}: k \in \mathbb {N}\}$, a positive number μ, and an integer k, we define $I_{\mu }(x^{(k)}):=\{i: |x_{i}^{(k)}|\ge \mu \}$.
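The three bookkeeping maps used in this section, σ, τ, and I _μ, are easy to mirror in code. The following is a minimal NumPy sketch; the function names `sigma`, `tau`, and `I_mu` are chosen here for illustration, and indices are 0-based, whereas the text counts components from 1.

```python
import numpy as np

def sigma(u):
    """Entries of |u| sorted in non-increasing order.

    sigma(u)[i - 1] is the i-th largest component of |u| (text uses 1-based i).
    """
    return np.sort(np.abs(np.asarray(u, dtype=float)))[::-1]

def tau(x):
    """Support of x: the set of (0-based) indices of its non-zero entries."""
    return set(np.flatnonzero(np.asarray(x)).tolist())

def I_mu(x, mu):
    """Indices i (0-based) with |x_i| >= mu, for a positive threshold mu."""
    return set(np.flatnonzero(np.abs(np.asarray(x)) >= mu).tolist())

# Small illustrative vector (hypothetical data, not from the paper).
x = np.array([0.0, -3.0, 0.5, 2.0])
print(sigma(x))
print(tau(x))
print(I_mu(x, 1.0))
```

With these helpers, the quantity (σ(x ^(k)))_n of Proposition 4 is simply `sigma(x)[-1]`, the smallest entry of |x|.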

Lemma 5.

Let B be the (m+1)×n matrix defined by (10), let F _ε be the Log-Det function defined by (19), and let $\{x^{(k)}: k \in \mathbb {N}\}$ be the sequence generated by Algorithm 1. Assume that the matrix B ^⊤ has the RSP of order K with ρ>0 satisfying (1+ρ)K<n. If there exists μ>ρ ε n such that |I _μ(x ^(k))|≥K for all sufficiently large k, then there exists a $k^{\prime \prime } \in \mathbb {N}$ such that ∥x ^(k)∥₀<n and $\tau (x^{(k+1)})\subseteq \tau (x^{(k^{\prime \prime })})$ for all k>k ^′′.

Proof.

Set y ^(k):=∇F _ε(|x ^(k)|). Since x ^(k+1) is a solution to the optimization problem (16), then by Fermat’s rule and the chain rule of subdifferential we have that

$$0\in\text{diag}(y^{(k)})\partial {\|\cdot\|_{1}}(\text{diag}(y^{(k)})x^{(k+1)})+B^{\top} b^{(k+1)}, $$

where b ^(k+1)∈∂ ι _C(B x ^(k+1)). Hence, if $x_{i}^{(k+1)}\neq 0$, we have that $y^{(k)}_{i} = |(B^{\top } b^{(k+1)})_{i}|$.

For i∈I _μ(x ^(k)), we have that $|x_{i}^{(k)}| \ge \mu $ and $y^{(k)}_{i}=f'_{\epsilon }(|x^{(k)}_{i}|) \le f'_{\epsilon }(\mu)$ for all $k\in \mathbb {N}$, where f _ε= log(·+ε). Furthermore, there exists a k ^′ such that $|x_{i}^{(k+1)}|>0$ for i∈I _μ(x ^(k)) and k≥k ^′ due to item (iii) in Theorem 2. Thus, we have for all k≥k ^′

$$\begin{array}{@{}rcl@{}} \sum\limits_{i\in I_{\mu}(x^{(k)})}{|(B^{\top} b^{(k+1)})_{i}|} &=& \sum\limits_{i\in I_{\mu}(x^{(k)})} y^{(k)}_{i} \\ &\le& \sum\limits_{i\in I_{\mu}(x^{(k)})} f'_{\epsilon}(\mu)\le W^{*}, \end{array} $$

where $W^{*} = n {\lim }_{\epsilon \rightarrow 0+} f_{\epsilon }'(\mu)=\frac {n}{\mu }$ is a positive number dependent on μ.

Now, we are ready to prove ∥x ^(k)∥₀<n for all k>k ^′′. By Proposition 4, we have that (σ(x ^(k)))_n→0 as k→+∞. Therefore, there exists an integer k ^′′>k ^′ such that |I _μ(x ^(k))|≥K and $0\le (\sigma (x^{(k)}))_{n}<\min \{\frac {\mu }{\rho n}-\epsilon, \mu \}$ for all k≥k ^′′. Let i ₀ be the index such that $|x_{i_{0}}^{(k^{\prime \prime })}|=(\sigma (x^{(k^{\prime \prime })}))_{n}$. We will show that $x_{i_{0}}^{(k^{\prime \prime }+1)}=0$. If this statement is not true, that is, $x_{i_{0}}^{(k^{\prime \prime }+1)}$ is not zero, then

$$ |(B^{\top} b^{(k^{\prime\prime}+1)})_{i_{0}}|=f'_{\epsilon}(|x^{(k^{\prime\prime})}_{i_{0}}|)=\frac{1}{|x^{(k^{\prime\prime})}_{i_{0}}|+\epsilon}>\rho W^{*}. $$

(25)

However, since i ₀ is not in the set $I_{\mu }(x^{(k^{\prime \prime })})$ and B ^⊤ satisfies the RSP, we have that

$$\begin{array}{@{}rcl@{}} |(B^{\top} b^{(k^{\prime\prime}+1)})_{i_{0}}|&\le& \sum\limits_{i\notin I_{\mu}(x^{(k^{\prime\prime})})}|(B^{\top} b^{(k^{\prime\prime}+1)})_{i}| \\ &\leq& \rho \sum\limits_{i\in I_{\mu}(x^{(k^{\prime\prime})})}|(B^{\top} b^{(k^{\prime\prime}+1)})_{i}| \le \rho W^{*}, \end{array} $$

which contradicts (25). Hence, we have that $x_{i_{0}}^{(k^{\prime \prime }+1)}=0$ and $|\tau (x^{(k^{\prime \prime }+1)})|<n$. By replacing k ^′′ by k ^′′+1 and repeating this process, we can obtain $x_{i_{0}}^{(k^{\prime \prime }+\ell)}=0$ for all $\ell \in \mathbb {N}$. Therefore, ∥x ^(k)∥₀<n for all k>k ^′′. This process can also be applied to other components satisfying $x_{i}^{(k^{\prime \prime }+1)}=0$. Thus there exists a $k^{\prime \prime } \in \mathbb {N}$ such that $\tau (x^{(k)})\subseteq \tau (x^{(k^{\prime \prime })})$ for all k≥k ^′′.
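For readers checking the proof, the strict inequality in (25) follows directly from the choice of k ^′′ and from W ^∗=n/μ. Written out as a worked step, using only quantities already defined above:

$$ |x^{(k^{\prime\prime})}_{i_{0}}| < \frac{\mu}{\rho n}-\epsilon \;\Longrightarrow\; |x^{(k^{\prime\prime})}_{i_{0}}|+\epsilon < \frac{\mu}{\rho n} \;\Longrightarrow\; \frac{1}{|x^{(k^{\prime\prime})}_{i_{0}}|+\epsilon} > \frac{\rho n}{\mu} = \rho W^{*}. $$

The hypothesis μ>ρ ε n of the lemma is what makes the bound $\frac{\mu}{\rho n}-\epsilon$ strictly positive, so that an index k ^′′ with this property can exist.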

With Lemma 5, the next result shows that when the transpose of B satisfies the RSP, there exists a cluster point of the sequence generated by Algorithm 1 that is sparse and satisfies the consistency condition.

Theorem 6.

Let B be the (m+1)×n matrix defined by (10), let F _ε be the Log-Det function defined by (19), and let $\{x^{(k)}: k \in \mathbb {N}\}$ be the sequence generated by Algorithm 1. Assume that the matrix B ^⊤ has the RSP of order K with ρ>0 satisfying (1+ρ)K<n. Then, there is a subsequence $\{x^{(k_{j})}: j\in \mathbb {N}\}$ that converges to a ⌊(1+ρ)K⌋-sparse solution, that is $(\sigma (x^{(k_{j})}))_{\lfloor (1+\rho)K+1\rfloor }\rightarrow 0$ as j→+∞ and ε→0.

Proof.

Suppose the theorem is false. Then there exists a μ ^∗>0 such that, for any $0<\epsilon ^{*}<\frac {\mu ^{*}}{\rho n}$, there exist an ε∈(0,ε ^∗) and a k ^′ such that (σ(x ^(k)))_{⌊(1+ρ)K+1⌋}≥μ ^∗ for all k≥k ^′. This implies that, for all k≥k ^′,

$$ |I_{\mu^{*}}(x^{(k)})|\ge \lfloor (1+\rho)K+1\rfloor>(1+\rho)K> K. $$

(26)

By Lemma 5, there exists a k ^′′≥k ^′ such that ∥x ^(k)∥₀<n and $\tau (x^{(k+1)})\subseteq \tau (x^{(k^{\prime \prime })})$ for all k≥k ^′′. Let $S=\tau (x^{(k^{\prime \prime })})$. Thus $x^{(k)}_{S^{c}}=0$ for all k≥k ^′′. Therefore, the optimization problem (16) for updating x ^(k+1) can be reduced to the following one

$$ x_{S}^{(k+1)} \in \text{arg}\min\{\langle (\nabla F_{\epsilon}(|x^{(k)}|))_{S}, u \rangle +\iota_{C}((B_{S}) u): u\in\mathbb{R}^{|S|}\}. $$

(27)

If $|\tau (x^{(k^{\prime \prime })})|>|I_{\mu ^{*}}(x^{(k^{\prime \prime })})|$, from (26), we have (1+ρ)K<|S|. Thus, from Lemma 5 and $B^{\top }_{S}$ having the RSP with the same parameters, there exists a k ^′′′>k ^′′ such that $|\tau (x^{(k)})|<|\tau (x^{(k^{\prime \prime })})|$ for all k≥k ^′′′. Therefore, by induction, there must exist a $\tilde k$ such that for all $k\ge \tilde {k}$

$$ \tau (x^{(k)})=I_{\mu^{*}}(x^{(k)}), \; \tau (x^{(k)})\subseteq \tau (x^{(\tilde{k})}). $$

It means that for all $k\ge \tilde {k}$, all the nonzero components of x ^(k) are bounded below in magnitude by μ ^∗. Therefore, for any $k\ge \tilde {k}$, the updating Eq. (16) reduces to (27) with $S=I_{\mu ^{*}}(x^{(k)})$. From Proposition 4, we get [σ(x ^(k))]_|S|→0, which contradicts $|x_{|S|}^{(k)}|\ge \mu ^{*}$. This proves the theorem.

5 An implementation of Algorithm 1

In this section, we describe in detail an implementation of Algorithm 1 and show how to select the parameters of the associated algorithm.

Solving problem (16) is the main issue for Algorithm 1. A general model related to (16) is

$$ \min\{\|\Gamma x\|_{1} + \varphi (Bx): x \in \mathbb{R}^{n}\}, $$

(28)

where Γ is a diagonal matrix with positive diagonal elements and φ is in $\Gamma _{0}(\mathbb {R}^{m+1})$. In particular, if we choose Γ=diag(∇F _ε(|x ^(k)|)) and $\varphi = \iota _{\mathcal {C}}$, where x ^(k) is a vector in $\mathbb {R}^{n}$, ε is a positive number, $\mathcal {C}$ is given by (11), and F _ε is a function given by (13), then model (28) reduces to the optimization problem in Algorithm 1.

We solve model (28) by using a recently developed first-order primal-dual algorithm (see, e.g., [24–26]). To present this algorithm, we need two concepts in convex analysis, namely, the proximity operator and the conjugate function. The proximity operator was introduced in [27]. For a function $f \in \Gamma _{0}(\mathbb {R}^{d})$, the proximity operator of f with parameter λ, denoted by prox_{λ f}, is a mapping from $\mathbb {R}^{d}$ to itself, defined for a given point $x \in \mathbb {R}^{d}$ by

$$ \text{prox}_{\lambda f} (x):= \text{argmin} \left\{\frac{1}{2\lambda} \|u-x\|^{2}_{2} + f(u): u \in \mathbb{R}^{d} \right\}. $$

The conjugate of $f\in \Gamma _{0}(\mathbb {R}^{d})$ is the function $f^{*} \in \Gamma _{0}(\mathbb {R}^{d})$ defined at $z \in \mathbb {R}^{d}$ by

$$f^{*}(z):= \sup\{\langle x, z \rangle-f(x): x\in \mathbb{R}^{d}\}. $$
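As a worked instance of this definition, consider $f=\iota _{\mathcal {C}}$, assuming (as the separable form used later in the proof of Lemma 8 indicates) that $\mathcal {C}$ consists of the vectors in $\mathbb {R}^{m+1}$ whose first m components are nonnegative and whose last component equals 1. The conjugate is then the support function of $\mathcal {C}$:

$$ \iota^{*}_{\mathcal{C}}(z) = \sup_{x\in\mathcal{C}} \langle x, z\rangle = z_{m+1} + \sum\limits_{i=1}^{m} \sup_{x_{i}\ge 0} x_{i} z_{i} = \left\{ \begin{array}{ll} z_{m+1}, & \text{if } z_{i}\le 0 \text{ for } i=1,\ldots,m, \\ +\infty, & \text{otherwise.} \end{array} \right. $$

This is the function whose proximity operator is needed when model (28) is specialized to the problem in Algorithm 1.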

With this notation, the first-order primal-dual (PD) method for solving (28) is summarized in Algorithm 2 (referred to as the PD subroutine).

Theorem 7.

Let B be an (m+1)×n matrix defined by (10), let $\mathcal {C}$ be the set given by (11), let α and β be two positive numbers, and let L be a positive number such that L≥∥B∥², where ∥B∥ is the largest singular value of B. If

$$\alpha \beta L < 1, $$

then for an arbitrary initial vector $(x^{-1}, x^{0}, u^{0}) \in \mathbb {R}^{n} \times \mathbb {R}^{n} \times \mathbb {R}^{m+1}$, the sequence $\{x^{k}: k \in \mathbb {N}\}$ generated by Algorithm 2 converges to a solution of model (28).

The proof of Theorem 7 follows immediately from Theorem 1 in [24] or Theorem 3.5 in [25]. We skip its proof here.

Both proximity operators $\text {prox}_{\alpha \|\cdot \|_{1} \circ \Gamma }$ and $\text {prox}_{\beta \varphi ^{*}}$ should be computed easily and efficiently in order to make the iterative scheme in Algorithm 2 numerically efficient. Indeed, the proximity operator $\text {prox}_{\alpha \|\cdot \|_{1} \circ \Gamma }$ is given at $z \in \mathbb {R}^{n}$ as follows: for j=1,2,…,n

$$ \left(\text{prox}_{\alpha\|\cdot\|_{1} \circ \Gamma}(z)\right)_{j}=\max\left\{|z_{j}|-\alpha \gamma_{j},0\right\} \cdot \text{sign}(z_{j}), $$

(29)

where γ _j is the jth diagonal element of Γ. Using the well-known Moreau decomposition (see, e.g., [27, 28])

$$ \text{prox}_{\beta \varphi^{*}} = I- \beta \; \text{prox}_{\frac{1}{\beta} \varphi} \circ \left(\frac{1}{\beta} I\right), $$

(30)

we can compute the proximity operator $\text {prox}_{\beta \varphi ^{*}}$ via $\text {prox}_{\frac {1}{\beta } \varphi }$ which depends on a particular form of the function φ. As our purpose is to develop algorithms for the optimization problem in Algorithm 1, we need to compute the proximity operator of $\iota ^{*}_{\mathcal {C}}$ which is given in the following.

Lemma 8.

If $\mathcal {C}$ is the set given by (11) and β is a positive number, then for $z \in \mathbb {R}^{m+1}$, we have that

$$ \text{prox}_{\beta \iota^{*}_{\mathcal{C}}}(z)=(z_{1}-(z_{1})_{+}, \ldots, z_{m}-(z_{m})_{+}, z_{m+1}-\beta), $$

(31)

where (s)₊ is s if s≥0 and 0 otherwise.

Proof.

We first give an explicit form for the proximity operator $\text {prox}_{\frac {1}{\beta }\iota _{\mathcal {C}}}$. Note that $\iota _{\mathcal {C}} = \frac {1}{\beta }\iota _{\mathcal {C}}$ for β>0 and $\iota _{\mathcal {C}}(z) = \iota _{\{1\}}(z_{m+1}) + \sum _{i=1}^{m} \iota _{[0,\infty)}(z_{i})$, for $z \in \mathbb {R}^{m+1}$. Hence, we have that

$$ \text{prox}_{\frac{1}{\beta}\iota_{\mathcal{C}}}(z)=((z_{1})_{+}, (z_{2})_{+}, \ldots, (z_{m})_{+}, 1), $$

(32)

where (s)₊ is s if s≥0 and 0 otherwise. Here we use the facts that $\text {prox}_{\iota _{[0,+\infty)}}(s)=(s)_{+}$ and $\text {prox}_{\iota _{\{1\}}}(s)=1$ for any $s \in \mathbb {R}$.

By the Moreau decomposition (30), we have that $\text {prox}_{\beta \iota ^{*}_{\mathcal {C}}}(z)=z-\beta \text {prox}_{\frac {1}{\beta }\iota _{\mathcal {C}}}(\frac {1}{\beta }z)$. This together with Eq. (32) yields (31).
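The closed forms (29), (31), and (32) translate almost line by line into NumPy. The sketch below is illustrative only: the function names are chosen here, the set $\mathcal {C}$ is taken in the separable form used in the proof above (first m components nonnegative, last component equal to 1), and the test vector is hypothetical. It also checks numerically that (31) agrees with the Moreau decomposition (30).

```python
import numpy as np

def prox_weighted_l1(z, alpha, gamma):
    """Eq. (29): soft thresholding of z_j with threshold alpha * gamma_j."""
    z = np.asarray(z, dtype=float)
    thresh = alpha * np.asarray(gamma, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def proj_C(z):
    """Eq. (32): prox of the indicator of C, i.e. the projection onto C."""
    out = np.maximum(np.asarray(z, dtype=float), 0.0)  # (z_i)_+ for i <= m
    out[-1] = 1.0                                      # last entry fixed to 1
    return out

def prox_conj_indicator_C(z, beta):
    """Eq. (31): prox of beta times the conjugate of the indicator of C."""
    z = np.asarray(z, dtype=float)
    out = np.minimum(z, 0.0)      # z_i - (z_i)_+ equals min(z_i, 0)
    out[-1] = z[-1] - beta        # last component is shifted by beta
    return out

# Hypothetical test point in R^{m+1} with m = 3.
z = np.array([1.5, -0.3, 0.0, 2.0])
beta = 0.7
lhs = prox_conj_indicator_C(z, beta)
rhs = z - beta * proj_C(z / beta)   # right-hand side of (30)
print(np.allclose(lhs, rhs))
```

Both operators cost O(n) or O(m) per call, which is what makes each step of the PD subroutine cheap apart from the two matrix-vector products with B.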

Next, we comment on the diagonal matrix Γ in model (28). When the function φ in model (28) is chosen to be ι _C, then the relation a φ=φ holds for any positive number a. Hence, by rescaling the diagonal matrix Γ in model (28) with any positive number, the solutions of model (28) are not altered. Therefore, we can assume that the largest diagonal entry of Γ is always equal to one.

In applications of Theorem 7 as in Algorithm 2, we should make the product of α and β as close to 1/∥B∥² as possible. In our numerical simulations, we always set

$$ \alpha = \frac{0.999}{\beta \|B\|^{2}}. $$

(33)

In such a way, β is essentially the only parameter that needs to be determined.
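Algorithm 2 itself is not reproduced in this text, so the following is only a generic sketch in the style of the first-order primal-dual scheme of [24], specialized to model (28) with $\varphi =\iota _{\mathcal {C}}$ and using the step-size rule (33). The update order, the extrapolation 2x^k − x^{k−1}, the zero initialization, the function names, and the tiny 2×2 test matrix are all assumptions made for illustration and may differ from the authors' Algorithm 2.

```python
import numpy as np

def soft(z, t):
    """Componentwise soft thresholding, as in Eq. (29)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_dual(z, beta):
    """Prox of beta * conjugate of the indicator of C, as in Eq. (31)."""
    out = np.minimum(z, 0.0)
    out[-1] = z[-1] - beta
    return out

def pd_solve(B, gamma, beta=1.0, iters=5000):
    """Sketch of a primal-dual loop for min ||diag(gamma) x||_1 + iota_C(Bx)."""
    L = np.linalg.norm(B, 2) ** 2          # squared largest singular value
    alpha = 0.999 / (beta * L)             # step-size rule (33)
    x_prev = np.zeros(B.shape[1])          # plays the role of x^{-1}
    x = np.zeros(B.shape[1])               # x^{0}
    u = np.zeros(B.shape[0])               # u^{0}
    for _ in range(iters):
        x_bar = 2.0 * x - x_prev           # extrapolated primal point
        u = prox_dual(u + beta * (B @ x_bar), beta)       # dual update
        x_prev = x
        x = soft(x - alpha * (B.T @ u), alpha * gamma)    # primal update
    return x

# Toy instance (hypothetical): constraints x_1 >= 0 and x_1 + x_2 = 1,
# weights (1, 2), whose unique minimizer is x = (1, 0).
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])
gamma = np.array([1.0, 2.0])
x_hat = pd_solve(B, gamma)
print(x_hat)
```

On this toy instance the weighted ℓ ₁ objective pushes all of the mass onto the cheaper first coordinate, which is the same mechanism by which the reweighting in Algorithm 1 promotes sparsity.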

Prior to computing α for a given β by Eq. (33), we need to know the norm of the matrix B. When min{m,n} is small, the norm of the matrix B can be computed directly. When min{m,n} is large, an upper bound of the norm of the matrix B is estimated in terms of the size of B as follows.

Proposition 9.

Let Φ be an m×n matrix with i.i.d. standard Gaussian entries and y be an m-dimensional vector with each component being +1 or −1. We define an (m+1)×n matrix B from Φ and y via Eq. (10). Then,

$$\mathbb{E} \{\|B\|\} \le \sqrt{m+1}(\sqrt{n}+\sqrt{m}). $$

Moreover,

$$\|B\| \le \sqrt{m+1}(\sqrt{n}+\sqrt{m}+t) $$

holds with probability at least $1-2 e^{-t^{2}/2}$ for all t≥0.

Proof.

By the structure of the matrix B in (10), we know that

$$\|B\| \le \left\|\left[ \begin{array}{cc} \text{diag}(y) \\ y^{\top} \end{array} \right] \right\| \cdot \|\Phi\|. $$

Therefore, we just need to compute the norms on the right-hand side of the above inequality. Denote by I _m the m×m identity matrix and 1_m the vector with all its components being 1. Then,

$$\left[ \begin{array}{cc} \text{diag}(y) \\ y^{\top} \end{array}\right] \left[\begin{array}{ll} \text{diag}(y) & y \end{array}\right] =\left[ \begin{array}{ll} I_{m} & {1}_{m} \\ {1}_{m}^{\top} & m \end{array} \right], $$

which is a special arrow-head matrix and has m+1 as its largest eigenvalue (see [29]). Hence,

$$\left\|\left[\begin{array}{cc} \text{diag}(y) \\ y^{\top} \end{array} \right]\right\| =\sqrt{m+1}. $$

Furthermore, by using random matrix theory for the matrix Φ, we know that $\mathbb {E} \{\|\Phi \|\} \le \sqrt {n}+\sqrt {m}$ and $\|\Phi \| \le \sqrt {n}+\sqrt {m}+t$ with probability at least $1-2 e^{-t^{2}/2}$ for all t≥0 (see, e.g., [30]). This completes the proof of this proposition.
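The two ingredients of this proof can be checked numerically. The sketch below assumes that (10) defines B as the product of the stacked matrix [diag(y); y ^⊤] with Φ, which is the factorization the proof relies on; the sizes, the random seed, and the variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 100                       # small hypothetical sizes
Phi = rng.standard_normal((m, n))    # i.i.d. standard Gaussian entries
y = rng.choice([-1.0, 1.0], size=m)  # any vector of +1/-1 entries

A = np.vstack([np.diag(y), y[None, :]])   # (m+1) x m stacked matrix
B = A @ Phi                               # assumed form of Eq. (10)

norm_A = np.linalg.norm(A, 2)       # spectral norm of the stacked matrix
norm_Phi = np.linalg.norm(Phi, 2)
norm_B = np.linalg.norm(B, 2)
bound = np.sqrt(m + 1) * (np.sqrt(n) + np.sqrt(m))  # bound on E||B||

print(norm_A, np.sqrt(m + 1))       # these two should coincide
print(norm_B, norm_A * norm_Phi)    # submultiplicativity of the norm
print(bound)                        # bound of Proposition 9, for comparison
```

The first two printed pairs are deterministic facts; the comparison of ∥B∥ with the last number holds only in expectation or with high probability, as stated in the proposition.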

Let us compute the norm of B numerically for 100 randomly generated matrices Φ and vectors y for the pair (m,n) with three different choices (500,1000), (1000,1000), and (1500,1000), respectively. Corresponding to these choices, the mean values of ∥B∥ are about 815, 1276, and 1711 while the upper bounds of the expected values of ∥B∥ by Proposition 9 are about 1208, 2001, and 2726, respectively. We can see that the norm of B varies with its size and becomes large when the value of min{m,n} is relatively large. As a consequence, by Eq. (33), one of the parameters α and β must be very small relative to the other. Therefore, in what follows, the matrix B used in model (28) is considered to have been rescaled to

$$ \frac{B}{\|B\|} \quad \text{or} \quad \frac{B}{\sqrt{m+1}(\sqrt{n}+\sqrt{m})} $$

(34)

depending on whether the norm of B can be computed easily or not, respectively.

The complete procedure for model (12) and how the PD subroutine is employed are summarized in Algorithm 3.

6 Numerical simulations

In this section, we demonstrate the performance of Algorithm 3 for 1-bit compressive sampling reconstruction in terms of accuracy and consistency and compare it with the BIHT, RFPI, LP, and GAMP.

Throughout this section, all random m×n matrices Φ and length-n, s-sparse vectors x are generated based on the following assumption: entries of Φ and x on their support are i.i.d. Gaussian random variables with zero mean and unit variance. The locations of the nonzero entries (i.e., the support) of x are randomly permuted. We then generate the 1-bit observation vector y by Eq. (2). We obtain a reconstruction x ^⋆ of x from y by using either the BIHT, RFPI, LP, GAMP, or Algorithm 3. Four metrics, the signal-to-noise ratio (SNR), the Hamming error, the number of missing nonzero coefficients, and the number of misidentified nonzero coefficients, respectively, are used to evaluate the quality of the reconstruction. More precisely, the signal-to-noise ratio (SNR) in dB is defined as

$$\text{SNR}(x,x^{\star}) = 20 \log_{10} \left(\left\|\frac{x}{\|x\|}\right\|_{2}/\left\|\frac{x}{\|x\|}-\frac{x^{\star}}{\|x^{\star}\|}\right\|_{2}\right); $$

the Hamming error is ∥y−sign(Φ x ^⋆)∥₀/m where m is the number of measurements; the number of missing nonzero coefficients refers to the number of nonzero coefficients that an algorithm “misses,” i.e., determines to be zero; the number of misidentified nonzero coefficients refers to the number of coefficients that are “misidentified,” i.e., determined to be nonzero when they should be zero. The last two metrics measure how well each algorithm finds the signal support, meaning the locations of the nonzero coefficients. A higher value of SNR indicates a better reconstructed signal. The smaller the values of the remaining three metrics are, the better the reconstructed signals will be. The accuracy of all tested algorithms is measured by the average of the values of these four metrics over 100 trials unless otherwise noted. For all figures in this section, results by the BIHT, RFPI, LP, GAMP, and Algorithm 3 with the Mangasarian function (17) and the Log-Det function (19) are marked by the symbols “$\triangledown $,” “ △,” “ ⊲,” “ ⊳,” “ ∘,” and “ ⋆,” respectively.
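The four metrics can be written compactly in NumPy. This is a minimal sketch under stated assumptions: the function names are chosen here, `np.sign` stands in for the sign convention of Eq. (2), a reconstruction entry counts as nonzero when its magnitude exceeds a tolerance `tol` (zero by default), and the small test data are hypothetical.

```python
import numpy as np

def snr_db(x, x_star):
    """SNR in dB between the normalized signal and normalized reconstruction."""
    xn = x / np.linalg.norm(x)
    xsn = x_star / np.linalg.norm(x_star)
    err = np.linalg.norm(xn - xsn)
    if err == 0.0:
        return np.inf                      # perfect recovery up to scaling
    return 20.0 * np.log10(np.linalg.norm(xn) / err)

def hamming_error(Phi, y, x_star):
    """Fraction of measurements whose sign is not reproduced by x_star."""
    return np.count_nonzero(y != np.sign(Phi @ x_star)) / len(y)

def support_errors(x, x_star, tol=0.0):
    """Return (number of missed, number of misidentified) coefficients."""
    true_supp = np.abs(x) > 0
    est_supp = np.abs(x_star) > tol
    missed = int(np.sum(true_supp & ~est_supp))         # nonzero, found zero
    misidentified = int(np.sum(~true_supp & est_supp))  # zero, found nonzero
    return missed, misidentified

# Hypothetical toy data: one true nonzero, one spurious nonzero in x_star.
x = np.array([1.0, 0.0, 0.0])
x_star = np.array([1.0, 1.0, 0.0])
Phi = np.array([[1.0, 0.0, 0.0],
                [1.0, -2.0, 0.0]])
y = np.sign(Phi @ x)

print(snr_db(x, x_star))
print(hamming_error(Phi, y, x_star))
print(support_errors(x, x_star))
```

Because both vectors are normalized before comparison, the SNR is insensitive to the scale of x ^⋆, which matches the fact that 1-bit measurements carry no amplitude information.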

6.1 Effects of using inaccurate sparsity on the BIHT

The BIHT requires knowing the sparsity of the underlying signals. This sparsity is, however, usually not known in practical applications. In this subsection, we demonstrate through numerical experiments that a mismatched sparsity input for a signal degrades the performance of the BIHT.

To this end, we fix n=1000 and s=10 and consider two cases of m being 500 and 1000. For each case, we vary the sparsity input for the BIHT from 8 to 12, of which 10 is the only right choice. Therefore, there are a total of ten configurations. For each configuration, we record the SNR values of the reconstructed signals by the BIHT.

Figure 1 depicts the SNR values of the experiments. The plots in the left column of Fig. 1 are for the case m=500 while the plots in the right column are for the case m=1000. The marks in each plot represent the pairs of the SNR values with the correct sparsity input (i.e., s=10) and with a mismatched sparsity input (i.e., s=8, s=9, s=11, or s=12 corresponding to the row 1, 2, 3, or 4). A mark below the red line indicates that the BIHT with the correct sparsity input works better than the one with an incorrect sparsity input. A mark that is far away from the red line indicates that the BIHT with the correct sparsity input works much better than the one with an incorrect sparsity input or vice versa. Except for the second plot in the left column, we can see that the BIHT with the correct sparsity input performs better than the one with an inaccurate sparsity input. In particular, when an underestimated sparsity input to the BIHT is used, the performance of the BIHT is significantly reduced (see the plots in the first two rows of Fig. 1). When an overestimated sparsity input to the BIHT is used, the majority of the marks are under the red lines and are relatively closer to the red lines than those from the BIHT with an underestimated sparsity input. We further report that the average SNR values for the sparsity input s=8, 9, 10, 11, and 12 for m=500 are 21.89, 24.18, 23.25, 22.10, and 21.00dB, respectively. Similarly, for m=1000, the average SNR values for the sparsity input s=8, 9, 10, 11, and 12 are 19.77dB, 26.37dB, 34.74dB, 31.12dB, and 29.46dB, respectively. In summary, we conclude that a properly chosen sparsity constraint is critical for the success of the BIHT.

6.2 Performance of Algorithm 3

Prior to applying Algorithm 3 to the 1-bit compressive sampling problem, the parameters k _max, τ, α _max, ε _min, α, and ε in Algorithm 3 need to be determined. Under the aforementioned setting for the random matrix Φ and sparse signal x, we fix k _max=13, $\tau =\frac {1}{2}$, α _max=8000, ε _min=10⁻⁴. For the functions F _ε defined by (17) and (19), we set the pair of initial parameters (α,ε) as (500,0.25) and (250,0.125), respectively. The iterative process in the PD subroutine is forced to stop if the corresponding number of iterations exceeds 300. These parameters are used in all simulations performed by Algorithm 3 in the rest of this section.

To evaluate the performance of Algorithm 3 at various scenarios, the following three configurations for n the dimension of the signal, m the number of measurements, and s the sparsity of the vector x, are considered:

configuration 1: n=1000, s=10, and m=100,500,1000,1500
configuration 2: m=1000, n=1000, and s=5,10,15,20
configuration 3: m=1000, s=10, and n=500,800,1200,1400

For every case in each configuration, we compare the accuracy of Algorithm 3 with the BIHT, RFPI, LP, and GAMP by computing the average of values of the four metrics over 100 trials. We remark that Algorithm 3, RFPI, LP, and GAMP do not require the knowledge of sparsity of original signals.

For the first configuration, Fig. 2 displays the average values of the four metrics by the BIHT, RFPI, LP, GAMP, and Algorithm 3 with the Mangasarian function (17) and the Log-Det function (19). Figure 2 a demonstrates that the GAMP performs best, while the BIHT and Algorithm 3 perform similarly and exhibit much better performance than the LP and RFPI in terms of SNR values. As expected, the SNR value of the reconstruction from each algorithm increases as the number of measurements m increases. Figure 2 b depicts the consistency of the algorithms through the Hamming error, that is, whether the signs of the measurements of the reconstruction are the same as the signs of the original measurements. We can see that the Hamming errors generated by the BIHT, GAMP, and Algorithm 3 decrease toward zero as m increases. However, the Hamming errors from the LP and RFPI are always above zero. Figure 2 c, d is used to demonstrate how well each algorithm finds the signal support, meaning the locations of the nonzero coefficients. Figure 2 c shows that the number of missed coefficients decreases as a function of the ratio m/n. From this plot, we can see that the GAMP performs best and the remaining algorithms perform similarly, in particular, when the ratio m/n is larger than 1.5. However, Fig. 2 d shows that the number of misidentified nonzero coefficients in the reconstructed signal from GAMP is higher than that from the other algorithms. In summary, Algorithm 3 with the Mangasarian function and the Log-Det function performs as well as the BIHT in terms of the four metrics, in particular, when m/n is greater than 1, even though our algorithm does not require knowledge of the exact sparsity of the original signal. We can also conclude that Algorithm 3 outperforms the RFPI for all metrics while the GAMP performs better than the other algorithms in terms of the SNR, the Hamming error, and the number of missing nonzero coefficients.

For the second configuration, the average values of the four metrics as a function of sparsity s are depicted in Fig. 3 for the BIHT, RFPI, LP, GAMP, and Algorithm 3 with fixed m=1000 and n=1000. Figure 3 a, b depicts that BIHT, GAMP, and Algorithm 3 outperform the RFPI and LP in terms of values of SNR and the Hamming error. Figure 3 c indicates that GAMP performs much better than the other algorithms in terms of the number of missing nonzero coefficients while Fig. 3 d indicates that GAMP performs much worse than the other algorithms in terms of the number of misidentified nonzero coefficients.

For the third configuration, the average values of the four metrics as a function of the signal size n are depicted in Fig. 4 for the BIHT, RFPI, LP, GAMP, and Algorithm 3 with fixed m=1000 and s=10. The plots in Fig. 4 a–c indicate that the GAMP performs best, while the BIHT and Algorithm 3 perform similarly and exhibit much better performance than the LP and RFPI in terms of values of SNR, the Hamming error, and the number of missing nonzero coefficients. Figure 4 d shows that the BIHT and Algorithm 3 outperform the other algorithms for all tested values of n in terms of the number of misidentified nonzero coefficients.

Finally, we compare the speed of the algorithms by measuring the average CPU time it takes each algorithm to produce the results shown in Figs. 2, 3, and 4. The experiments are performed under Windows 7 and Matlab 7.11 (R2010b) running on a laptop equipped with an Intel Core i5-2520M CPU at 2.50GHz and 4 GB of RAM. When we implemented the BIHT, the number of iterations was set to 1500. The MATLAB command linprog was adopted in the implementation of LP. The source code of the GAMP was downloaded from the website of the first author of [10]. The source code of the RFPI was provided by the authors of [8]. The RFPI has two loops. The suggested number of outer-loop iterations is 20 while the number of inner-loop iterations is 200. The results of the experiments are depicted in Fig. 5. We find that both BIHT and GAMP are faster than the RFPI and Algorithm 3. The CPU time consumed by LP increases significantly, in particular, when the size of the signal or the number of measurements increases.

7 Conclusions

In this paper, we proposed a new model and algorithm for 1-bit compressive sensing. The convergence analysis of the proposed algorithm was given. We demonstrated the performance of the algorithm for reconstruction from 1-bit measurements. In the future, it would be of interest to study the convergence of Algorithm 3 with the Mangasarian function. This result would be highly needed to adaptively update all the parameters in Algorithm 3 so that consistent reconstruction can be achieved with improved accuracy.

References

1. E Candes, J Romberg, T Tao, Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006).
2. E Candes, T Tao, Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
3. PT Boufounos, RG Baraniuk, in Proceedings of Conference on Information Science and Systems (CISS). 1-bit compressive sensing (IEEE, NJ, 2008), pp. 16–21.
4. L Jacques, JN Laska, PT Boufounos, RG Baraniuk, Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Trans. Inf. Theory 59, 2082–2102 (2013).
5. J Laska, Z Wen, W Yin, R Baraniuk, Trust, but verify: fast and accurate signal recovery from 1-bit compressive measurements. IEEE Trans. Signal Process. 59, 5289–5301 (2011).
6. Y Plan, R Vershynin, One-bit compressed sensing by linear programming. Commun. Pure Appl. Math. 66, 1275–1297 (2013).
7. M Yan, Y Yang, S Osher, Robust 1-bit compressive sensing using adaptive outlier pursuit. IEEE Trans. Signal Process. 60, 3868–3875 (2012).
8. A Movahed, A Panahi, G Durisi, A robust RFPI-based 1-bit compressive sensing reconstruction algorithm. Information Theory Workshop (ITW), 2012 IEEE, Lausanne, 567–571 (2012). doi:10.1109/ITW.2012.6404739.
9. A Movahed, A Panahi, MC Reed, in IEEE International Conference on Acoustics, Speech, and Signal Processing. Recovering signals with variable sparsity levels from the noisy 1-bit compressive measurements (IEEE, 2014), pp. 6504–6508.
10. U Kamilov, A Bourquard, A Amini, M Unser, One-bit measurements with adaptive thresholds. IEEE Signal Process. Lett. 19(10), 607–610 (2012).
11. J-J Moreau, Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965).
12. ET Hale, W Yin, Y Zhang, Fixed-point continuation for ℓ ₁ minimization: methodology and convergence. SIAM J. Optimization 19, 1107–1130 (2008).
13. T Blumensath, ME Davies, Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27, 265–274 (2009).
14. OL Mangasarian, Minimum-support solutions of polyhedral concave programs. Optimization 45, 149–162 (1999).
15. M Fazel, H Hindi, S Boyd, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proceedings of the 2003 American Control Conference 3, 2156–2162 (2003). doi:10.1109/ACC.2003.1243393.
16. R Chartrand, Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007).
17. R Chartrand, V Staneva, Restricted isometry properties and nonconvex compressive sensing. Inverse Problems 24(3), 035020 (2008).
18. L Chen, Y Gu, The convergence guarantees of a non-convex approach for sparse recovery. IEEE Trans. Signal Process. 62(15), 3754–3767 (2014).
19. M Hyder, K Mahata, An improved smoothed ℓ ₀ approximation algorithm for sparse representation. IEEE Trans. Signal Process. 58(4), 2194–2205 (2010).
20. H Mohimani, M Babaie-Zadeh, C Jutten, A fast approach for overcomplete sparse decomposition based on smoothed norm. IEEE Trans. Signal Process. 57(1), 289–301 (2009).
21. R Saab, O Yilmaz, Sparse recovery by non-convex optimization instance optimality. Appl. Comput. Harmonic Anal. 29(1), 30–48 (2010).
22. S Jokar, ME Pfetsch, Exact and approximate sparse solutions of underdetermined linear equations. SIAM J. Sci. Comput. 31(1), 23–44 (2008).
23. Y-B Zhao, D Li, Reweighted ℓ ₁-minimization for sparse solutions to underdetermined linear system. SIAM J. Optim. 22(3), 1065–1088 (2012).
24. A Chambolle, T Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40, 120–145 (2011).
25. Q Li, CA Micchelli, L Shen, Y Xu, A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models. Inverse Problems 28, 095003 (2012).
26. X Zhang, M Burger, S Osher, A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46, 20–46 (2011).
27. J-J Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math. 255, 2897–2899 (1962).
28. HL Bauschke, PL Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics (Springer, New York, 2011).
29. L Shen, BW Suter, Bounds for eigenvalues of arrowhead matrices and their applications to hub matrices and wireless communications. EURASIP J. Adv. Signal Process. Article ID 379402, 12 (2009). doi:10.1155/2009/379402.
30. KR Davidson, SJ Szarek, in Handbook of the Geometry of Banach Spaces, vol. 1. Local operator theory, random matrices and Banach spaces (Elsevier Science, Amsterdam: North-Holland, 2001), pp. 317–366.

Acknowledgements

The authors would like to thank Ms. Na Zhang for her valuable comments and insightful suggestions which have brought improvements to several aspects of this manuscript. The authors would also like to thank Dr. Movahed for providing us his MATLAB source codes for the algorithm RFPI in [8].

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsement, either expressed or implied, of the Air Force Research Laboratory or the US Government.

This research is supported in part by an award from the National Research Council via the Air Force Office of Scientific Research and by the US National Science Foundation under grant DMS-1522332.

Author information

Authors and Affiliations

Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA
Lixin Shen
Air Force Research Laboratory, AFRL/RITB, Rome, NY 13441-4505, USA
Lixin Shen & Bruce W. Suter

Corresponding author

Correspondence to Lixin Shen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Shen, L., Suter, B.W. One-bit compressive sampling via ℓ ₀ minimization. EURASIP J. Adv. Signal Process. 2016, 71 (2016). https://doi.org/10.1186/s13634-016-0369-4

Received: 07 July 2015
Accepted: 02 June 2016
Published: 14 June 2016
DOI: https://doi.org/10.1186/s13634-016-0369-4