### 2.1 Preliminaries

Let us introduce some notation. We denote matrices by bold uppercase letters (**A**, **B**), column vectors by bold lowercase letters (**x**, **y**, **z**, etc.), and scalars by non-bold letters (*R*, *m*, etc.). We use the letters *C* and *c* to represent sufficiently large and sufficiently small constants, respectively. We use **x**^{⊤} and **A**^{⊤} to denote the transpose of the vector **x** and the matrix **A**, respectively. The cardinality of a set *S* is denoted by card(*S*). We define the signum function as \(\text {sgn}(x) := \frac {x}{|x|}\) for every \(x \in \mathbb {R}, x \neq 0\), with the convention that sgn(0)=1. The *i*th element of the vector \(\mathbf {x} \in \mathbb {R}^{n}\) is denoted by *x*_{i}. Similarly, the *i*th row of the matrix \(\mathbf {A} \in \mathbb {R}^{m \times n}\) is denoted by *a*_{i}, while the element of **A** in the *i*th row and *j*th column is denoted by *a*_{ij}.

### 2.2 Mathematical model

We consider the modulo operation within two periods (one in the positive half and one in the negative half). We assume that the dynamic range parameter *R* is large enough that most of the measurements 〈*a*_{i},*x*^{∗}〉 fall within the domain of operation of the modulo function. Rewriting in terms of the signum function, the (variation of the) modulo function under consideration can be defined as:

$$f(t) := t+\left(\frac{1-\text{sgn}(t)}{2}\right)R. $$
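As a small sketch, the transfer function *f* and the sgn(0)=1 convention can be written as follows (the function names are ours, not from the paper):

```python
import numpy as np

def sgn(t):
    # sgn(t) := t/|t| for t != 0, with the convention sgn(0) = 1
    return np.where(t >= 0, 1.0, -1.0)

def modulo_fold(t, R):
    # f(t) = t + ((1 - sgn(t)) / 2) * R: adds R to negative inputs,
    # leaves non-negative inputs unchanged
    return t + (1.0 - sgn(t)) / 2.0 * R

# negative inputs are shifted into [0, R); non-negative ones pass through
print(modulo_fold(np.array([-1.5, 0.0, 2.0]), R=4.0).tolist())  # → [2.5, 0.0, 2.0]
```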

One can easily notice that the modulo operation in this case is nothing but the addition of the scalar *R* if the input is negative, while non-negative inputs remain unaffected by it. If we divide the number line into these two bins, then the coefficient of *R* in the above equation can be seen as a bin-index, a binary variable which takes value 0 when sgn(*t*)=1, or 1 when sgn(*t*)=−1. Inserting the definition of *f* in the measurement model of Eq. 1 gives,

$$ y_{i}= \langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle+\left(\frac{1-\text{sgn}(\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle)}{2}\right)R,~~i = 1,\ldots,m. $$

(2)

We can rewrite Eq. 2 using a bin-index vector **p**∈{0,1}^{m}. Each element of the true bin-index vector **p**^{∗} is given as,

$$p^{*}_{i} = \frac{1-\text{sgn}\left(\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle\right)}{2},~~i = 1,\ldots,m. $$

If we ignore the presence of the modulo operation in the above formulation, it reduces to a standard compressive sensing reconstruction problem. In that case, the compressed measurements \(y_{c_{i}}\) would simply equal 〈*a*_{i},*x*^{∗}〉. Since we have access only to the compressed modulo measurements **y**, it is useful to write **y** in terms of the true compressed measurements **y**_{c}. Thus,

$$\begin{aligned} y_{i} &= {\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle} + p^{*}_{i}R \\ &= y_{c_{i}}+p^{*}_{i}R. \end{aligned} $$

It is evident that if we can recover *p*^{∗} successfully, we can calculate the true compressed measurements 〈*a*_{i}, *x*^{∗}〉 and use them to reconstruct *x*^{∗} with any sparse recovery algorithm such as CoSaMP [19] or basis pursuit [20–22].
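The observation above can be verified on a toy instance: given the true bin-indices, subtracting \(p^{*}_{i}R\) restores 〈*a*_{i},*x*^{∗}〉 exactly, after which any recovery method applies. In this sketch, plain least squares stands in for CoSaMP or basis pursuit since the toy problem is overdetermined; all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 100, 20, 8.0

# hypothetical toy problem: a sparse signal and Gaussian measurements
x_star = np.zeros(n)
x_star[[2, 7, 11]] = [0.5, -0.3, 0.2]
A = rng.normal(size=(m, n))

y_c = A @ x_star                  # true compressed measurements
p_star = (y_c < 0).astype(float)  # true bin-indices, (1 - sgn)/2 with sgn(0) = 1
y = y_c + p_star * R              # modulo measurements (Eq. 2)

# with the true bin-indices, unfolding recovers y_c exactly ...
y_unfolded = y - p_star * R
# ... after which any recovery algorithm applies; here least squares
# stands in for a sparse solver, since m > n in this toy example
x_hat = np.linalg.lstsq(A, y_unfolded, rcond=None)[0]
```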

### 2.3 Signal recovery

The major barrier to signal recovery is that the bin-index vector is unknown. In this section, we describe our algorithm to recover both *x*^{∗} and *p*^{∗}, given the modulo measurements **y**, the measurement matrix **A**, the sparsity *s* of the underlying signal, and the modulo parameter *R*. In this work, we rely on the assumption that our signal is *s*-sparse in a known domain. Our algorithm *MoRAM* (Modulo Reconstruction with Alternating Minimization) comprises two stages: (i) a bin-index initialization stage and (ii) a descent stage via alternating minimization.

#### 2.3.1 Bin-index initialization

As stated earlier, if we recover the true bin-index vector *p*^{∗} successfully, *x*^{∗} can be recovered easily using any sparse recovery algorithm, since *p*^{∗} yields the true compressed measurements 〈*a*_{i},*x*^{∗}〉. In the absence of *p*^{∗}, we therefore propose to correctly estimate a fraction of its entries. To understand the rationale for this procedure, we first examine the effect of the modulo operation on the linear measurements.

#### 2.3.2 Effect of the modulo transfer function

To provide some intuition, let us first examine the relation between the distributions of **A***x*^{∗} and mod (**A***x*^{∗}). It is easy to see that the compressed measurements *y*_{c} follow a normal distribution.

We can now divide the compressed observations **y**_{c} into two sets: **y**_{c,+}, which contains all the non-negative observations with bin-index 0, and **y**_{c,−}, which contains all the negative observations with bin-index 1. As shown in Fig. 2, after the modulo operation, the set **y**_{c,−} (green) shifts to the right by *R* and concentrates in the right half ([*R*/2,*R*]), while the set **y**_{c,+} (orange) remains unaffected and concentrates in the left half ([0,*R*/2]). Thus, for some of the modulo measurements, the correct bin-index can be identified by observing their magnitudes relative to the midpoint *R*/2. This leads us to the following estimator for the bin-indices (**p**):

$$ {p}^{0}_{i} = \left\{\begin{array}{ll} 0,& \text{if}\ 0\leq y_{i} < R/2,\\ 1,& \text{if}\ R/2 \leq y_{i} \leq R. \end{array}\right. $$

(3)

The vector *p*^{0} obtained with the above method contains the correct values of bin-indices for many of the measurements, except for the ones concentrated within the ambiguous region in the center. We should highlight that the procedure in Eq. 3 will succeed only for the specific case of modulo fold operations limited to two periods, one for the positive and one for the negative cycle.

Once we identify the initial values of the bin-indices for the modulo measurements, we can calculate the corrected measurements as:

$$\begin{array}{*{20}l} \mathbf{y^{0}_{c} = y - p^{0}}R. \end{array} $$

(4)
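A minimal sketch of the initialization (Eq. 3) and correction (Eq. 4) steps on a simulated problem; the dimensions and the seed are illustrative assumptions:

```python
import numpy as np

def init_bin_index(y, R):
    # Eq. (3): bin-index 0 for y in [0, R/2), bin-index 1 for y in [R/2, R]
    return (y >= R / 2).astype(float)

rng = np.random.default_rng(1)
m, n, R = 500, 50, 6.0
x_star = np.zeros(n)
x_star[:5] = rng.normal(size=5)   # sparse ground truth (illustrative)
A = rng.normal(size=(m, n))
y_c = A @ x_star
p_star = (y_c < 0).astype(float)
y = y_c + p_star * R              # modulo measurements

p0 = init_bin_index(y, R)
y0_c = y - p0 * R                 # Eq. (4): corrected measurements

# Eq. (3) errs only when |<a_i, x*>| >= R/2 (the ambiguous central region)
frac_correct = np.mean(p0 == p_star)
print(f"correctly initialized bin-indices: {frac_correct:.0%}")
```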

#### 2.3.3 Alternating minimization

Using Eq. 3, we calculate the initial estimate of the bin-index vector *p*^{0}, in which a significant fraction of the values are estimated correctly. Starting with *p*^{0}, we calculate the estimates of **x** and **p** in an alternating fashion to converge to the original signal *x*^{∗} and the true bin-index vector *p*^{∗}.

If *p*^{t} is close to *p*^{∗}, we can calculate the corrected compressed measurements \(\mathbf {y^{t}_{c}}\) using *p*^{t} and feed \(\mathbf {y^{t}_{c}}\) to any popular compressive recovery algorithm (such as CoSaMP or basis pursuit) to obtain the signal estimate *x*^{t}. Therefore:

$$\begin{array}{*{20}l} \mathbf{y^{t}_{c}} &= \mathbf{y} - \mathbf{p^{t}}R, \end{array} $$

(5)

$$ \begin{array}{*{20}l} \mathbf{{x}^{t}} &= \underset{\mathbf{x} \in \mathcal{M}_{s}}{\arg\min}\|{\mathbf{Ax} - \mathbf{y^{t}_{c}}}\|_{2}^{2}, \end{array} $$

(6)

where \(\mathcal {M}_{s}\) denotes the set of *s*-sparse vectors in \(\mathbb {R}^{n}\). Note that sparsity is only one of several signal models that can be used here, and a rather similar formulation would extend to cases where \(\mathcal {M}\) denotes any other structured sparsity model [23, 24].
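The constrained least squares problem in Eq. 6 can be solved by several algorithms; as a hedged sketch (not the paper's implementation, which uses CoSaMP or basis pursuit), a minimal iterative hard thresholding (IHT) routine looks like this, with illustrative problem sizes:

```python
import numpy as np

def hard_threshold(x, s):
    # projection onto M_s: keep the s largest-magnitude entries
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, y, s, iters=500):
    # projected gradient descent on ||Ax - y||_2^2 over s-sparse vectors
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x

rng = np.random.default_rng(2)
m, n, s = 120, 40, 3
x_star = np.zeros(n)
x_star[[1, 5, 9]] = [1.0, -0.8, 0.6]
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_hat = iht(A, A @ x_star, s)
```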

However, the bin-index estimation error, *d*^{t}=*p*^{t}−*p*^{∗}, even if small, would significantly impact the correction step that constructs \(\mathbf {y^{t}_{c}}\), since each incorrect bin-index adds noise of magnitude *R* to \(\mathbf {y^{t}_{c}}\). Our experiments suggest that typical sparse recovery algorithms are not robust enough to cope with such large errors in \(\mathbf {y^{t}_{c}}\). To tackle this issue, we employ an outlier-robust sparse recovery method known as Justice Pursuit [10].

At a high level, Justice Pursuit tackles the problem of sparse signal recovery from measurements that are corrupted by sparse but large (unbounded) errors. It leverages the fact that the corruptions are themselves sparse, and reformulates the problem to recover both the sparse signal and the sparse corruptions together in the form of a concatenated sparse vector. In our case, the error *d*^{t} is sparse with sparsity *s*_{dt}=∥*d*^{t}∥_{0}, and each erroneous element of **p** adds a corruption of magnitude *R* to \(\mathbf {y^{t}_{c}}\). Following [10], we augment the measurement matrix **A** with an identity matrix **I**_{m×m} and introduce an intermediate vector \(\mathbf {u} \in \mathbb {R}^{n+m}\) to represent our measurements at iteration *t* as:

$$ \mathbf{Ax^{*}} + R\mathbf{I_{m} d^{t}} = \left[\begin{array}{l} \mathbf{A} ~~~~R\mathbf{I} \end{array}\right] \left[\begin{array}{l} \mathbf{x^{*}} \\ \mathbf{d^{t}} \end{array}\right] = \left[\begin{array}{l} \mathbf{A} ~~ ~~R\mathbf{I} \end{array}\right] \mathbf{u}, $$

(7)

and solve for the (*s*+*s*_{dt})−sparse estimate \(\mathbf {\widehat {u}}\):

$$ \left[\begin{array}{l} \mathbf{\widehat{x}^{t}} \\ \mathbf{\widehat{d}^{t}} \end{array}\right] = \mathbf{\widehat{u}} = \underset{\mathbf{u}}{\arg\min} \|{\mathbf{u}}\|_{1}~~~s.t. \left[\begin{array}{l} \mathbf{A} ~~~~ R\mathbf{I} \end{array}\right] \mathbf{u} = \mathbf{y^{t}_{c}} $$

(8)

Here, the signal estimate \(\mathbf {\widehat {x}^{t}}\) is obtained by selecting the first *n* elements of \(\mathbf {\widehat {u}}\), while an estimate of the corruptions can be obtained by selecting the last *m* elements of \(\mathbf {\widehat {u}}\). The problem in Eq. 8 can be solved by any stable sparse recovery algorithm such as CoSaMP or IHT; however, note that the sparsity of *d*^{t} is unknown, suggesting that greedy sparse recovery methods cannot be directly used without an additional hyper-parameter. Therefore, we employ basis pursuit [25] which does not heavily depend on a priori knowledge of the sparsity level.
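A sketch of the program in Eq. 8, using the standard linear-programming reformulation of basis pursuit (split **u** into non-negative parts \(\mathbf{u}^{+} - \mathbf{u}^{-}\)) via `scipy.optimize.linprog`; the problem sizes, corruption pattern, and function name are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def justice_pursuit(A, y_corrected, R):
    # Eq. (8): min ||u||_1  s.t.  [A  R*I] u = y, as a linear program
    # with u = u_plus - u_minus and u_plus, u_minus >= 0
    m, n = A.shape
    B = np.hstack([A, R * np.eye(m)])   # augmented matrix [A  R*I]
    c = np.ones(2 * (n + m))            # objective: sum of u_plus + u_minus
    A_eq = np.hstack([B, -B])
    res = linprog(c, A_eq=A_eq, b_eq=y_corrected, bounds=(0, None))
    u = res.x[: n + m] - res.x[n + m :]
    return u[:n], u[n:]                 # signal estimate, corruption estimate

rng = np.random.default_rng(3)
m, n, R = 50, 10, 6.0
x_star = np.zeros(n)
x_star[[0, 4]] = [1.2, -0.7]
A = rng.normal(size=(m, n))
d = np.zeros(m)
d[[3, 17, 31]] = 1.0                    # a few wrong bin-indices
x_hat, d_hat = justice_pursuit(A, A @ x_star + R * d, R)
```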

We refer to the routine that solves the program in Eq. 8 using basis pursuit as *JP*. Given \(\mathbf {A}, \mathbf {y^{t}_{c}}\), JP returns the next signal estimate *x*^{t+1}. Thus,

$$ \mathbf{x^{t+1}}= JP \left(\mathbf{A}, \mathbf{y^{t}_{c}} \right). $$

(9)

Once the signal estimate *x*^{t+1} is obtained at each iteration of alternating minimization, we use it to update the bin-index vector *p*^{t+1} as follows:

$$ \mathbf{{p}^{t+1}} = \frac{\mathbf{1}-\text{sgn}\left(\mathbf{A}\mathbf{x^{t+1}} \right)}{2}. $$

(10)

Proceeding this way, we repeat the sparse recovery step (Eq. 8) and the bin-index calculation step (Eq. 10) in an alternating fashion for *T* iterations. Under certain conditions (described in Section 2.4 below), our algorithm converges to the true underlying signal.
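The overall alternating loop (Eqs. 3, 5, and 10) can be sketched as follows. For brevity, this sketch replaces the Justice Pursuit step with plain least squares and picks *R* large relative to *σ* so that the initialization is already accurate; it is an illustration under those assumptions, not the robust implementation described above:

```python
import numpy as np

def moram_sketch(A, y, R, iters=5):
    # alternating minimization: bin-index init (Eq. 3), then repeated
    # correction (Eq. 5), signal estimation, and bin-index update (Eq. 10)
    p = (y >= R / 2).astype(float)       # Eq. (3)
    for _ in range(iters):
        y_c = y - p * R                  # Eq. (5)
        # stand-in for the Justice Pursuit step: least squares (m > n here)
        x = np.linalg.lstsq(A, y_c, rcond=None)[0]
        p = (A @ x < 0).astype(float)    # Eq. (10), with sgn(0) = 1
    return x, p

rng = np.random.default_rng(4)
m, n = 200, 30
x_star = rng.normal(size=n) * 0.1        # ||x*|| small relative to R
A = rng.normal(size=(m, n))
y_c = A @ x_star
p_star = (y_c < 0).astype(float)
R = 12.0 * np.linalg.norm(x_star)        # generous dynamic range
y = y_c + p_star * R
x_hat, p_hat = moram_sketch(A, y, R)
```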

### 2.4 Mathematical analysis

In this section, we provide correctness proofs for both stages of Algorithm 1. For the first stage, we derive an upper bound on the number of incorrect estimates in *p*^{0} obtained in the bin-index initialization step; this, in turn, bounds the permissible sparsity of *d*^{0}. For the second stage, we calculate the number of measurements sufficient for the augmented matrix used in the Justice Pursuit formulation in (8) to satisfy the Restricted Isometry Property (RIP), which in turn enables a recovery guarantee.

### 2.5 Bin-index initialization

In this step, we initialize the bin-index vector *p*^{0} according to Eq. 3. We can quantify the number of correctly estimated bin-indices by calculating the area under the density curves of the measurements before and after the modulo operation. An illustration is provided in Fig. 3.

In this analysis, our goal is to characterize the distribution of the total number of measurements for which Eq. 3 estimates the correct bin-index. We denote this random variable by *M*_{c}. From *M*_{c}, we can calculate the sparsity of *d*^{0} as ∥*d*^{0}∥_{0}=*m*−*M*_{c}. The following lemma presents a bound on the sparsity of *d*^{0}.

**Lemma 1**

Let the entries of the measurement matrix be generated as \(\mathbf {A}_{ij} \sim \mathcal {N}(0,1)\), and let **y** be the modulo measurements obtained as per Eq. 1. Let *M*_{c} be the random variable denoting the number of measurements for which the correct bin-indices are identified by the initialization method in Eq. 3. Then, with probability at least \(1 - e^{-O(m\delta ^{2})}\):

$$M_{c} > (1 - \delta)m \left(1-2\frac{\sigma^{2}\phi(R/2)}{(R/2)} \right). $$

Here, *ϕ*(·) is a Gaussian density with mean *μ*=0 and variance \(\sigma ^{2} = \|{\mathbf {x^{*}}}\|^{2}_{2}\).

*Proof*

Observe that each element of **A** is i.i.d. standard normal, i.e., \(\mu _{A_{ij}} = 0\) and \(\sigma ^{2}_{A_{ij}}=1\). Recall that

$$y_{c,i} =\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle = \sum_{j=1}^{n}A_{ij}x^{*}_{j}. $$

Therefore, we have

$$y_{c,i} \sim \mathcal{N}\left(\mu= \sum_{j=1}^{n}x^{*}_{j}\mu_{A_{ij}} = 0,~ \sigma^{2} =\sum_{j=1}^{n}x^{*2}_{j}\sigma^{2}_{A_{ij}}\right). $$

Thus, each element of *y*_{c} follows a zero-mean Gaussian distribution with variance *σ*^{2}. Let *E*_{i} be the event that the random variable *y*_{c,i} lies in the interval [−*R*/2,*R*/2]; this event indicates that the corresponding measurement is appropriately corrected using Eq. 4. Clearly, *E*_{i} is a Bernoulli random variable with probability *q*=*P*[−*R*/2≤*y*_{c,i}≤*R*/2]. Elementary probability calculations give us:

$$\begin{array}{*{20}l} q = 1 - 2 Q_{0, \sigma^{2}}(R/2), \end{array} $$

where \(\phantom {\dot {i}\!}Q_{0, \sigma ^{2}}(\cdot)\) is the usual Q-function. This is not calculable in closed form; however, it can be bounded using the following identity (where \(\phantom {\dot {i}\!}\phi _{0, \sigma ^{2}}(\cdot)\) is a Gaussian density function with mean zero and variance *σ*^{2}):

$$Q_{0, \sigma^{2}}(t) < \sigma^{2} \frac{\phi_{0, \sigma^{2}}(t)}{t}. $$

The random variable \(M_{c} = \sum _{i=1}^{m} {E_{i}}\) denotes the number of corrected measurements. By an application of the Chernoff bound,

$$\begin{array}{*{20}l} P\left(M_{c} \leq (1 - \delta)\mu'\right) \leq e^{-\mu'\delta^{2}/2}, \end{array} $$

for any *δ*∈(0,1), where *μ*^{′} is the mean of *M*_{c}. Plugging in *μ*^{′}=*mq* gives the desired result. □
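As a numerical sanity check of the bound used in the proof above (the values of *σ* and *R* are arbitrary assumptions), the exact *q* = 1 − 2*Q*_{0,σ²}(*R*/2) indeed exceeds the lemma's lower bound 1 − 2*σ*²*ϕ*(*R*/2)/(*R*/2):

```python
import math

sigma, R = 1.0, 4.0
t = R / 2

# Q-function and density of N(0, sigma^2) evaluated at t
Q = 0.5 * math.erfc(t / (sigma * math.sqrt(2)))
phi = math.exp(-t**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

q = 1 - 2 * Q                        # exact P(-R/2 <= y_c <= R/2)
lower = 1 - 2 * sigma**2 * phi / t   # the lemma's lower bound on q

# the identity Q(t) < sigma^2 * phi(t) / t implies q > lower
print(q, lower)
```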

We now perform a theoretical analysis of the descent stage of our algorithm. We assume the availability of an initial estimate of the bin-index vector *p*^{0} that is close to *p*^{∗}; in our case, the initialization step (in Alg. 1) provides such a *p*^{0}.

We perform alternating minimization (AltMin) as described in Algorithm 1, starting with *p*^{0} calculated using Eq. 3. For simplicity, we limit our convergence analysis to a single AltMin iteration. In fact, according to our theoretical analysis, if initialized well enough, one iteration of AltMin suffices for exact signal recovery given sufficiently many measurements; in practice, however, we have observed that our algorithm performs better with multiple AltMin iterations.

**Theorem 2**

Given the initial estimate of bin-index *p*^{0} obtained using Eq. 3, if the number of modulo measurements *m* satisfies:

$$\begin{array}{*{20}l} m \geq C_{1}\left(\|{\mathbf{x^{*}}}\|_{0} + m(1 - U + \delta U)\right) \log\left(\frac{n + m}{\|{\mathbf{x^{*}}}\|_{0} + m\left(1 - U + \delta U\right)}\right), \end{array} $$

then the first iteration of Algorithm 1 returns the true signal (that is, *x*^{0} = *x*^{∗}) with probability exceeding \(1 - e^{-O(m\delta ^{2})}\) for small *δ*>0. Here, *C*_{1} depends only on the RIP constant of the augmented measurement matrix [**A** **I**], \(\phantom {\dot {i}\!}q = 1 - 2 Q_{0, \sigma ^{2}}(R/2)\), and \(U = 1-2\sigma ^{2}\frac {\phi (R/2)}{(R/2)}\).

*Proof*

In the estimation step, Algorithm 1 recasts the problem of recovering the true signal *x*^{∗} as a special case of sparse signal recovery from sparsely corrupted compressive measurements. The modulo operation modifies the compressive measurements by adding a constant offset of value *R* to a fraction of the total measurements. However, once we identify the correct bin-index for some of the measurements using Eq. 3, the remaining noise can be modeled as sparse corruptions **d** according to the formulation:

$$\mathbf{y^{0}_{c}} = \mathbf{Ax^{*}} + R\,\mathbf{I_{m}\left(p^{0}-p^{*}\right)} = \mathbf{Ax^{*}} + R\,\mathbf{d^{0}}. $$

Here, the *ℓ*_{0}-norm of *d*^{0} gives the number of noisy measurements in \(\mathbf {y^{0}_{c}}\).

If the initial bin-index vector *p*^{0} is close to the true bin-index vector *p*^{∗}, then ∥*d*^{0}∥_{0} is small relative to the total number of measurements *m*; thus, *d*^{0} can be treated as a sparse corruption. If we model this corruption as sparse noise, then we can employ JP for guaranteed recovery of the true signal, provided a sufficiently large number of measurements is available. Denote by ∥*d*^{0}∥_{0}=*m*−*M*_{c} the number of measurements for which the bin-index estimates were incorrect. Then, using Lemma 1, with probability at least \(1 - e^{-O(m\delta ^{2})}\):

$$\begin{array}{*{20}l} \|{\mathbf{d^{0}}}\|_{0} & \leq m -(1 - \delta)mU \\ & \leq m\left(1 - U + \delta U\right),~~\text{with}\ U = \left(1-2\sigma^{2}\frac{\phi(R/2)}{(R/2)} \right). \\ \end{array} $$

Algorithm 1 is essentially the Justice Pursuit (JP) formulation as described in [10]. Exact signal recovery from sparsely corrupted measurements is a well-studied problem, with uniform recovery guarantees available in the existing literature. We use the guarantee proved in [10] for Gaussian observations, which states that, given enough measurements, the augmented matrix [**A** **I**] satisfies the Restricted Isometry Property. As stated in [26], one can recover a sparse signal exactly by tractable *ℓ*_{1}-minimization if the measurement matrix is known to satisfy the RIP. Thus, provided *m*≥*C*(∥*x*^{∗}∥_{0}+∥*d*^{0}∥_{0}) log((*n*+*m*)/(∥*x*^{∗}∥_{0}+∥*d*^{0}∥_{0})), we invoke Theorem 1.1 from [10] and replace ∥*d*^{0}∥_{0} with *m*(1−*U*+*δU*) as stated above to complete the proof. □

From the theorem, we see that the number of measurements required for guaranteed recovery depends on the ratio of *σ* (the standard deviation of the measurements) to *R*. In practical applications, choosing *R* large enough that the interval [−*R*,*R*] covers multiple standard deviations on either side of the origin enables successful recovery.
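To see this quantitatively, the probability that a measurement falls in [−*R*/2, *R*/2] (and hence receives a correct bin-index from Eq. 3) can be tabulated for a few illustrative ratios *R*/*σ*:

```python
import math

def coverage(R, sigma):
    # P(-R/2 <= y_c <= R/2) for y_c ~ N(0, sigma^2): the fraction of
    # measurements whose bin-index Eq. (3) identifies correctly
    return math.erf(R / (2 * sigma * math.sqrt(2)))

for k in [2, 4, 6, 8]:        # R = k * sigma
    print(k, round(coverage(k, 1.0), 4))   # → roughly 0.68, 0.95, 0.997, 0.9999
```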