### 2.1 Preliminaries

Let us introduce some notation. We denote matrices by bold uppercase letters (**A**, **B**), column vectors by bold lowercase letters (**x**, **y**, **z**, etc.), and scalars by non-bold letters (*R*, *m*, etc.). We use the letters *C* and *c* to represent sufficiently large and sufficiently small constants, respectively. We use **x**^{⊤} and **A**^{⊤} to denote the transpose of the vector **x** and the matrix **A**, respectively. The cardinality of a set *S* is denoted by card(*S*). We define the signum function as \(\text {sgn}(x) := \frac {x}{|x|}\) for every \(x \in \mathbb {R}, x \neq 0\), with the convention that sgn(0)=1. The *i*th element of the vector \(\mathbf {x} \in \mathbb {R}^{n}\) is denoted by *x*_{i}. Similarly, the *i*th row of the matrix \(\mathbf {A} \in \mathbb {R}^{m \times n}\) is denoted by *a*_{i}, while the element of **A** in the *i*th row and *j*th column is denoted by *a*_{ij}.

### 2.2 Mathematical model

We consider the modulo operation within two periods (one in the positive half and one in the negative half). We assume that the dynamic range parameter *R* is large enough that most of the measurements 〈*a*_{i},*x*^{∗}〉 fall within the domain of operation of the modulo function. Rewriting in terms of the signum function, the (variation of the) modulo function under consideration can be defined as:

$$f(t) := t+\left(\frac{1-\text{sgn}(t)}{2}\right)R. $$
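As a small sketch, the transfer function *f* and the sgn(0)=1 convention can be written as follows (the function names are ours, not from the paper):

```python
import numpy as np

def sgn(t):
    # sgn(t) := t/|t| for t != 0, with the convention sgn(0) = 1
    return np.where(t >= 0, 1.0, -1.0)

def modulo_fold(t, R):
    # f(t) = t + ((1 - sgn(t)) / 2) * R: adds R to negative inputs,
    # leaves non-negative inputs unchanged
    return t + (1.0 - sgn(t)) / 2.0 * R

# negative inputs are shifted into [0, R); non-negative ones pass through
print(modulo_fold(np.array([-1.5, 0.0, 2.0]), R=4.0).tolist())  # → [2.5, 0.0, 2.0]
```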

One can easily notice that the modulo operation in this case is nothing but the addition of the scalar *R* if the input is negative, while non-negative inputs remain unaffected by it. If we divide the number line into these two bins, then the coefficient of *R* in the above equation can be seen as a bin-index, a binary variable which takes value 0 when sgn(*t*)=1, or 1 when sgn(*t*)=−1. Inserting the definition of *f* in the measurement model of Eq. 1 gives,

$$ y_{i}= \langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle+\left(\frac{1-\text{sgn}(\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle)}{2}\right)R,~~i = 1,\ldots,m. $$

(2)

We can rewrite Eq. 2 using a bin-index vector **p**∈{0,1}^{m}. Each element of the true bin-index vector **p**^{∗} is given as,

$$p^{*}_{i} = \frac{1-\text{sgn}\left(\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle\right)}{2},~~i = 1,\ldots,m. $$

If we ignore the presence of the modulo operation in the above formulation, it reduces to a standard compressive sensing reconstruction problem. In that case, the compressed measurements \(y_{c_{i}}\) would simply equal 〈*a*_{i},*x*^{∗}〉. Since we have access only to the compressed modulo measurements **y**, it is useful to write **y** in terms of the true compressed measurements **y**_{c}. Thus,

$$\begin{aligned} y_{i} &= {\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle} + p^{*}_{i}R \\ &= y_{c_{i}}+p^{*}_{i}R. \end{aligned} $$

It is evident that if we can recover *p*^{∗} successfully, we can calculate the true compressed measurements 〈*a*_{i}, *x*^{∗}〉 and use them to reconstruct *x*^{∗} with any sparse recovery algorithm such as CoSaMP [19] or basis pursuit [20–22].
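The observation above can be verified on a toy instance: given the true bin-indices, subtracting \(p^{*}_{i}R\) restores 〈*a*_{i},*x*^{∗}〉 exactly, after which any recovery method applies. In this sketch, plain least squares stands in for CoSaMP or basis pursuit since the toy problem is overdetermined; all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 100, 20, 8.0

# hypothetical toy problem: a sparse signal and Gaussian measurements
x_star = np.zeros(n)
x_star[[2, 7, 11]] = [0.5, -0.3, 0.2]
A = rng.normal(size=(m, n))

y_c = A @ x_star                  # true compressed measurements
p_star = (y_c < 0).astype(float)  # true bin-indices, (1 - sgn)/2 with sgn(0) = 1
y = y_c + p_star * R              # modulo measurements (Eq. 2)

# with the true bin-indices, unfolding recovers y_c exactly ...
y_unfolded = y - p_star * R
# ... after which any recovery algorithm applies; here least squares
# stands in for a sparse solver, since m > n in this toy example
x_hat = np.linalg.lstsq(A, y_unfolded, rcond=None)[0]
```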

### 2.3 Signal recovery

The major barrier to signal recovery is that the bin-index vector is unknown. In this section, we describe our algorithm to recover both *x*^{∗} and *p*^{∗}, given the modulo measurements **y**, the measurement matrix **A**, the sparsity *s* of the underlying signal, and the modulo parameter *R*. In this work, we rely on the assumption that our signal is *s*-sparse in a known domain. Our algorithm *MoRAM* (Modulo Reconstruction with Alternating Minimization) comprises two stages: (i) a bin-index initialization stage and (ii) a descent stage via alternating minimization.

#### 2.3.1 Bin-index initialization

As stated earlier, if we recover the true bin-index vector *p*^{∗} successfully, *x*^{∗} can be recovered easily using any sparse recovery algorithm, since *p*^{∗} yields the true compressed measurements 〈*a*_{i},*x*^{∗}〉. In the absence of *p*^{∗}, we therefore propose to correctly estimate a fraction of its entries. To understand the rationale for this procedure, we first examine the effect of the modulo operation on the linear measurements.

#### 2.3.2 Effect of the modulo transfer function

To provide some intuition, let us first examine the relation between the distributions of **A***x*^{∗} and mod (**A***x*^{∗}). It is easy to see that the compressed measurements *y*_{c} follow a normal distribution.

We can now divide the compressed observations **y**_{c} into two sets: **y**_{c,+}, which contains all the non-negative observations with bin-index 0, and **y**_{c,−}, which contains all the negative observations with bin-index 1. As shown in Fig. 2, after the modulo operation, the set **y**_{c,−} (green) shifts to the right by *R* and concentrates in the right half ([*R*/2,*R*]), while the set **y**_{c,+} (orange) remains unaffected and concentrates in the left half ([0,*R*/2]). Thus, for some of the modulo measurements, the correct bin-index can be identified by observing their magnitudes relative to the midpoint *R*/2. This leads us to the following estimator for the bin-indices (**p**):

$$ {p}^{0}_{i} = \left\{\begin{array}{ll} 0,& \text{if}\ 0\leq y_{i} < R/2,\\ 1,& \text{if}\ R/2 \leq y_{i} \leq R. \end{array}\right. $$

(3)

The vector *p*^{0} obtained with the above method contains the correct values of bin-indices for many of the measurements, except for the ones concentrated within the ambiguous region in the center. We should highlight that the procedure in Eq. 3 will succeed only for the specific case of modulo fold operations limited to two periods, one for the positive and one for the negative cycle.

Once we identify the initial values of the bin-indices for the modulo measurements, we can calculate the corrected measurements as:

$$\begin{array}{*{20}l} \mathbf{y^{0}_{c} = y - p^{0}}R. \end{array} $$

(4)
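A minimal sketch of the initialization (Eq. 3) and correction (Eq. 4) steps on a simulated problem; the dimensions and the seed are illustrative assumptions:

```python
import numpy as np

def init_bin_index(y, R):
    # Eq. (3): bin-index 0 for y in [0, R/2), bin-index 1 for y in [R/2, R]
    return (y >= R / 2).astype(float)

rng = np.random.default_rng(1)
m, n, R = 500, 50, 6.0
x_star = np.zeros(n)
x_star[:5] = rng.normal(size=5)   # sparse ground truth (illustrative)
A = rng.normal(size=(m, n))
y_c = A @ x_star
p_star = (y_c < 0).astype(float)
y = y_c + p_star * R              # modulo measurements

p0 = init_bin_index(y, R)
y0_c = y - p0 * R                 # Eq. (4): corrected measurements

# Eq. (3) errs only when |<a_i, x*>| >= R/2 (the ambiguous central region)
frac_correct = np.mean(p0 == p_star)
print(f"correctly initialized bin-indices: {frac_correct:.0%}")
```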

#### 2.3.3 Alternating minimization

Using Eq. 3, we calculate the initial estimate of the bin-index vector *p*^{0}, in which a significant fraction of the values are estimated correctly. Starting with *p*^{0}, we calculate the estimates of **x** and **p** in an alternating fashion to converge to the original signal *x*^{∗} and the true bin-index vector *p*^{∗}.

If *p*^{t} is close to *p*^{∗}, we can calculate the corrected compressed measurements \(\mathbf {y^{t}_{c}}\) using *p*^{t} and feed \(\mathbf {y^{t}_{c}}\) to any popular compressive recovery algorithm (such as CoSaMP or basis pursuit) to obtain the signal estimate *x*^{t}. Therefore:

$$\begin{array}{*{20}l} \mathbf{y^{t}_{c}} &= \mathbf{y} - \mathbf{p^{t}}R, \end{array} $$

(5)

$$ \begin{array}{*{20}l} \mathbf{{x}^{t}} &= \underset{\mathbf{x} \in \mathcal{M}_{s}}{\arg\min}\|{\mathbf{Ax} - \mathbf{y^{t}_{c}}}\|_{2}^{2}, \end{array} $$

(6)

where \(\mathcal {M}_{s}\) denotes the set of *s*-sparse vectors in \(\mathbb {R}^{n}\). Note that sparsity is only one of several signal models that can be used here, and a rather similar formulation would extend to cases where \(\mathcal {M}\) denotes any other structured sparsity model [23, 24].
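The constrained least squares problem in Eq. 6 can be solved by several algorithms; as a hedged sketch (not the paper's implementation, which uses CoSaMP or basis pursuit), a minimal iterative hard thresholding (IHT) routine looks like this, with illustrative problem sizes:

```python
import numpy as np

def hard_threshold(x, s):
    # projection onto M_s: keep the s largest-magnitude entries
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, y, s, iters=500):
    # projected gradient descent on ||Ax - y||_2^2 over s-sparse vectors
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x

rng = np.random.default_rng(2)
m, n, s = 120, 40, 3
x_star = np.zeros(n)
x_star[[1, 5, 9]] = [1.0, -0.8, 0.6]
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_hat = iht(A, A @ x_star, s)
```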

However, the bin-index estimation error, *d*^{t}=*p*^{t}−*p*^{∗}, even if small, would significantly impact the correction step that constructs \(\mathbf {y^{t}_{c}}\), since each incorrect bin-index adds noise of magnitude *R* to \(\mathbf {y^{t}_{c}}\). Our experiments suggest that typical sparse recovery algorithms are not robust enough to cope with such large errors in \(\mathbf {y^{t}_{c}}\). To tackle this issue, we employ an outlier-robust sparse recovery method known as Justice Pursuit [10].

At a high level, Justice Pursuit tackles the problem of sparse signal recovery from measurements that are corrupted by sparse but large (unbounded) errors. It leverages the fact that the corruptions are themselves sparse, and reformulates the problem to recover both the sparse signal and the sparse corruptions together in the form of a concatenated sparse vector. In our case, the error *d*^{t} is sparse with sparsity *s*_{dt}=∥*d*^{t}∥_{0}, and each erroneous element of **p** adds a corruption of magnitude *R* to \(\mathbf {y^{t}_{c}}\). Following [10], we augment the measurement matrix **A** with an identity matrix **I**_{m×m} and introduce an intermediate vector \(\mathbf {u} \in \mathbb {R}^{n+m}\) to represent our measurements at iteration *t* as:

$$ \mathbf{Ax^{*}} + R\mathbf{I_{m} d^{t}} = \left[\begin{array}{l} \mathbf{A} ~~~~R\mathbf{I} \end{array}\right] \left[\begin{array}{l} \mathbf{x^{*}} \\ \mathbf{d^{t}} \end{array}\right] = \left[\begin{array}{l} \mathbf{A} ~~ ~~R\mathbf{I} \end{array}\right] \mathbf{u}, $$

(7)

and solve for the (*s*+*s*_{dt})−sparse estimate \(\mathbf {\widehat {u}}\):

$$ \left[\begin{array}{l} \mathbf{\widehat{x}^{t}} \\ \mathbf{\widehat{d}^{t}} \end{array}\right] = \mathbf{\widehat{u}} = \underset{\mathbf{u}}{\arg\min} \|{\mathbf{u}}\|_{1}~~~s.t. \left[\begin{array}{l} \mathbf{A} ~~~~ R\mathbf{I} \end{array}\right] \mathbf{u} = \mathbf{y^{t}_{c}} $$

(8)

Here, the signal estimate \(\mathbf {\widehat {x}^{t}}\) is obtained by selecting the first *n* elements of \(\mathbf {\widehat {u}}\), while an estimate of the corruptions can be obtained by selecting the last *m* elements of \(\mathbf {\widehat {u}}\). The problem in Eq. 8 can be solved by any stable sparse recovery algorithm such as CoSaMP or IHT; however, note that the sparsity of *d*^{t} is unknown, suggesting that greedy sparse recovery methods cannot be directly used without an additional hyper-parameter. Therefore, we employ basis pursuit [25] which does not heavily depend on a priori knowledge of the sparsity level.
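A sketch of the program in Eq. 8, using the standard linear-programming reformulation of basis pursuit (split **u** into non-negative parts \(\mathbf{u}^{+} - \mathbf{u}^{-}\)) via `scipy.optimize.linprog`; the problem sizes, corruption pattern, and function name are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def justice_pursuit(A, y_corrected, R):
    # Eq. (8): min ||u||_1  s.t.  [A  R*I] u = y, as a linear program
    # with u = u_plus - u_minus and u_plus, u_minus >= 0
    m, n = A.shape
    B = np.hstack([A, R * np.eye(m)])   # augmented matrix [A  R*I]
    c = np.ones(2 * (n + m))            # objective: sum of u_plus + u_minus
    A_eq = np.hstack([B, -B])
    res = linprog(c, A_eq=A_eq, b_eq=y_corrected, bounds=(0, None))
    u = res.x[: n + m] - res.x[n + m :]
    return u[:n], u[n:]                 # signal estimate, corruption estimate

rng = np.random.default_rng(3)
m, n, R = 50, 10, 6.0
x_star = np.zeros(n)
x_star[[0, 4]] = [1.2, -0.7]
A = rng.normal(size=(m, n))
d = np.zeros(m)
d[[3, 17, 31]] = 1.0                    # a few wrong bin-indices
x_hat, d_hat = justice_pursuit(A, A @ x_star + R * d, R)
```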

We refer to the routine that solves the program in Eq. 8 using basis pursuit as *JP*. Given \(\mathbf {A}, \mathbf {y^{t}_{c}}\), JP returns the next signal estimate *x*^{t+1}. Thus,

$$ \mathbf{x^{t+1}}= JP \left(\mathbf{A}, \mathbf{y^{t}_{c}} \right). $$

(9)

Once the signal estimate *x*^{t+1} is obtained at each iteration of alternating minimization, we use it to update the bin-index vector *p*^{t+1} as follows:

$$ \mathbf{{p}^{t+1}} = \frac{\mathbf{1}-\text{sgn}\left(\mathbf{A}\mathbf{x^{t+1}} \right)}{2}. $$

(10)

Proceeding this way, we repeat the sparse recovery step (Eq. 8) and the bin-index calculation step (Eq. 10) in an alternating fashion for *T* iterations. Under certain conditions (described in Section 2.4 below), our algorithm converges to the true underlying signal.
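The overall alternating loop (Eqs. 3, 5, and 10) can be sketched as follows. For brevity, this sketch replaces the Justice Pursuit step with plain least squares and picks *R* large relative to *σ* so that the initialization is already accurate; it is an illustration under those assumptions, not the robust implementation described above:

```python
import numpy as np

def moram_sketch(A, y, R, iters=5):
    # alternating minimization: bin-index init (Eq. 3), then repeated
    # correction (Eq. 5), signal estimation, and bin-index update (Eq. 10)
    p = (y >= R / 2).astype(float)       # Eq. (3)
    for _ in range(iters):
        y_c = y - p * R                  # Eq. (5)
        # stand-in for the Justice Pursuit step: least squares (m > n here)
        x = np.linalg.lstsq(A, y_c, rcond=None)[0]
        p = (A @ x < 0).astype(float)    # Eq. (10), with sgn(0) = 1
    return x, p

rng = np.random.default_rng(4)
m, n = 200, 30
x_star = rng.normal(size=n) * 0.1        # ||x*|| small relative to R
A = rng.normal(size=(m, n))
y_c = A @ x_star
p_star = (y_c < 0).astype(float)
R = 12.0 * np.linalg.norm(x_star)        # generous dynamic range
y = y_c + p_star * R
x_hat, p_hat = moram_sketch(A, y, R)
```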

### 2.4 Mathematical analysis

In this section, we provide correctness proofs for both stages of Algorithm 1. For the first stage, we derive an upper bound on the number of incorrect estimates in *p*^{0} obtained in the bin-index initialization step; this, in turn, bounds the permissible sparsity of *d*^{0}. For the second stage, we calculate the number of measurements sufficient for the augmented matrix used in the Justice Pursuit formulation in (8) to satisfy the Restricted Isometry Property (RIP), which in turn enables a recovery guarantee.

### 2.5 Bin-index initialization

In this step, we initialize the bin-index vector *p*^{0} according to Eq. 3. We can quantify the number of correctly estimated bin-indices by calculating the area under the density curves of the measurements before and after the modulo operation. An illustration is provided in Fig. 3.

In this analysis, our goal is to characterize the distribution of the total number of measurements for which Eq. 3 estimates the correct bin-index. We denote this random variable by *M*_{c}. From *M*_{c}, we can calculate the sparsity of *d*^{0} as ∥*d*^{0}∥_{0}=*m*−*M*_{c}. The following lemma presents a bound on the sparsity of *d*^{0}.

**Lemma 1**

Let the entries of the measurement matrix be generated as \(\mathbf {A}_{ij} \sim \mathcal {N}(0,1)\), and let **y** be the modulo measurements obtained as per Eq. 1. Let *M*_{c} be the random variable denoting the number of measurements for which the correct bin-indices are identified by the initialization method in Eq. 3. Then, with probability at least \(1 - e^{-O(m\delta ^{2})}\):

$$M_{c} > (1 - \delta)m \left(1-2\frac{\sigma^{2}\phi(R/2)}{(R/2)} \right). $$

Here, *ϕ*(·) is a Gaussian density with mean *μ*=0 and variance \(\sigma ^{2} = \|{\mathbf {x^{*}}}\|^{2}_{2}\).

*Proof*

Observe that each element of **A** is i.i.d. standard normal, i.e., \(\mu _{A_{ij}} = 0\) and \(\sigma ^{2}_{A_{ij}}=1\). Recall that

$$y_{c,i} =\langle \mathbf{a_{i}}, \mathbf{x^{*}} \rangle = \sum_{j=1}^{n}A_{ij}x^{*}_{j}. $$

Therefore, we have

$$y_{c,i} \sim \mathcal{N}\left(\mu= \sum_{j=1}^{n}x^{*}_{j}\mu_{A_{ij}} = 0,~ \sigma^{2} =\sum_{j=1}^{n}x^{*2}_{j}\sigma^{2}_{A_{ij}}\right). $$

Thus, each element of *y*_{c} follows a zero-mean Gaussian distribution with variance *σ*^{2}. Let *E*_{i} be the event that the random variable *y*_{c,i} lies in the interval [−*R*/2,*R*/2]; this event indicates that the corresponding measurement is appropriately corrected using Eq. 4. Clearly, *E*_{i} is a Bernoulli random variable with probability *q*=*P*[−*R*/2≤*y*_{c,i}≤*R*/2]. Elementary probability calculations give us:

$$\begin{array}{*{20}l} q = 1 - 2 Q_{0, \sigma^{2}}(R/2), \end{array} $$

where \(\phantom {\dot {i}\!}Q_{0, \sigma ^{2}}(\cdot)\) is the usual Q-function. This is not calculable in closed form; however, it can be bounded using the following identity (where \(\phantom {\dot {i}\!}\phi _{0, \sigma ^{2}}(\cdot)\) is a Gaussian density function with mean zero and variance *σ*^{2}):

$$Q_{0, \sigma^{2}}(t) < \sigma^{2} \frac{\phi_{0, \sigma^{2}}(t)}{t}. $$

The random variable \(M_{c} = \sum _{i=1}^{m} {E_{i}}\) denotes the number of corrected measurements. By an application of the Chernoff bound,

$$\begin{array}{*{20}l} P\left(M_{c} \leq (1 - \delta)\mu'\right) \leq e^{-\mu'\delta^{2}/2}, \end{array} $$

for any *δ*∈(0,1), where *μ*^{′} is the mean of *M*_{c}. Plugging in *μ*^{′}=*mq* gives the desired result. □
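As a numerical sanity check of the bound used in the proof above (the values of *σ* and *R* are arbitrary assumptions), the exact *q* = 1 − 2*Q*_{0,σ²}(*R*/2) indeed exceeds the lemma's lower bound 1 − 2*σ*²*ϕ*(*R*/2)/(*R*/2):

```python
import math

sigma, R = 1.0, 4.0
t = R / 2

# Q-function and density of N(0, sigma^2) evaluated at t
Q = 0.5 * math.erfc(t / (sigma * math.sqrt(2)))
phi = math.exp(-t**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

q = 1 - 2 * Q                        # exact P(-R/2 <= y_c <= R/2)
lower = 1 - 2 * sigma**2 * phi / t   # the lemma's lower bound on q

# the identity Q(t) < sigma^2 * phi(t) / t implies q > lower
print(q, lower)
```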

We now perform a theoretical analysis of the descent stage of our algorithm. We assume the availability of an initial estimate of the bin-index vector *p*^{0} that is close to *p*^{∗}; in our case, the initialization step (in Alg. 1) provides such a *p*^{0}.

We perform alternating minimization (AltMin) as described in Algorithm 1, starting with *p*^{0} calculated using Eq. 3. For simplicity, we limit our convergence analysis to a single AltMin iteration. In fact, according to our theoretical analysis, if initialized well enough, one iteration of AltMin suffices for exact signal recovery given sufficiently many measurements; in practice, however, we have observed that our algorithm performs better with multiple AltMin iterations.

**Theorem 2**

Given the initial estimate of bin-index *p*^{0} obtained using Eq. 3, if the number of modulo measurements *m* satisfies:

$$\begin{array}{*{20}l} m \geq C_{1}\left(\|{\mathbf{x^{*}}}\|_{0} + m(1 - U + \delta U)\right) \log\left(\frac{n + m}{\|{\mathbf{x^{*}}}\|_{0} + m\left(1 - U + \delta U\right)}\right), \end{array} $$

then the first iteration of Algorithm 1 returns the true signal (that is, *x*^{0} = *x*^{∗}) with probability exceeding \(1 - e^{-O(m\delta ^{2})}\) for small *δ*>0. Here, *C*_{1} depends only on the RIP constant of the augmented measurement matrix [**A** **I**], \(\phantom {\dot {i}\!}q = 1 - 2 Q_{0, \sigma ^{2}}(R/2)\), and \(U = 1-2\sigma ^{2}\frac {\phi (R/2)}{(R/2)}\).

*Proof*

In the estimation step, Algorithm 1 recasts the problem of recovering the true signal *x*^{∗} as a special case of sparse signal recovery from sparsely corrupted compressive measurements. The modulo operation modifies the compressive measurements by adding a constant offset of value *R* to a fraction of the total measurements. However, once we identify the correct bin-index for some of the measurements using Eq. 3, the remaining noise can be modeled as sparse corruptions **d** according to the formulation:

$$\mathbf{y^{0}_{c}} = \mathbf{Ax^{*}} + R\,\mathbf{I_{m}\left(p^{0}-p^{*}\right)} = \mathbf{Ax^{*}} + R\,\mathbf{d^{0}}. $$

Here, the *ℓ*_{0}-norm of *d*^{0} gives the number of noisy measurements in \(\mathbf {y^{0}_{c}}\).

If the initial bin-index vector *p*^{0} is close to the true bin-index vector *p*^{∗}, then ∥*d*^{0}∥_{0} is small relative to the total number of measurements *m*; thus, *d*^{0} can be treated as a sparse corruption. If we model this corruption as sparse noise, then we can employ JP for guaranteed recovery of the true signal, provided a sufficiently large number of measurements is available. Denote by ∥*d*^{0}∥_{0}=*m*−*M*_{c} the number of measurements for which the bin-index estimates were incorrect. Then, using Lemma 1, with probability at least \(1 - e^{-O(m\delta ^{2})}\):

$$\begin{array}{*{20}l} \|{\mathbf{d^{0}}}\|_{0} & \leq m -(1 - \delta)mU \\ & \leq m\left(1 - U + \delta U\right),~~\text{with}\ U = \left(1-2\sigma^{2}\frac{\phi(R/2)}{(R/2)} \right). \\ \end{array} $$

Algorithm 1 is essentially the Justice Pursuit (JP) formulation as described in [10]. Exact signal recovery from sparsely corrupted measurements is a well-studied problem, with uniform recovery guarantees available in the existing literature. We use the guarantee proved in [10] for Gaussian observations, which states that, given enough measurements, the augmented matrix [**A** **I**] satisfies the Restricted Isometry Property. As stated in [26], one can recover a sparse signal exactly by tractable *ℓ*_{1}-minimization if the measurement matrix is known to satisfy the RIP. Thus, provided *m*≥*C*(∥*x*^{∗}∥_{0}+∥*d*^{0}∥_{0}) log((*n*+*m*)/(∥*x*^{∗}∥_{0}+∥*d*^{0}∥_{0})), we invoke Theorem 1.1 from [10] and replace ∥*d*^{0}∥_{0} with *m*(1−*U*+*δU*) as stated above to complete the proof. □

From the theorem, we see that the number of measurements required for guaranteed recovery depends on the ratio of *σ* (the standard deviation of the measurements) to *R*. In practical applications, choosing *R* large enough that the interval [−*R*,*R*] covers multiple standard deviations on either side of the origin enables successful recovery.
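To see this quantitatively, the probability that a measurement falls in [−*R*/2, *R*/2] (and hence receives a correct bin-index from Eq. 3) can be tabulated for a few illustrative ratios *R*/*σ*:

```python
import math

def coverage(R, sigma):
    # P(-R/2 <= y_c <= R/2) for y_c ~ N(0, sigma^2): the fraction of
    # measurements whose bin-index Eq. (3) identifies correctly
    return math.erf(R / (2 * sigma * math.sqrt(2)))

for k in [2, 4, 6, 8]:        # R = k * sigma
    print(k, round(coverage(k, 1.0), 4))   # → roughly 0.68, 0.95, 0.997, 0.9999
```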