Tensor recovery from noisy and multilevel quantized measurements
EURASIP Journal on Advances in Signal Processing volume 2020, Article number: 41 (2020)
Abstract
Higher-order tensors can represent scores in a rating system, frames in a video, and images of the same subject. In practice, the measurements are often highly quantized due to the sampling strategies or the quality of devices. Existing works on tensor recovery have focused on data losses and random noises. Only a few works consider tensor recovery from quantized measurements, and these are restricted to binary measurements. This paper, for the first time, addresses the problem of tensor recovery from multilevel quantized measurements by leveraging the low CANDECOMP/PARAFAC (CP) rank property. We study the recovery of both general low-rank tensors and tensors that have tensor singular value decomposition (TSVD) by solving nonconvex optimization problems. We provide theoretical upper bounds on the recovery error, which diminish to zero when the sizes of dimensions increase to infinity. We further characterize the fundamental limit of any recovery algorithm and show that our recovery error is nearly order-wise optimal. A tensor-based alternating proximal gradient descent algorithm with a convergence guarantee and a TSVD-based projected gradient descent algorithm are proposed to solve the nonconvex problems. Our recovery methods can also handle data losses and do not necessarily need the information of the quantization rule. The methods are validated on synthetic data, image datasets, and music recommender datasets.
Introduction
Many practical datasets are highly noisy and quantized, and recovering the actual values from quantized measurements finds applications in different domains. For example, users’ preferences in rating systems are represented by a few scores (or even two scores in 1 bit [1]), which do not provide accurate characterizations of preferences. Due to sensor issues or communication restrictions, images and videos in some applications may have very low resolution [2]. Quantization is applied to enhance the data privacy in power systems and sensor networks [3–5]. It is important to develop computationally efficient and reliable methods to recover the actual data from low-resolution measurements.
Li et al. [6] estimate the data from 1-bit measurements by linearizing the nonlinear quantizer. Khobahi et al. [7] leverage deep learning tools to recover the data. These approaches either require accurate parameter estimation or have high computational costs. Some other works recover data from a small number of quantized measurements, but the methods only apply to sparse signals [8–10]. Low-rank matrices can characterize the intrinsic data correlations in user ratings, images, and videos [11, 12], and the low-rank property has been exploited to recover the data from quantized measurements by solving a nonconvex constrained maximum likelihood estimation problem [1, 4, 13, 14]. For an n×n rank-r matrix (r≪n), the best achievable recovery error from quantized measurements is \(O\left (\sqrt {\frac {r^{3}}{n}}\right)\) [4, 13]. The recovery error diminishes to zero when the data size increases.
Practical datasets may contain additional correlations that cannot be captured by low-rank matrices. For instance, if every frame of a video is vectorized so that the video is represented by a matrix, the spatial correlation is not directly characterized by low-rank matrices [15]. In recommendation systems, users’ ratings of objects vary under different contexts [16], and assuming the rating matrix is low-rank does not fully characterize the dependence of ratings on the contexts. That motivates the usage of low-rank tensors, where higher-order tensors contain data arrays with at least three dimensions.
Tensors can represent three-dimensional objects in generic object recognition [17], engagements on advertisements over time for behavior analysis [18], gene expressions in the development process [19], etc. Moreover, tensor techniques are widely used in deep learning [20, 21]. Unlike matrices, there are different rank definitions for higher-order tensors, such as CANDECOMP/PARAFAC (CP) rank [22, 23] and Tucker rank [24]. Since there exist correlations in practical datasets such as images and user ratings, the resulting tensor data is often low-rank. The low-rank property has been exploited in problems like low-rank tensor completion [25–30] and low-rank tensor recovery [31–35]. Leveraging the low-rank property, a convex relaxation of the Tucker rank can be applied to robust tensor recovery [31, 32] and tensor completion [25, 26]. CP rank-based decompositions have also proved effective in these tasks [27, 28]. Besides the low Tucker rank and the low CP rank, other tensor rank forms are used in the literature. To address the unbalanced matricization scheme in the Tucker rank, the tensor train rank has been proposed for tensor recovery and completion [29, 33]. Some works also leverage another rank form called tubal multi-rank and its convex surrogate, the tensor nuclear norm, as tools for tensor-related tasks [34, 35]. This paper concerns only the recovery under the CP rank. It is more challenging to analyze tensors than matrices because some matrix properties do not extend to higher-order tensors. For example, in general, the best low-rank approximation to a tensor does not always exist [36, 37]. A couple of works have focused on special tensors that have the tensor singular value decomposition (TSVD) (a.k.a. completely orthogonal) [38, 39], which is a direct generalization of the matrix singular value decomposition. The significance of the tensors with TSVD is that many matrix properties are preserved. For example, the best orthogonal low-rank tensor approximation always exists [38, 40], and the CP rank of a tensor having TSVD equals the number of its singular values. We will refer to tensors having TSVD as SVD-tensors.
Low-rank tensors with quantization noise exist in hyperspectral data [41, 42], rating systems [43], and knowledge predicates [44]. Existing works on low-rank tensor recovery mainly consider random noise or sparse noise [45–47], while only a few works [41–43] consider tensor recovery from 1-bit measurements, i.e., all measurements are binary. Aidini et al. [41] introduce a 1-bit tensor completion method that first unfolds the tensor measurements to matrices along all dimensions and then applies matrix recovery techniques to each matrix. The final estimation is a real-valued tensor folded from the weighted sum of the recovered matrices. Ghadermarzy et al. [43] use a tensor M-norm constraint to replace the exact low-rank constraint and then recover the tensor by solving the convex optimization problem. The recovery error is guaranteed to be \(O((\frac {r^{3K-3}K}{n^{K-1}})^{1/4})\), where K is the number of tensor dimensions, and n is the size per dimension. Li et al. [42] focus on three-dimensional tensors and the scenario when a significant percentage of measurements are lost. The recovery is based on minimizing the convex surrogate of the tubal multi-rank.
This paper for the first time studies the problem of low-rank tensor recovery from multilevel quantized measurements, while the existing works [41–43] only consider 1-bit measurements. This paper is also the first one to study the recovery problem for SVD-tensors from quantized measurements. We formulate the tensor recovery problems as constrained nonconvex optimization problems. When there is no missing data, and the quantization rule is known, we prove that the recovery error of a K-dimensional tensor with CP rank r is at most \(O(\sqrt {\frac {r^{K-1}K\log K}{n^{K-1}}})\), where n is the length of each tensor dimension. The error bound decays to zero much faster than any existing result. Moreover, we prove that if the tensor is an SVD-tensor, then the recovery error is reduced to at most \(O(\sqrt {\frac {rK\log K}{n^{K-1}}})\). We also prove that the recovery error of low-CP-rank tensors by any algorithm cannot be smaller than the order of \(\sqrt {\frac {r}{n^{K-1}}}\). Our method is close to optimal for a small r. We further develop computationally efficient algorithms to solve the nonconvex problems for recovering low-CP-rank tensors and tensors with TSVD. We prove that even with partial data losses, our proposed low-CP-rank tensor recovery algorithm converges to a critical point of the nonconvex problem from any initialization with at least a sublinear convergence rate. Lastly, all the existing works on tensor recovery from quantized measurements assume that the quantization rule is known to the recovery method, except one low-rank matrix recovery work [13]. We empirically extend our methods to recover the tensor from quantized measurements when the quantization rule is unknown and demonstrate encouraging numerical results.
This paper is organized as follows. The problem formulation is introduced in Section 2. Section 3 discusses our approach and its recovery error. An efficient algorithm with the convergence guarantee is proposed in Section 4.1. The recovery algorithm for SVDtensors is proposed in Section 4.2. Section 5 records the numerical results. Section 6 concludes the paper. All the lemmas and proofs can be found in the Appendices (see Appendices 1, 2, 3, 4, 5, and 6).
Notation and preliminaries
We use boldface capital letters to denote matrices (two-dimensional tensors), e.g., A. Higher-order tensors (three or higher dimensions) are denoted by capital calligraphic letters, e.g., \(\mathcal {X}\). \(\mathcal {X} \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) represents a K-dimensional tensor whose ith dimension has size n_{i}, i∈[K], where \([K] = \{1,2,\dots,K\}\). \(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}\) denotes the \((i_{1},i_{2},\dots,i_{K})\)th entry of \(\mathcal {X}\). \(\mathbf {X}_{(k)} \in \mathbb {R}^{n_{k} \times (n_{1} \dots n_{k-1} n_{k+1} \dots n_{K})}\) is the mode-k matricization of \(\mathcal {X}\), which is formed by unfolding \(\mathcal {X}\) along its kth dimension. The Frobenius norm of the tensor \(\mathcal {X}\) is defined as \(\|\mathcal {X}\|_{F} = \sqrt {\sum _{i_{1} = 1}^{n_{1}}\sum _{i_{2} = 1}^{n_{2}}\dots \sum _{i_{K} = 1}^{n_{K}}\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{2}}\).
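The mode-k matricization and the Frobenius norm can be illustrated with a minimal NumPy sketch. The function names `unfold` and `frobenius_norm` are hypothetical, the index `k` is zero-based here, and the column ordering (row-major flattening of the remaining dimensions) is an assumption, since the text does not fix one.

```python
import numpy as np

def unfold(tensor, k):
    """Mode-k matricization (k is zero-based): rows index dimension k."""
    # Move axis k to the front, then flatten the remaining axes (row-major,
    # which is an assumed column ordering).
    return np.moveaxis(tensor, k, 0).reshape(tensor.shape[k], -1)

def frobenius_norm(tensor):
    """Square root of the sum of squared entries."""
    return np.sqrt(np.sum(tensor ** 2))

X = np.arange(24, dtype=float).reshape(2, 3, 4)  # a small 3-dimensional tensor
X1 = unfold(X, 1)                                # unfold along the second dimension
```

Unfolding only rearranges entries, so the Frobenius norm of any matricization equals that of the tensor.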
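Let \(a_{i} \in \mathbb {R}^{n_{i}}, \forall i \in [K]\) be K vectors. Then, \(\mathcal {A} = a_{1} \circ \dots \circ a_{K}\) is a K-dimensional tensor with \(\mathcal {A}_{i_{1},i_{2},\dots,i_{K}} = {a_{1}}_{i_{1}} {a_{2}}_{i_{2}} \dots {a_{K}}_{i_{K}}\). Here, ∘ is called the outer product. The CP rank of \(\mathcal {X}\) [22, 23] is defined as
\[\text {rank}(\mathcal {X}) = \min \left \{R : \mathcal {X} = \mathbf {A_{1}} \circ \mathbf {A_{2}} \circ \dots \circ \mathbf {A_{K}},\ \mathbf {A_{k}} \in \mathbb {R}^{n_{k} \times R},\ k \in [K]\right \}, \quad (1)\]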
where \(\mathbf {A_{k}}_{i}\) is the ith column of \(\mathbf {A_{k}}\). \(\mathbf {A_{1}} \circ \mathbf {A_{2}} \circ \dots \circ \mathbf {A_{K}}\) is equivalent to \(\sum _{i=1}^{R} \mathbf {A_{1}}_{i} \circ \mathbf {A_{2}}_{i} \circ \dots \circ \mathbf {A_{K}}_{i}\). Note that the CP rank can differ across fields, e.g., the real numbers and the complex numbers. We remark that the results in this paper can be easily generalized to different fields. We use \(\mathbf {A_{k}} \odot \mathbf {A_{p}}\) to represent the Khatri-Rao product [48] of \(\mathbf {A_{k}} \in \mathbb {R}^{n_{k} \times r}, \mathbf {A_{p}} \in \mathbb {R}^{n_{p} \times r}\). We have \(\mathbf {A_{k}} \odot \mathbf {A_{p}} = [\mathbf {A_{k}}_{1} \otimes \mathbf {A_{p}}_{1}, \mathbf {A_{k}}_{2} \otimes \mathbf {A_{p}}_{2}, \dots, \mathbf {A_{k}}_{r} \otimes \mathbf {A_{p}}_{r}]\), where \(\mathbf {A_{k}}_{i} \otimes \mathbf {A_{p}}_{i} = [(\mathbf {A_{k}}_{i})_{1}\mathbf {A_{p}}_{i}^{T}, (\mathbf {A_{k}}_{i})_{2}\mathbf {A_{p}}_{i}^{T}, \dots, (\mathbf {A_{k}}_{i})_{n_{k}}\mathbf {A_{p}}_{i}^{T}]^{T} \in \mathbb {R}^{n_{k}n_{p} \times 1}, \forall i \in [r]\).
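The two constructions above can be sketched in NumPy as follows. This is only an illustration: `khatri_rao` and `cp_to_tensor` are hypothetical helper names, and the sketch simply follows the column-wise definitions given in the text.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (n_k x r) and B (n_p x r)."""
    n_k, r = A.shape
    n_p, r_b = B.shape
    assert r == r_b, "both factors need the same number of columns"
    # Entry (a * n_p + b, i) equals A[a, i] * B[b, i], i.e. column i is
    # the Kronecker product of the ith columns of A and B.
    return (A[:, None, :] * B[None, :, :]).reshape(n_k * n_p, r)

def cp_to_tensor(factors):
    """Sum of rank-one tensors built from matching columns of the factors."""
    r = factors[0].shape[1]
    X = np.zeros(tuple(A.shape[0] for A in factors))
    for i in range(r):
        term = factors[0][:, i]
        for A in factors[1:]:
            term = np.multiply.outer(term, A[:, i])  # outer product, one mode at a time
        X += term
    return X
```

A tensor built this way from factors with r columns has CP rank at most r.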
We define the set of tensors that have tensor singular value decomposition (TSVD) as follows
where 1_{[B]} is an indicator function that takes value “1” if the event B is true and value “0” otherwise. ⟨·,·⟩ denotes the inner product operation. Definition (2) generalizes the matrix SVD. One can see that TSVD is a special case of the decomposition form in (1), and Lemma 3.3 in [38] implies that the tensor in (2) has CP rank R. We remark that not all tensors have a TSVD. We refer readers to [38] for more details.
Throughout this paper, when discussing tensor ranks and low-rank tensors, we refer to CP rank if not otherwise specified. Again, we will refer to tensors in \(\mathcal {S}_{\text {tsvd}}\) as SVD-tensors.
Our proposed framework of tensor recovery from noisy and multilevel quantized measurements
Let \(\mathcal {X}^{*} \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) denote the actual data that are represented by a K-dimensional tensor. Let ∥·∥_{∞} denote the entry-wise infinity norm. We assume that the entries of \(\mathcal {X}^{*}\) are bounded in magnitude by a positive constant α, i.e., \(\|\mathcal {X}^{*}\|_{\infty } \le \alpha \). We further assume that \(\mathcal {X}^{*}\) is a low-rank tensor, i.e., \(\text {rank}(\mathcal {X}^{*})\le r\).
Each entry of \(\mathcal {X}^{*}\) is mapped to one of a few possible values with certain probabilities through the quantization process. To model this probabilistic mapping, let \(\mathcal {N} \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) denote a noise tensor with i.i.d. entries drawn from a known cumulative distribution function Φ(x). Given the quantization boundaries \(\omega _{0}^{*} < \omega _{1}^{*}< \dots < \omega _{W}^{*}\), the noisy data \(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*} + \mathcal {N}_{i_{1},i_{2},\dots,i_{K}}\) (i_{j}∈[n_{j}],j∈[K]) can be quantized to W values based on the following rule,
\[\mathcal {Y}_{i_{1},i_{2},\dots,i_{K}} = Q\left (\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*} + \mathcal {N}_{i_{1},i_{2},\dots,i_{K}}\right), \quad Q(x) = l \ \text {if}\ \omega _{l-1}^{*} < x \le \omega _{l}^{*},\ l \in [W], \quad (3)\]
where Q is an operator that maps a real value to one of W values. We choose \(\omega _{0}^{*} = -\infty \) and \(\omega _{W}^{*} = \infty \). \(\mathcal {Y}_{i_{1},i_{2},\dots,i_{K}}\) is the \((i_{1},i_{2},\dots,i_{K})\)th entry of the quantized measurements \(\mathcal {Y} \in [W]^{n_{1} \times n_{2} \times \dots \times n_{K}}\). When \(W=2,\mathcal {Y}\) reduces to the 1-bit case [43]. In general, \(\mathcal {Y}\) is a \(\log _{2} W\)-bit tensor. Figure 1 provides a visualization of the quantization process when K=3 and \(\mathcal {Y}\) is a \(\log _{2} W\)-bit tensor. The actual tensor \(\mathcal {X}^{*}\) is mapped to the quantized tensor \(\mathcal {Y}\) by first adding a noise tensor \(\mathcal {N}\) and then applying the operator Q.
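The probability that \(\mathcal {Y}_{i_{1},i_{2},\dots,i_{K}} = l\) given \(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}\) and \(\omega _{l-1}^{*},\ \omega _{l}^{*}\) is expressed by \(f_{l}\left (\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}, \omega _{l-1}^{*}, \omega _{l}^{*}\right)\), where
\[f_{l}\left (x, \omega _{l-1}^{*}, \omega _{l}^{*}\right) = \Phi \left (\omega _{l}^{*} - x\right) - \Phi \left (\omega _{l-1}^{*} - x\right), \quad (4)\]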
and \(\sum _{l=1}^{W}f_{l}\left (\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}, \omega _{l-1}^{*}, \omega _{l}^{*}\right)= \Phi \left (\infty -\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}\right)-\Phi \left (-\infty -\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}\right) = 1\). The probability description (4) follows the same formula as in [4, 13], except that the entries are from a higher-order tensor. Two common choices for Φ(x) are as follows: (1) the probit model with Φ(x)=Φ_{norm}(x/σ), where Φ_{norm} is the cumulative distribution function of a standard Gaussian distribution, and (2) the logistic model with \(\Phi (x)=\Phi _{\text {log}}(x/\sigma)=\frac {1}{1+e^{-x/\sigma }}\).
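The quantization model can be simulated with a short NumPy sketch under the logistic model. The function names are hypothetical, and the code assumes that level l is assigned when the noisy value falls between the boundaries ω_{l−1} and ω_{l}, with the level probability given by the difference of the noise CDF at the two shifted boundaries.

```python
import numpy as np

def logistic_cdf(x, sigma=1.0):
    """Logistic noise CDF with scale sigma."""
    return 1.0 / (1.0 + np.exp(-x / sigma))

def level_probabilities(x, boundaries, cdf):
    """Return [f_1(x), ..., f_W(x)] as differences of the CDF at shifted boundaries."""
    w = np.asarray(boundaries, dtype=float)  # w[0] = -inf, ..., w[W] = +inf
    return np.diff(cdf(w - x))

def quantize(X, boundaries, rng, sigma=1.0):
    """Add i.i.d. logistic noise, then map each entry to a level in {1, ..., W}."""
    noise = rng.logistic(loc=0.0, scale=sigma, size=X.shape)
    inner = np.asarray(boundaries[1:-1], dtype=float)  # finite boundaries only
    # side="left" returns l - 1 when w_{l-1} < value <= w_l.
    return np.searchsorted(inner, X + noise, side="left") + 1
```

Because the differences telescope between the CDF values at −∞ and +∞, the W level probabilities sum to one for every entry.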
We also consider the general setup in which there are missing data in the measurements, i.e., only measurements with indices belonging to the observation set Ω are available, while all the other measurements are lost. The question we will address in this paper is as follows. Given the partial observations \(\mathcal {Y}_{\Omega }\) and the noise distribution Φ, how can we estimate the original tensor \(\mathcal {X}^{*}\)? We will discuss the case when \(\mathcal {X}^{*}\) is a general tensor and the special case when \(\mathcal {X}^{*}\) is an SVD-tensor.
We remark that this problem formulation can be applied in different domains. In user voting systems, data can be represented as {users×scoring objects×contexts} [16], which is a three-dimensional tensor. The scores from the reviewers are highly quantized [1]. By solving the quantized tensor recovery problem, one can obtain the actual preferences of the reviewers. In video processing, the measurements can be represented as {rows of a frame×columns of a frame×different frames}. The measurements can be highly quantized due to the sensing process, and the objective is to recover the data [49, 50]. A similar idea also applies to low-quality image recovery [2, 15]. Images from the same subject can be represented by {rows of an image×columns of an image×different images}.
Results: theoretical
We propose to estimate the tensor \(\mathcal {X}^{*}\) and the boundaries \(\omega _{1}^{*}, \omega _{2}^{*},\ \cdots, \omega _{W-1}^{*}\) using a constrained maximum likelihood approach. The negative log-likelihood function is given by
where |Ω| denotes the cardinality of Ω. Equation (5) is a convex function when f_{l} is a log-concave function. When \(\omega _{l}^{*}\)’s are unknown, we estimate \(\mathcal {X}^{*},\omega _{l}^{*}\)’s by \(\hat {\mathcal {X}},\hat {\omega }_{l}\)’s, where
and
Most existing works on quantized data recovery consider the special case that the quantization boundaries are known [1, 4, 43], the only exception being [13]. In this case, (6) can be simplified to
where
If we assume that \(\mathcal {X}^{*} \in \mathcal {S}_{\text {tsvd}}\), then the optimization problem (6) changes to
where
and problem (8) changes to
where
We remark that (6), (8), (10), and (12) are all nonconvex problems since \(\mathcal {S}_{f\omega },\mathcal {S}_{f},\mathcal {S}_{f\omega s},\mathcal {S}_{fs}\) are nonconvex sets. Note that when the ground-truth tensor is not in \(\mathcal {S}_{\text {tsvd}}\), the solutions of (10) and (12) can be viewed as a low-rank approximation of the tensor.
Ghadermarzy et al. [43] study the case with known bin boundaries in (8)–(9). That work focuses on the special case that W=2 and relaxes the low-rank constraint in \(\mathcal {S}_{f}\) with a convex M-norm constraint. Bhaskar et al. [13] and Gao et al. [4] consider minimizing a negative log-likelihood function subject to a low-rank constraint, which is similar to (8), but are restricted to quantized matrix recovery. None of these works addresses the problem of low-rank tensor recovery from multilevel quantized measurements or the recovery of SVD-tensors. We first analyze the recovery performance of our models. We defer the algorithms to Section 4.
Tensor recovery guarantee
Similar to the works on quantized matrix recovery [1, 13], we first define two constants γ_{α} and L_{α} for the analysis in the case where the boundaries are all known constants. For simplicity, we denote \(f_{l}(x, \omega _{l-1}^{*},\omega _{l}^{*})\) by f_{l}(x).
where \(\dot {f}_{l}\) and \(\ddot {f}_{l}\) are the first- and second-order derivatives of f_{l}. Note that \(\dot {f}_{l}^{2} - \ddot {f}_{l}f_{l} \geq 0\) if f_{l} is log-concave, and \(\dot {f}_{l}^{2} - \ddot {f}_{l}f_{l} > 0\) if f_{l} is strictly log-concave. One can check that f_{l} is strictly log-concave if Φ is log-concave, which holds true for noises following Gaussian and logistic distributions. Thus, γ_{α}>0 in our setup. We also remark that L_{α} and γ_{α} are bounded by some fixed constants when both α and f_{l} are given. Taking the logistic model as an example [4, 13], we have
where L_{α} and γ_{α} depend on σ and W. It is also easy to check that γ_{α},L_{α}>0 from (15).
We next state our main results that characterize the recovery error when there are no data losses and the quantization boundaries are known, i.e., the accuracy of the solutions to (8) and (12) when Ω is the full observation set.
Theorem 1
Suppose that \(\omega _{l}^{*}\)’s are given, Ω contains all the indices, \(\mathcal {X}^{*} \in \mathcal {S}_{f}\), and f_{l}(x) is strictly log-concave in x, ∀l∈[W]. Then, with probability at least 1−δ, δ∈[0,1], any global minimizer \(\hat {\mathcal {X}}\) of (8) satisfies
where
Theorem 2
Under the same assumptions on \(\omega _{l}^{*}\)’s, Ω, and f_{l}(x) as Theorem 1, for \(\mathcal {X}^{*} \in \mathcal {S}_{fs}\), any global minimizer \(\hat {\mathcal {X}}\) of (12) satisfies
with probability at least 1−δ, where
Theorems 1 and 2 establish the upper bounds of the recovery error when the measurements are noisy and quantized. L_{α},δ,γ_{α} are all constants. Specifically, when \(n_{1},n_{2},\dots, n_{K}\) are all in the order of n, the recovery error of (16) and (18) can be represented as
and
The right-hand sides of (20) and (21) diminish to zero when n increases to infinity. Comparing (20) and (21), the provided recovery error bound is further reduced if the tensor is an SVD-tensor. Note that the Frobenius norm of \(\mathcal {X}^{*}\) is on the same order as \(\sqrt {n_{1}n_{2}\dots n_{K}}\). By dividing the actual error by \(\sqrt {n_{1}n_{2}\dots n_{K}}\), we have that the left-hand sides of (20) and (21) are on the same order as the relative error \(\|\hat {\mathcal {X}}-\mathcal {X}^{*}\|_{F}/\|\mathcal {X}^{*}\|_{F}\), which is a commonly used normalized error measure. Therefore, the relative recovery error is sufficiently close to zero when the size of the tensor is large enough. We want to emphasize that Theorem 1 and Theorem 2 are based on the global minimizers of (8) and (12), respectively. In general, the global optimum of a nonconvex problem is hard to achieve.
Note that the recovery error depends on W implicitly because W affects L_{α} and γ_{α}. It might seem counterintuitive that the recovery error is not a monotone function of W. That is because we consider all the possible selections of bin boundaries for a given W when computing L_{α} and γ_{α}. A larger W does not necessarily lead to more information in the quantized measurements. For example, if two bin boundaries are very close to each other, almost no data would be mapped to this bin, and the effective number of quantization levels is less than W (think of the extreme case when ω_{1}=ω_{W−1}). This is why W does not appear directly in the recovery bound. Of course, in practice, in most cases, larger W (more bits) will provide us more information about the real data and thus increase the performance.
Recovery enhancement over the existing work on 1-bit tensor recovery
Recovering low-CP-rank tensors from 1-bit measurements has been studied in the work of Ghadermarzy et al. [43]. Ghadermarzy et al. [43] relax the nonconvex low-rank constraint with a convex M-norm constraint, and the resulting recovery method has an error bound of \(O\left (\left (\frac {r^{3K-3}K}{n^{K-1}}\right)^{1/4}\right)\). In contrast, our recovery error bound in (20) for general low-CP-rank tensors decays to zero faster than the 1-bit tensor recovery [43] for any K≥2, and the bound for SVD-tensors is even smaller. For example, the recovery error bounds in (20) and (21) are \(O(\frac {r}{n})\) and \(O\left (\frac {\sqrt {r}}{n}\right)\) when K=3, while the bound is \(O\left (\frac {r^{3/2}}{n^{1/2}}\right)\) by Ghadermarzy et al. [43].
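As a quick check of the K=3 figures quoted above, the substitution can be written out explicitly. This only restates the order-wise bounds given in the text, with the constant factors 3 and 3 log 3 absorbed into the O(·) notation.

```latex
\begin{align*}
\text{(20):}\quad & \sqrt{\frac{r^{K-1}K\log K}{n^{K-1}}}\,\Bigg|_{K=3}
  = \sqrt{\frac{3\log 3\; r^{2}}{n^{2}}} = O\!\left(\frac{r}{n}\right),\\
\text{(21):}\quad & \sqrt{\frac{rK\log K}{n^{K-1}}}\,\Bigg|_{K=3}
  = \sqrt{\frac{3\log 3\; r}{n^{2}}} = O\!\left(\frac{\sqrt{r}}{n}\right),\\
\text{[43]:}\quad & \left(\frac{r^{3K-3}K}{n^{K-1}}\right)^{1/4}\Bigg|_{K=3}
  = \left(\frac{3\,r^{6}}{n^{2}}\right)^{1/4} = O\!\left(\frac{r^{3/2}}{n^{1/2}}\right).
\end{align*}
```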
Reduction to the matrix case
When reduced to the matrix case, i.e., K=2, both (20) and (21) show that the quantized matrix recovery has an error bound of \(O\left (\sqrt {\frac {r}{n}}\right)\), which is the same as the smallest error bound in the matrix case [4].
Recovery enhancement over quantized matrix recovery
Here, we simply compare the recovery error in (20) and (21) with the results obtained by applying quantized matrix recovery methods on the mode-k matricization X_{(k)} along the kth dimension of \(\mathcal {X}\). When the size of each dimension is Θ(n), the sizes of the two dimensions of X_{(k)} are Θ(n) and Θ(n^{K−1}), respectively. Let \(\bar {r}\) be the rank of the matrix; \(\bar {r}\) is less than or equal to r.
Existing works provide the theoretical analyses of matrix recovery from quantized measurements [1, 4, 13]. The recovery errors of applying these methods to X_{(k)} are on the order of \(O\left (\sqrt {\frac {\bar {r}^{3}}{n}}\right)\) and \(O((\frac {\bar {r}}{n})^{1/4})\) by Bhaskar et al. [13] and Davenport et al. [1], respectively. The best existing bound is \(O\left (\sqrt {\frac {\bar {r}}{n}}\right)\) by Gao et al. [4]. Note that the error order in our results has a power of K−1 in its denominator. For example, the recovery error is \(O(\frac {r}{n})\) by (20) and \(O(\frac {\sqrt {r}}{n})\) by (21) when K=3. r is often assumed to be a constant, i.e., O(1), as in the work of Ghadermarzy et al. [43]. Then, \(\bar {r}\) is also O(1). It is easy to see that when K≥3, the recovery errors of both (20) and (21) decay to zero faster than the best existing bound of \(O(\sqrt {\frac {\bar {r}}{n}})\) by Gao et al. [4] for the mode-k matricization case. In Table 1, we compare our results to the state-of-the-art results of the existing 1-bit tensor recovery [43] and quantized matrix recovery [4].
Fundamental limitation of the recovery
We next provide a fundamental error limit of any recovery method in recovering low-rank (CP rank) tensors even when the observed measurements are unquantized. We consider noise that follows a zero-mean Gaussian distribution with variance σ^{2}. Let n_{max} denote max(n_{1},n_{2},⋯,n_{K}), and assume rn_{max}>64.
Theorem 3
Let \(\mathcal {N} \in \mathbb {R}^{n_{1} \times n_{2} \times \cdots \times n_{K}}\) contain i.i.d. entries from a zero-mean Gaussian distribution with variance σ^{2}. For any \(\mathcal {X} \in \mathcal {S}_{f}\), consider any algorithm that takes \(\mathcal {Y}=\mathcal {X}+\mathcal {N}\) as the input and returns an estimation \(\hat {\mathcal {X}}\). Then, there always exists \(\mathcal {X} \in \mathcal {S}_{f}\) such that with probability at least \(\frac {3}{4}\),
holds for a fixed constant \(C_{1} < \sqrt {\frac {1}{512}}\).
Theorem 3 establishes the lower bound of the recovery error. When \(n_{1},n_{2},\dots, n_{K}\) are all in the order of n, the recovery error of (22) can be represented as
That means the recovery error from unquantized measurements by any algorithm is at least \(\Theta \left (\sqrt {\frac {r}{n^{K-1}}}\right)\). Comparing the error bounds in (20) and (23), one can see that the error bound (20) is almost order-wise optimal when r≪n.
Algorithms: tensor recovery from quantized measurements
We propose two efficient algorithms to solve the nonconvex problems (6) and (10), respectively. Both algorithms transform the rank constraint into a penalty function in the objective and update all the variables alternately. Since problem (10) has extra orthonormal constraints on the tensor decomposition components, we apply different updating strategies to these decomposition component variables.
Alternating proximal gradient descent based on tensors
We develop a fast algorithm named tensor-based alternating proximal gradient descent (TAPGD) to solve the nonconvex problem (6) with a convergence guarantee.
Since \(\text {rank}(\mathcal {X})\le r\), there exists \(\mathbf {A_{k}}\in \mathbb {R}^{n_{k} \times r}, \forall k \in [K]\), such that \(\mathcal {X} = \mathbf {A_{1}} \circ \mathbf {A_{2}} \circ \dots \circ \mathbf {A_{K}}\). Then, we change the rank constraint into a penalty function \(\frac {\lambda }{2}\|\mathcal {X} - \mathbf {A_{1}} \circ \mathbf {A_{2}} \circ \dots \circ \mathbf {A_{K}}\|_{F}^{2}\) in the objective, where λ is a positive constant. The equality constraint holds when λ goes to infinity. Note that \(\mathcal {X} = \mathbf {A_{1}} \circ \mathbf {A_{2}} \circ \dots \circ \mathbf {A_{K}}\) is in the form of CANDECOMP/PARAFAC (CP) decomposition [51, 52]. Unlike matrix decomposition and the other major tensor decomposition method (Tucker decomposition [53]), CP decomposition has a very weak requirement for the uniqueness of tensor factors. A sufficient condition for CP decomposition to be unique is that the sum over k=1,2,⋯,K of the numbers of independent columns in A_{k} is at least 2r+K−1 [23], which often holds true. In contrast, Tucker decomposition is generally not unique, and updating its core tensor is usually computationally expensive.
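The penalty term that replaces the rank constraint can be evaluated as in the following minimal sketch. It assumes K=3 for brevity, and `cp_penalty` is a hypothetical helper name, not part of the paper's algorithm listing.

```python
import numpy as np

def cp_penalty(X, factors, lam):
    """(lam / 2) * squared Frobenius distance between X and the CP product (K = 3)."""
    A1, A2, A3 = factors
    # Entry (i, j, k) of the CP product is sum_r A1[i, r] * A2[j, r] * A3[k, r].
    approx = np.einsum("ir,jr,kr->ijk", A1, A2, A3)
    return 0.5 * lam * np.sum((X - approx) ** 2)
```

The penalty vanishes exactly when the tensor equals the CP product of the factors, which is the equality constraint recovered in the limit of large λ.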
We revise \(\mathcal {S}_{f\omega }\) to add constraints that quantization boundaries shall not be too close to avoid trivial solutions in practice. The resulting feasible set is
where κ_{l},∀l∈{2,3,⋯,W−1} are some positive numbers that can be chosen using hyperparameter tuning or simply set as small positive constants, and κ_{1}=κ_{W}=0. α_{low},α_{upper} are two constants that provide the lower and upper bound of the boundaries, which could be chosen as −α and α, or estimates computed in different applications. The revised problem of (6) is shown as follows
where
\(\Psi _{1}(\mathcal {X})\) is transformed from the constraint \(\|\mathcal {X}\|_{\infty }\le \alpha \). Ψ_{2}(ω_{l}) is transformed from the constraints on ω_{l} in \(\mathcal {S}_{\omega }\). Let
Then, we solve (25) using the proximal gradient method [54]. The main steps of the proximal gradient method include updating \(\mathcal {X}, \mathbf {A_{k}}, k \in [K], \omega _{l}, l \in [W-1]\) by using the gradient descent method on H, and projecting the result to \(\mathcal {S}_{\omega }\). Since for ∀k∈[K]
the partial gradients of H with respect to A_{k} and \(\mathcal {X}\) can be calculated by
where B_{k}=A_{K}⊙...⊙A_{k+1}⊙A_{k−1}⊙...⊙A_{1}. For any (i_{1},i_{2},⋯,i_{K})∈Ω,
Otherwise, for any (i_{1},i_{2},⋯,i_{K})∉Ω
Strictly speaking, the result should be multiplied by a factor of \(\frac {n_{1}n_{2}\cdots n_{K}}{|\Omega |}\). We ignore this factor since it is canceled out when we multiply by the step size in our algorithm. The partial derivative of H with respect to ω_{l} is shown as follows
The step sizes of the gradient descent are selected as
where \(\|(\mathbf {B_{k}})^{T}\mathbf {B_{k}}\|,\ \frac {1}{\sigma \beta }+\lambda,\ \frac {\sqrt {G_{l}}+\sqrt {G_{l+1}}}{\sigma ^{2}\beta ^{2}}\) are Lipschitz constants of \(\nabla _{\mathbf {A_{k}}} H,\nabla _{\mathcal {X}} H\), and \(\nabla _{\omega _{l}}H\), respectively. G_{l} and G_{l+1} are the numbers of entries in \(\mathcal {Y}_{\Omega }\) that equal l and l+1, respectively. Here, β is a small positive value that satisfies \(\Phi (\omega _{l}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}) \geq \Phi (\omega _{l-1}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}) + \beta \). Such a β exists since \(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}, \omega _{l}, \omega _{l-1}\) are all bounded, ω_{l} is larger than ω_{l−1}, and Φ is a monotonically increasing function. After updating \(\mathcal {X}\), the algorithm sets \(\mathcal {X}_{i_{1},i_{2},...,i_{K}}\) to α if \(\mathcal {X}_{i_{1},i_{2},...,i_{K}} > \alpha \), and sets \(\mathcal {X}_{i_{1},i_{2},...,i_{K}}\) to −α if \(\mathcal {X}_{i_{1},i_{2},...,i_{K}} < -\alpha \). After updating ω_{l}, the algorithm sets ω_{l}= min(ω_{l+1}−κ_{l+1},α_{upper}) if ω_{l}> min(ω_{l+1}−κ_{l+1},α_{upper}), and sets ω_{l}= max(ω_{l−1}+κ_{l},α_{low}) if ω_{l}< max(ω_{l−1}+κ_{l},α_{low}).
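The two projection rules described above (clipping the tensor entries and clamping each boundary) can be sketched as follows. The function and argument names are hypothetical; the sketch covers only the projection step, not the gradient updates or step sizes.

```python
import numpy as np

def project_tensor(X, alpha):
    """Clip every entry of X to the interval [-alpha, alpha]."""
    return np.clip(X, -alpha, alpha)

def project_boundary(w_l, w_prev, w_next, kappa_l, kappa_next, alpha_low, alpha_upper):
    """Clamp one boundary w_l after its gradient step.

    w_prev and w_next are the neighboring boundaries; kappa_l and kappa_next
    are the minimum gaps to keep from them.
    """
    upper = min(w_next - kappa_next, alpha_upper)
    lower = max(w_prev + kappa_l, alpha_low)
    if w_l > upper:
        return upper
    if w_l < lower:
        return lower
    return w_l
```

For instance, with neighbors at 0 and 1, gaps of 0.1, and limits of ±1, a boundary that has drifted to 0.95 is pulled back to 0.9.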
The algorithm is initialized by first estimating \(\omega _{l}^{*}\)’s according to the applications or simply setting \(\omega _{l}^{0} = \frac {2\alpha l}{W}-\alpha \) if no information is available, and then setting
\(\mathbf {A_{k}}^{0} \in \mathbb {R}^{n_{k} \times r},\forall k \in [K]\) are obtained through the decomposition of \(\mathcal {X}^{0}\). The details of TAPGD are shown in Algorithm 1. Note that when the quantization boundaries \(\omega _{l}^{*}\)’s are known, TAPGD can be revised easily by removing steps 14–20 from Algorithm 1.
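The default boundary initialization, which spaces the W−1 interior boundaries evenly over (−α, α) when no prior information is available, can be written as a one-line sketch. The function name `initial_boundaries` is hypothetical.

```python
import numpy as np

def initial_boundaries(alpha, W):
    """Evenly spaced interior boundaries w_l = 2*alpha*l/W - alpha for l = 1, ..., W-1."""
    l = np.arange(1, W)
    return 2.0 * alpha * l / W - alpha

w0 = initial_boundaries(alpha=1.0, W=4)  # array([-0.5, 0.0, 0.5])
```

The same formula evaluated at l=0 and l=W gives −α and α, so the initial bins partition [−α, α] into W intervals of equal width.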
To improve the recovery performance, one can multiply λ by a small constant larger than one in each iteration. This provides a better numerical result than fixing λ in all iterations. The complexity of TAPGD in each iteration is \(O(Krn_{1}n_{2}\dots n_{K})\). The convergence of TAPGD is summarized in Theorem 4.
Theorem 4
Assume that the sequence {A_{k}^{t}} generated by Algorithm 1 is bounded. Then, TAPGD globally converges to a critical point of (25) from any initial point, and the convergence rate is at least \(O\left (t^{\frac {\theta - 1}{2\theta - 1}}\right)\), for some \(\theta \in \left (\frac {1}{2},1\right)\).
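Theorem 4 indicates a sublinear convergence of TAPGD, since the exponent \(\frac {\theta -1}{2\theta -1}\) is negative for \(\theta \in \left (\frac {1}{2},1\right)\); for example, θ=3/4 gives a rate of \(O(t^{-1/2})\). One way to satisfy the bounded-sequence requirement is to scale the factorized variables so that ∥A_{1}∥_{F}=∥A_{2}∥_{F}=⋯=∥A_{K}∥_{F} after each iteration. We find that TAPGD performs well numerically without this additional step.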
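One possible way to carry out the rescaling mentioned above is sketched below. It is an illustration under assumptions, not part of Algorithm 1: each factor matrix is multiplied by a scalar so that all Frobenius norms equal the geometric mean of the original norms. Because the scalars multiply to one, the tensor formed from the factors is unchanged. Matrices are stored as lists of rows, and all function names are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def balance_factors(factors):
    """Rescale factor matrices to a common Frobenius norm.

    The common norm is the geometric mean of the input norms, so the
    product of the K scale factors is one and the CP tensor is preserved.
    """
    norms = [frobenius_norm(A) for A in factors]
    if min(norms) == 0.0:
        return factors  # a zero factor cannot be balanced
    target = math.exp(sum(math.log(n) for n in norms) / len(norms))
    return [[[v * target / n for v in row] for row in A]
            for A, n in zip(factors, norms)]


def cp_entry(factors, index):
    """Entry of the CP tensor: sum over components of products of factor entries."""
    rank = len(factors[0][0])
    return sum(
        math.prod(A[i][c] for A, i in zip(factors, index))
        for c in range(rank)
    )
```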
TSVD-based alternating projected gradient descent
We develop an algorithm named TSVD-based alternating projected gradient descent (TSVDAPGD) to solve the nonconvex problem (10). Let \(H'=F_{\Omega }(\mathcal {X},\omega _{1},\omega _{2},\cdots,\omega _{W-1}) + \frac {\lambda }{2}\|\mathcal {X} - \sum _{i=1}^{r} \zeta _{i} \mathbf {V_{1}}_{i} \circ \mathbf {V_{2}}_{i} \circ \dots \circ \mathbf {V_{K}}_{i}\|_{F}^{2}\). Similar to (25), we relax the problem of (10) to
The updates of \(\mathcal {X}, \omega _{l}\) are the same as in TAPGD, while the updates of the decomposition components ζ_{i},V_{k} are different. In Algorithm 2, we borrow the idea from the work of Li et al. [39] to update ζ_{i},V_{k},i∈[r],k∈[K] in steps 2–8. QR in step 4 of Algorithm 2 represents the QR decomposition [55], which returns an orthonormal matrix and an upper triangular matrix; we use the orthonormal matrix to update V_{k}.
Similar to Algorithm 1, when the quantization boundaries are known, TSVDAPGD can be revised easily by removing step 11. We cannot prove the convergence of Algorithm 2 yet and leave it to future work. However, Algorithm 2 demonstrates reliable numerical performance, as shown in Section 5.
Results: numerical experiments
We conduct simulations on synthetic data, image data, and data from an in-car music recommender system [16] in this section. The recovery performance is measured by \(\|\mathcal {X}^{*}-\tilde {\mathcal {X}}\|_{F}^{2}/\|\mathcal {X}^{*}\|_{F}^{2}\), where \(\tilde {\mathcal {X}}\) is our estimation of \(\mathcal {X}^{*}\). K=3 in the tests on both synthetic data and real data. We set T=200. All the results are averaged over 100 runs. The simulations are run in MATLAB on a 3.4 GHz Intel Core i7 computer.
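The performance measure above is the squared Frobenius norm of the estimation error divided by the squared Frobenius norm of the ground truth. A minimal sketch follows, assuming both tensors are given as flat sequences of entries in the same order; the function name is hypothetical.

```python
def relative_recovery_error(truth, estimate):
    """Squared Frobenius error of the estimate, normalized by the energy of the truth.

    Both arguments are flat sequences holding the tensor entries in the same order.
    """
    num = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    den = sum(t * t for t in truth)
    return num / den
```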
Synthetic data
A rank-r, three-dimensional tensor is generated as follows. We first generate \(\mathbf {A_{1}} \in \mathbb {R}^{n_{1} \times r}\) with entries sampled independently from a uniform distribution in \([-0.5,0.5],\mathbf {A_{2}} \in \mathbb {R}^{n_{2} \times r}\), and \(\mathbf {A_{3}} \in \mathbb {R}^{n_{3} \times r}\) with each entry sampled independently from a uniform distribution in [0,1]. Then, we obtain the tensor by calculating A_{1}∘A_{2}∘A_{3} and scaling all the values to [−1,1]. A rank-r, three-dimensional SVD-tensor is generated as follows. V_{1},V_{2},V_{3} are first obtained by transforming A_{1},A_{2},A_{3} to orthonormal matrices. ζ_{i},i∈[r] are generated from the right-half standard normal distribution. Then, we obtain the SVD-tensor by (2) and scaling all the values to [−1,1]. The entries of \(\mathcal {N}\) are i.i.d. generated from the Gaussian distribution with mean 0 and standard deviation σ=0.25. We choose W=2 (1-bit) and 4 (2-bit) in our experiments. When \(W = 2,\omega _{0}^{*} = -\infty,\omega _{1}^{*} = 0,\omega _{2}^{*} = \infty \). When \(W = 4,\omega _{0}^{*} = -\infty,\omega _{1}^{*} = -0.4,\omega _{2}^{*} = 0,\omega _{3}^{*} = 0.4,\omega _{4}^{*} = \infty \).
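The measurement process of the synthetic experiment (add Gaussian noise, then quantize) can be sketched as follows. This is an illustration under assumptions: the W=4 interior boundaries are read as −0.4, 0, 0.4, bin labels run from 1 to W, and a value exactly on a boundary is assigned to the lower bin, which is a convention chosen here. The function names are hypothetical.

```python
import bisect
import random


def quantize(value, interior_boundaries):
    """Return the bin label l in {1, ..., W} with w_{l-1} < value <= w_l.

    interior_boundaries holds w_1 < ... < w_{W-1}; w_0 = -inf and w_W = +inf
    are implicit.
    """
    return bisect.bisect_left(interior_boundaries, value) + 1


def noisy_quantized_measurements(values, interior_boundaries, sigma, seed=0):
    """Add i.i.d. N(0, sigma^2) noise to each value and quantize the result."""
    rng = random.Random(seed)
    return [quantize(x + rng.gauss(0.0, sigma), interior_boundaries)
            for x in values]
```

With a single interior boundary at 0 this reduces to the 1-bit case (W=2).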
Figure 2 compares TAPGD with the M-norm constrained 1-bit tensor recovery (MNC1bitTR) method [43] and the quantized matrix recovery method [4]. We remark that MNC1bitTR can only deal with 1-bit measurements and requires solving a convex optimization problem. The tolerance value is set as 0.01 for the matrix recovery method. In the MNC1bitTR method, we use the theoretical upper bound \(r^{\frac {3}{2}}\alpha \) (here α=1) to bound the maximum row norm of the low-rank factors. We vary one of the rank, the dimension, and the noise level while fixing the other parameters. n_{1}=n_{2}=n_{3}=120 when we only vary the rank and the noise level. Figure 2 demonstrates that the relative recovery error increases when the rank increases or the dimension decreases. The results also show that TAPGD has the best performance among all these methods. Moreover, the performance improves when the number of bits increases. Figure 3 compares different algorithms on recovering SVD-tensors. TSVDAPGD outperforms the existing methods when the tensor is an SVD-tensor. We notice that the recovery errors of TAPGD and TSVDAPGD are very close.
As shown in Fig. 4, when the noise level (the standard deviation σ) increases, the relative recovery error first decreases and then increases. The reason is that the noise is considered as part of the quantization process and plays the role of adding measurement uncertainty. The problem without noise (measurement uncertainty) is ill-posed because all the values in one bin are deterministically mapped to the same quantized value.
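The role of the noise can be illustrated by computing the probability of each quantized value for a given underlying entry. The sketch below assumes additive Gaussian noise with standard deviation σ, so that the probability of bin l is Φ(ω_{l}−x) − Φ(ω_{l−1}−x) with Φ the N(0, σ²) distribution function. The function names and the example boundaries −0.4, 0, 0.4 are assumptions of this sketch.

```python
import math


def normal_cdf(z, sigma):
    """Distribution function of N(0, sigma^2) evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))


def bin_probabilities(x, interior_boundaries, sigma):
    """P(Y = l | x) for l = 1..W under additive N(0, sigma^2) noise."""
    cdf = ([0.0]
           + [normal_cdf(w - x, sigma) for w in interior_boundaries]
           + [1.0])
    return [cdf[l] - cdf[l - 1] for l in range(1, len(cdf))]
```

For two entries in the same bin, such as x=0.1 and x=0.3, the probability vectors differ noticeably at σ=0.25, so repeated structure in the data carries information about the real values. As σ approaches zero, both vectors concentrate on the same bin and the two entries become indistinguishable.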
Image data
We test our method on the Extended Yale Face Dataset B [56, 57]. The dataset contains 192×168 pixel face images from 38 different people. Each person has 64 images with different poses and various illumination. We pick two subjects to form a 192×168×128 three-dimensional tensor. All entries are scaled to [0,1]. We add \(\mathcal {N}\) with i.i.d. entries generated from the Gaussian distribution with mean 0 and standard deviation 0.3. When \(W = 2,\omega _{0}^{*} = -\infty,\omega _{1}^{*} = 0.4,\omega _{2}^{*} = \infty \). When \(W = 3,\omega _{0}^{*} = -\infty,\omega _{1}^{*} = 0.2,\omega _{2}^{*} = 0.4,\omega _{3}^{*} = \infty \). Figure 5a compares TAPGD with MNC1bitTR, the quantized matrix recovery method, and a nonconvex low-rank tensor recovery method named nonconvex regularized tensor (NORT) [30]. Note that MNC1bitTR models the quantization process like our approach, while NORT does not model quantization and treats the data as general noisy measurements. We find that our method works well in a wide range of r, and the reported results use r=50. The tolerance rate is set as 0.001 for the matrix recovery method. In the NORT method, we set the hyperparameters as λ=0.1,θ=5 (the parameters have different meanings from the parameters in our work), and the tolerance rate as 0.0001. In the MNC1bitTR method, we use \(r^{\frac {3}{2}}\alpha \) (here α=1) to bound the maximum row norm of the low-rank factors. Figure 5a shows that the relative recovery error decreases when the percentage of observed entries increases, and that TAPGD obtains the best performance among all the methods. Figure 5b compares the recovery error when the bin boundaries are known and unknown to the recovery algorithm. When the boundaries are unknown, the initial point is uniformly chosen from [0.1,0.6] for ω_{1} when W=2, and from [0.1,0.3],[0.2,0.6] for ω_{1},ω_{2}, respectively, when W=3. α_{upper},α_{low} are selected as 0.6,0.1. κ_{l} is set to 0.1 for ∀l∈[W−1].
In Fig. 6, we show a boxplot of the relative recovery error over 100 runs obtained by TAPGD. All the setups are the same as the scenario W=3 in Fig. 5a. The bottom and top of each “box” are the 25th and 75th percentiles of the samples, respectively. The maximum standard deviation occurs when the observation rate is 0.3 and equals 8.79×10^{−4}. The relative standard deviation, which is defined as the ratio of the standard deviation to the mean, reaches its maximum value 0.028 when the observation rate is 0.6.
Figure 7 compares the time cost of TAPGD and MNC1bitTR [43] when the number of facial images changes. TAPGD is three orders of magnitude faster than MNC1bitTR. Figure 8 visualizes the quantized images and the images recovered by TAPGD.
In-car music recommender dataset
In many recommender systems, the ratings from users are highly quantized (such as like or dislike) with many missing entries (e.g., users do not rate all items), while the underlying systems may want to recover real-valued user ratings. Following the same motivations and assumptions as in the quantized matrix work [1, 13] and the 1-bit tensor work [43], we assume that the quantized measurements are caused by system limitations and that the actual ratings of users lie in the real-valued domain [1, 13, 43]. Moreover, users’ actual ratings are affected by a few factors and thus satisfy the low-rank property [58]. Our method can be used to recover the true underlying real-valued user preferences, thus improving the quality of recommendations. We apply our method to an in-car music recommender dataset [16]. The recommender dataset contains 139 songs with 4012 ratings from 42 users. This dataset has 26 contexts that include relaxed driving, country side, happy, and sleepy. The same user may give different scores to the same song under different contexts. A total of 2751 ratings have the corresponding context information, while the remaining 1261 ratings do not. We only use the ratings with context information. An example of three ratings is shown in Table 2.
We construct the resulting tensor \(\mathcal {M}\) as {users×songs×contexts}, which is a 42×139×26 tensor. The ratings are quantized to 0,1,2,3,4,5 (we change ratings to 1,2,3,4,5,6 to distinguish them from missing values). All the locations without ratings are set to zero. We then randomly set 0.362% of the data (20% of the observed data) to zero and let Ω_{predict} denote the set of their indices. We predict the data with indices belonging to Ω_{predict} using the remaining 1.448% of the data (80% of the observed data), which are referred to as training data. In this test, we define the relative recovery error as
where \(\tilde {\mathcal {M}}\) is our estimation of the ground truth, and \(\bar {\mathcal {M}}\) maps the values in \(\tilde {\mathcal {M}}\) to their nearest quantized values. The 5 in the denominator appears because the maximum difference between \(\bar {\mathcal {M}}\) and \(\mathcal {M}\) is 5. The error increases when the difference increases. Ref. [43] also studies the same dataset and first maps the multilevel quantized values to binary values. It then deletes some binary values and evaluates the recovery error. The smallest recovery error obtained by their method is 0.23. We remark that multilevel prediction is harder than binary prediction in this application, since the binary case chooses one out of two numbers, while the multilevel case chooses one out of W>2 numbers. Here, we estimate the rank r and the noise level σ, since we do not know the actual rank and noise. In Algorithm 1, we choose the estimated rank r from the set {5,10,15,20,25}, and the estimated standard deviation σ from the set {0.001,0.01,0.05,0.1,0.15,0.2,0.25}. The recovery results are shown in Fig. 9. The relative recovery error reaches its smallest value, 0.22, when r=5 and σ=0.05. Figure 10 compares TAPGD with NORT [30] when the percentage of the training data changes. Note that when the percentage equals one, we use 80% of the observed data. The relative recovery error obtained by NORT is defined in the same way as for TAPGD. We set r=5 and σ=0.1 for TAPGD. In the NORT method, we set the hyperparameters as λ=0.1,θ=5, and the tolerance rate as 0.0001. The relative recovery error using NORT is about twice the relative recovery error using TAPGD, suggesting that a performance improvement can be achieved when the users’ actual ratings are considered to be real-valued.
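The data preparation and the mapping to the nearest quantized value described above can be sketched as follows. This is a minimal illustration with hypothetical names: ratings are held in a dictionary keyed by (user, song, context) indices, the 20% hold-out is drawn with a fixed seed, and ties in the nearest-level mapping go to the lower level, which is a choice made here. The error formula itself is not reproduced in this sketch.

```python
import random


def shift_ratings(ratings):
    """Shift observed ratings 0..5 to 1..6 so that 0 can mark a missing entry."""
    return {index: rating + 1 for index, rating in ratings.items()}


def split_observed(ratings, predict_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the observed indices for prediction."""
    indices = sorted(ratings)
    random.Random(seed).shuffle(indices)
    n_predict = round(predict_fraction * len(indices))
    predict = set(indices[:n_predict])
    train = {idx: ratings[idx] for idx in indices if idx not in predict}
    return train, predict


def nearest_level(value, levels=(1, 2, 3, 4, 5, 6)):
    """Map a real-valued estimate to the closest allowed rating level."""
    return min(levels, key=lambda level: abs(level - value))
```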
Conclusion and discussion
This paper recovers a low-rank tensor from quantized measurements. A constrained maximum log-likelihood problem is proposed to estimate the ground-truth tensor. The recovery error is proved to be at most \(O(\sqrt {\frac {r^{K-1}K\log (K)}{n^{K-1}}})\) when the boundaries are known. The recovery error decreases to \(O(\sqrt {\frac {rK\log (K)}{n^{K-1}}})\) when the tensor is an SVD-tensor. When reduced to the special cases of 1-bit tensor recovery and low-rank matrix recovery from quantized measurements, our error bounds are significantly smaller than those of the existing methods. We also provide the fundamental limit of the recovery error by any recovery method and show that our method is nearly order-wise optimal. We propose two algorithms, TAPGD and TSVDAPGD, to solve the nonconvex optimization problems. We prove that TAPGD converges to a critical point from any initial point. Both algorithms can handle missing data and do not require information about the quantization rule. Future work includes data recovery when a portion of the measurements contains significant errors and developing algorithms with global optimality guarantees.
Appendix 1. Supporting lemmas used in the proof of Theorems 1 and 2
Let \(\langle \mathcal {A}, \mathcal {B} \rangle \) denote the inner product of \(\mathcal {A} \in \mathbb {R}^{n_{1} \times... \times n_{K}}\) and \(\mathcal {B} \in \mathbb {R}^{n_{1} \times... \times n_{K}}\), i.e., the sum of the products of their entries. Then, the spectral norm of a tensor \(\mathcal {X} \in \mathbb {R}^{n_{1} \times... \times n_{K}}\) is defined as
where \(u_{1} \circ u_{2}... \circ u_{K} \in \mathbb {R}^{n_{1} \times... \times n_{K}}\).
Lemma 1 provides an upper bound on the spectral norm of a tensor with independent random entries.
Lemma 1
Suppose that \(\mathcal {X} \in \mathbb {R}^{n_{1} \times... \times n_{K}}\) is a K-dimensional tensor whose entries are independent random variables that satisfy, for some s^{2},
Then
for some δ∈[0,1], where
Proof
The proof is completed by combining Lemma 1 and Theorem 1 in [59]. □
We first define \(F(\mathcal {X})\) as \(F_{\Omega }(\mathcal {X},\omega _{1},\omega _{2},\cdots,\omega _{W-1})\) under full observation with known ω_{l},∀l∈[W−1]. Specifically,
Lemma 2
With probability at least 1−δ,
Proof
Consider
Recall that the probability of \(\mathcal {Y}_{i_{1},i_{2},\dots,i_{K}} = l\) given \(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*}\) is expressed by \(f_{l}(\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{*})\), which only holds true for \(\mathcal {X}^{*}\). Then, using the fact that \(\sum _{l=1}^{W} f_{l}(\mathcal {X}_{i_{1},i_{2},...,i_{K}})=1\), we have \(\mathbb {E}[\mathcal {Z}_{i_{1},i_{2},...,i_{K}}]=0\) and \(-L_{\alpha } \le \mathcal {Z}_{i_{1},i_{2},...,i_{K}}\le L_{\alpha }\). By Hoeffding’s lemma, we obtain \(\mathbb {E}[e^{\epsilon \mathcal {Z}_{i_{1},i_{2},...,i_{K}}}]\le e^{(L_{\alpha } + L_{\alpha })^{2} \epsilon ^{2}/8} = e^{L_{\alpha }^{2} \epsilon ^{2}/2}\). Replacing s with L_{α} in Lemma 1, we complete the proof. □
Lemma 3 and Lemma 4 describe the relation of \(\mathcal {X}^{*}\) with any point in the feasible sets \(\mathcal {S}_{f}\) and \(\mathcal {S}_{fs}\), respectively. Considering any \(\mathcal {X}' \in \mathcal {S}_{f}\) or \(\mathcal {X}' \in \mathcal {S}_{fs}\), we can calculate the second-order Taylor expansion of \(F(\mathcal {X}')\) at \(\mathcal {X}^{*}\). Both lemmas indicate that the absolute value of the first-order term of the Taylor expansion can always be upper bounded by a term related to \(\|\mathcal {X}'-\mathcal {X}^{*}\|_{F}\).
Lemma 3
Let \(\theta '=\text {vec}(\mathcal {X}'),\theta ^{*}=\text {vec}(\mathcal {X}^{*}),\nabla _{\theta }F(\theta ^{*})=\text {vec}(\nabla _{\mathcal {X}}F(\mathcal {X}^{*}))\), and \(\mathcal {X}',\mathcal {X}^{*} \in \mathcal {S}_{f}\). Then with probability at least 1−δ,
Proof
The tensor nuclear norm \(\\mathcal {X}\_{*}\) is defined as
According to Theorem 9.4 of [60], \(\|\mathcal {X}\|_{*}\) satisfies \(\|\mathcal {X}\|_{*} \le \sqrt {\frac {r_{1}r_{2}r_{3}}{\max (r_{1}, r_{2}, r_{3})}}\|\mathcal {X}\|_{F}\) when K=3, where r_{k},k∈[K] is the k-rank of the tensor \(\mathcal {X}\), which is defined as the column rank of X_{(k)}. A generalization to any K is shown as follows
The details can be viewed in [61]. Note that r_{k}≤r,∀k∈[K], since \(\mathbf {X}_{(k)} = \mathbf {A_{k}}(\mathbf {A_{K}}\odot \dots \odot \mathbf {A_{k+1}}\odot \mathbf {A_{k-1}}\odot \dots \odot \mathbf {A_{1}})^{T}\). Therefore,
where the last inequality holds because \(\|\cdot \|_{*} \le \sqrt {r}\|\cdot \|_{F}\) for any matrix of rank at most r. We then have
holds with probability at least 1−δ. Then,
holds with probability at least 1−δ. The second inequality comes from the fact \(|\left \langle \mathcal {A}, \mathcal {B} \right \rangle | \le \|\mathcal {A}\|\|\mathcal {B}\|_{*}\) for two tensors \(\mathcal {A}\) and \(\mathcal {B}\) [60]. We then have the desired result. □
Lemma 4
Let \(\theta '=\text {vec}(\mathcal {X}'),\theta ^{*}=\text {vec}(\mathcal {X}^{*}),\nabla _{\theta }F(\theta ^{*})=\text {vec}(\nabla _{\mathcal {X}}F(\mathcal {X}^{*}))\), and \(\mathcal {X}',\mathcal {X}^{*} \in \mathcal {S}_{fs}\). Then with probability at least 1−δ,
Proof
Let \(\mathcal {T}_{i}\) denote the \(\mathbf {V_{1}}_{i} \circ \mathbf {V_{2}}_{i} \circ \dots \circ \mathbf {V_{K}}_{i}\) in (2). One can easily check that \(\langle \mathcal {T}_{i}, \mathcal {T}_{j} \rangle = 0\) and \(\langle \mathcal {X}, \mathcal {T}_{i} \rangle = \zeta _{i}, i, j \in [r], i\neq j\). Then \(\|\mathcal {X}\|_{F} = \sqrt {\sum _{i=1}^{r} \zeta _{i}^{2}}\). Equation (45) defines that \(\|\mathcal {X}\|_{*} = \sum _{i=1}^{r} \zeta _{i}\). From the Cauchy–Schwarz inequality, we have \(\|\mathcal {X}\|_{*} \le \sqrt {r}\|\mathcal {X}\|_{F}\). Therefore,
Following the same proof technique of (48), we have the desired result. □
Lemma 5 provides a lower bound on the second-order term of the second-order Taylor expansion. This lower bound is also related to \(\|\mathcal {X}'-\mathcal {X}^{*}\|_{F}\).
Lemma 5
Let \(\theta '=\text {vec}(\mathcal {X}'),\theta ^{*}=\text {vec}(\mathcal {X}^{*})\), and \(\mathcal {X}',\mathcal {X}^{*} \in \mathcal {S}_{f}\). Then for any \(\tilde {\theta }=\theta ^{*}+\eta (\theta '-\theta ^{*})\) and any η∈[0,1], we have
Proof
Lemma 5 is an extension of Lemma 7 in [13].
Using (42), it follows that
Then, we have
where the first inequality comes from the fact that \(\gamma _{\alpha } = \min _{l\in [W]}\inf _{|x|\le 2\alpha }\left \{\frac {\dot {f}_{l}^{2}(x)}{f_{l}^{2}(x)}-\frac {\ddot {f}_{l}(x)}{f_{l}(x)}\right \}\). □
Appendix 2. Proofs of Theorems 1 and 2
Proof
The first bound 2α follows from the fact that \(\|\hat {\mathcal {X}}\|_{\infty },\|\mathcal {X}^{*}\|_{\infty } \le \alpha \). We have
Let \(\hat {\theta }=\text {vec}(\hat {\mathcal {X}})\) and \(\mathcal {F}(\hat {\theta })=F(\hat {\mathcal {X}})\). By the second-order Taylor’s theorem, we have
where \(\tilde {\theta }=\theta ^{*}+\eta (\hat {\theta }-\theta ^{*})\) for some η∈[0,1], with the corresponding tensor \(\tilde {\mathcal {X}}=\mathcal {X}^{*}+\eta (\hat {\mathcal {X}}-\mathcal {X}^{*})\).
Using the results of Lemma 3 and Lemma 5, we can obtain that
holds with probability at least 1−δ for \(\hat {\mathcal {X}},\mathcal {X}^{*} \in \mathcal {S}_{f}\). Note that \(\hat {\mathcal {X}}\) is a global optimum of the optimization problem. Thus, \(F(\hat {\mathcal {X}}) \le F(\mathcal {X}^{*})\). We then have
holds with probability at least 1−δ. Thus,
holds with the same probability 1−δ, where
Similarly, using the results of Lemma 4 and Lemma 5, we can obtain that
holds with probability at least 1−δ for \(\hat {\mathcal {X}},\mathcal {X}^{*} \in \mathcal {S}_{fs}\). Following the same process as (55)–(57), we can obtain
and
Combining (52) and (56), (52) and (59), we have the results of Theorem 1 and Theorem 2, respectively. □
Appendix 3. Supporting lemmas used in the proof of Theorem 3
Lemma 6
Let ς≤1. There is a set \(\mathcal {S}_{X}\subset \mathcal {S}_{f}\) with
with the following properties:
1. For all \(\mathcal {X}\in \mathcal {S}_{X},|\mathcal {X}_{i_{1},i_{2}, \dots,i_{K}}|=\alpha \varsigma,\forall i_{1},i_{2}, \dots,i_{K}\)
2. For all \(\mathcal {X}^{(i)},\mathcal {X}^{(j)}\in \mathcal {S}_{X}\), i≠j,
3. For any \(\mathcal {X} \in \mathcal {S}_{X}\) and \(\mathcal {Y} = \mathcal {X} + \mathcal {N}\), we can bound the mutual information with the following inequality
Proof
Without loss of generality, we assume n_{1}=n_{max}. We first construct a matrix \(\mathbf {D} \in \mathbb {R}^{n_{1} \times n_{2}}\) with rank r in the following way. The entries D_{i,j},∀i∈[n_{1}],j∈[r] are i.i.d. symmetric random variables with values ±ας. We then construct the remaining part of D as follows.
The matrix D consists of identical blocks of dimensions n_{1}×r. Note that D can be decomposed into \(\sum _{i=1}^{r}\zeta _{i} \mathbf {V_{1}}_{i}\circ \mathbf {V_{2}}_{i}\). We then construct a tensor by \(\mathbf {D} \circ I_{n_{3}} \circ I_{n_{4}} \cdots \circ I_{n_{K}}=\sum _{i=1}^{r}\zeta _{i} \mathbf {V_{1}}_{i}\circ \mathbf {V_{2}}_{i} \circ I_{n_{3}} \circ I_{n_{4}} \cdots \circ I_{n_{K}}\), where \(I_{n_{k}} \in \mathbb {R}^{n_{k}}\) is the all-ones vector. Therefore, the CP rank of this tensor is at most r. One can easily check that the matrix D is copied along dimensions 3 to K. By varying D, we can obtain a set of low-rank tensors \(\mathcal {S}_{X}\). For any \(\mathcal {X}^{(i)}, \mathcal {X}^{(j)}\in \mathcal {S}_{X}\), we have
where δ_{i}’s are independent variables chosen from {0,1} and with mean 1/2. We then have
Equation (66) comes from Hoeffding’s inequality and the union bound. Note that the right-hand side of (66) is less than 1 for \(\mathcal {X}\) of the size given in (61). Thus, the event that
for all \(\mathcal {X}^{(i)} \neq \mathcal {X}^{(j)} \in \mathcal {S}_{X}\) has nonzero probability, where the second inequality comes from the fact that ⌊x⌋≥x/2 for all x≥1.
The third property comes from a modification of Lemma A.5 in [1]. By replacing the matrix dimension with the tensor dimension, we can obtain the desired result. □
Appendix 4. Proof of Theorem 3
Proof
We first define ε as follows
where C_{1} is a constant to be determined later. We consider ς in the range
We will consider running any algorithm on the set \(\mathcal {S}_{X}\). Suppose for the sake of contradiction that there exists an algorithm that, for any \(\mathcal {X} \in \mathcal {S}_{f}\), given \(\mathcal {Y}\), returns an \(\hat {\mathcal {X}}\) such that
with probability at least 1/4. We will show that if \(\mathcal {X}^{*} = \arg \min _{\mathcal {X}'\in \mathcal {S}_{X}}\|\mathcal {X}'-\hat {\mathcal {X}}\|^{2}_{F}\), then \(\mathcal {X}^{*}=\mathcal {X}\). Based on (62) and (69), for any \(\mathcal {X}' \in \mathcal {S}_{X}\) with \(\mathcal {X}' \neq \mathcal {X}\), we have
Combining (70) and (71), we then have
Since \(\mathcal {X} \in \mathcal {S}_{X}\) is also a candidate for \(\mathcal {X}^{*}\), we have
Thus, if (70) holds, then \(\|\mathcal {X}^{*}-\hat {\mathcal {X}}\|_{F} < \|\mathcal {X}'-\hat {\mathcal {X}}\|_{F}\) for any \(\mathcal {X}' \in \mathcal {S}_{X}\) with \(\mathcal {X}' \neq \mathcal {X}\), and hence, we must have \(\mathcal {X}^{*} = \mathcal {X}\). By assumption, (70) holds with probability at least 1/4, and thus \(P(\mathcal {X} \neq \mathcal {X}^{*}) \le 3/4\). However, by Fano’s inequality, the probability that \(\mathcal {X} \neq \mathcal {X}^{*}\) is at least
Combining \(\mathcal {S}_{X}\) and \(I(\mathcal {X},\mathcal {Y})\) from Lemma 6, and using the inequality log(1+z)≤z, we obtain
Combining (75) with (69), we obtain
which implies that
Setting \(C_{1}^{2} < \frac {1}{512}\) will lead to a contradiction; hence, (70) must fail to hold with probability at least 3/4. This finishes the proof. □
Appendix 5. TAPGD: proof of the Lipschitz differential property and calculation of Lipschitz constants
We provide the Lipschitz differential property of H and compute the corresponding Lipschitz constants of its partial derivatives with respect to \(\mathbf {A_{k}}\in \mathbb {R}^{n_{k} \times r}, \forall k \in [K],\mathcal {X} \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\), and ω_{l},∀l∈[W−1]. We call a function Lipschitz differentiable if and only if all its partial derivatives are Lipschitz continuous. The Lipschitz continuity of a function’s partial derivatives is defined in Definition 1.
Definition 1
[54] For any variable y, and a function y→Υ(y,z_{1},z_{2},...,z_{n}), with other variables z_{1},z_{2},..,z_{n} fixed, the partial derivative ∇_{y}Υ(y,z_{1},z_{2},⋯,z_{n}) is said to be Lipschitz continuous with Lipschitz constant L_{p}(z_{1},z_{2},...,z_{n}), if the following relation holds
Let \(L^{t+1}_{\mathbf {A_{k}}},\forall k \in [K],L^{t+1}_{\mathcal {X}}\), and \(L^{t+1}_{\omega _{l}}, \forall l \in [W-1]\) denote the smallest Lipschitz constants of \(\nabla _{\mathbf {A_{k}}} H,\nabla _{\mathcal {X}} H\), and \(\nabla _{\omega _{l}} H\) in the (t+1)th iteration. The details of the calculation are shown in (78), (81), and (82).
where \(\nabla _{\mathbf {A_{k}}} H(\mathbf {A_{k}})\) and \(\nabla _{\mathbf {A_{k}}} H(\mathbf {A_{k}}')\) are the abbreviations of \(\nabla _{\mathbf {A_{k}}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{k-1}}^{t+1},\ \mathbf {A_{k}},\ \mathbf {A_{k+1}}^{t},\ \cdots,\ \mathbf {A_{K}}^{t},\ \mathcal {X}^{t},\ \omega _{1}^{t},\ \omega _{2}^{t},\ \cdots,\ \omega _{W-1}^{t})\) and \(\nabla _{\mathbf {A_{k}}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{k-1}}^{t+1},\ \mathbf {A_{k}}',\ \mathbf {A_{k+1}}^{t},\ \cdots,\ \mathbf {A_{K}}^{t},\ \mathcal {X}^{t},\ \omega _{1}^{t},\ \omega _{2}^{t},\ \cdots,\ \omega _{W-1}^{t})\), respectively. B_{k}^{t} represents A_{K}^{t}⊙...⊙A_{k+1}^{t}⊙A_{k−1}^{t+1}⊙...⊙A_{1}^{t+1}. (a) holds from the inequality ∥AB∥_{F}≤∥A∥∥B∥_{F}. (b) follows from
and (78) implies that
where \(\nabla _{\mathcal {X}} H(\mathcal {X})\) and \(\nabla _{\mathcal {X}} H(\mathcal {X}')\) are the abbreviations of \(\nabla _{\mathcal {X}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{K}}^{t+1},\ \mathcal {X},\ \omega _{1}^{t},\ \omega _{2}^{t},\ \cdots,\ \omega _{W-1}^{t})\) and \(\nabla _{\mathcal {X}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{K}}^{t+1},\ \mathcal {X}',\ \omega _{1}^{t},\ \omega _{2}^{t},\ \cdots,\ \omega _{W-1}^{t})\), respectively. In (81), (c) comes from the triangle inequality. (d) follows from the differential mean value theorem and the fact ∥A∥_{F}=∥vec(A)∥_{2}. \(\nabla ^{2} F_{\Omega }(\bar {\mathcal {X}}) \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) has its \((i_{1},i_{2},\dots,i_{K})\)th entry equal to \(\frac {\partial ^{2} F_{\Omega }}{\partial ^{2} \mathcal {X}_{i_{1},i_{2}, \dots,i_{K}}}\Big |_{\bar {\mathcal {X}}_{i_{1},i_{2}, \dots,i_{K}}}\), and \(\text {diag}(\nabla ^{2} F_{\Omega }(\bar {\mathcal {X}})) \in \mathbb {R}^{n_{1} n_{2} \dots n_{K}\times n_{1} n_{2} \dots n_{K}}\) is a diagonal matrix whose diagonal equals \(\text {vec}(\nabla ^{2} F_{\Omega }(\bar {\mathcal {X}}))\). (e) follows from the fact that the l_{2} (spectral) norm of a diagonal matrix is equal to its entrywise infinity norm. Note that the probability density function of the normal distribution and its derivative have the upper bounds \(\frac {1}{\sqrt {2\pi } \sigma }\) and \(\frac {e^{-1/2}}{\sqrt {2\pi } \sigma ^{2}}\), respectively. Then, one can check that \(\|\text {diag}(\nabla ^{2} F_{\Omega }(\bar {\mathcal {X}}))\|_{\infty }\) is bounded by \(\frac {1}{\sigma ^{2}\beta ^{2}}\). (f) follows from upper bounding \(\|\text {diag}(\nabla ^{2} F_{\Omega }(\bar {\mathcal {X}}))\|_{\infty }\) with \(\frac {1}{\sigma ^{2}\beta ^{2}}\). (g) comes from \(\tau _{\mathcal {X}} = \frac {1}{\frac {1}{\sigma ^{2}\beta ^{2}}+\lambda }\).
Therefore, \(\tau _{\mathcal {X}} \le 1/L^{t+1}_{\mathcal {X}}\).
where \(\nabla _{\omega _{l}} H(\omega _{l})\) and \(\nabla _{\omega _{l}} H(\omega _{l}')\) are the abbreviations of \(\nabla _{\omega _{l}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{K}}^{t+1},\ \mathcal {X}^{t+1},\ \omega _{1}^{t+1},\ \omega _{2}^{t+1},\ \cdots,\ \omega _{l-1}^{t+1},\ \omega _{l},\ \omega _{l+1}^{t},\ \cdots,\ \omega _{W-1}^{t})\) and \(\nabla _{\omega _{l}} H(\mathbf {A_{1}}^{t+1},\ \mathbf {A_{2}}^{t+1},\ \cdots,\ \mathbf {A_{K}}^{t+1},\ \mathcal {X}^{t+1},\ \omega _{1}^{t+1},\ \omega _{2}^{t+1},\ \cdots,\ \omega _{l-1}^{t+1},\ \omega _{l}',\ \omega _{l+1}^{t},\ \cdots,\ \omega _{W-1}^{t})\), respectively. In (82), \(\mathcal {G}_{l},\mathcal {G}_{l+1}\) are binary tensors whose entries equal one when the corresponding entries of \(\mathcal {Y}\) equal l and l+1, respectively, and zero otherwise. (h) follows from the differential mean value theorem, where \(\mathcal {U}_{\omega _{l}},\mathcal {V}_{\omega _{l}} \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) have entries between ω_{l} and ω_{l}′. The \((i_{1},i_{2}, \dots,i_{K})\)th entries of \(\nabla J(\mathcal {U}_{\omega _{l}}),\nabla M(\mathcal {V}_{\omega _{l}}) \in \mathbb {R}^{n_{1} \times n_{2} \times \dots \times n_{K}}\) are the partial derivatives of \(\frac {\dot {\Phi }(\omega _{l}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})}{\Phi (\omega _{l+1}^{t}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})-\Phi (\omega _{l}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})}\) and \(\frac {\dot {\Phi }(\omega _{l}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})}{\Phi (\omega _{l}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})-\Phi (\omega _{l-1}^{t+1}-\mathcal {X}_{i_{1},i_{2},\dots,i_{K}}^{t+1})}\) with respect to ω_{l} at the points \((\mathcal {U}_{\omega _{l}})_{i_{1},i_{2}, \dots,i_{K}}\) and \((\mathcal {V}_{\omega _{l}})_{i_{1},i_{2}, \dots,i_{K}}\), respectively.
(j) comes from the fact that \(\|\nabla J(\mathcal {U}_{\omega _{l}})\|_{\infty }\) and \(\|\nabla M(\mathcal {V}_{\omega _{l}})\|_{\infty }\) are upper bounded by \(\frac {1}{\sigma ^{2}\beta ^{2}}\). (k) comes from \(\tau _{\omega _{l}} = \frac {\sigma ^{2}\beta ^{2}}{\sqrt {G_{l}}+\sqrt {G_{l+1}}},\forall l \in [W-1]\). Thus, \(\tau _{\omega _{l}} \le 1/L^{t+1}_{\omega _{l}}\).
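The step sizes used in this appendix, together with the two bounds on the normal density that enter the derivation, can be checked numerically with the sketch below. It is an illustration only: the function names are hypothetical, and the density bounds are taken to be 1/(√(2π)σ), attained at 0, and e^{−1/2}/(√(2π)σ²), attained at ±σ.

```python
import math


def step_size_x(sigma, beta, lam):
    """tau_X = 1 / (1 / (sigma^2 beta^2) + lambda)."""
    return 1.0 / (1.0 / (sigma ** 2 * beta ** 2) + lam)


def step_size_omega(sigma, beta, g_l, g_next):
    """tau_{omega_l} = sigma^2 beta^2 / (sqrt(G_l) + sqrt(G_{l+1})).

    Requires at least one of the two counts to be positive.
    """
    return (sigma ** 2 * beta ** 2) / (math.sqrt(g_l) + math.sqrt(g_next))


def normal_pdf(z, sigma):
    """Density of N(0, sigma^2) at z."""
    return math.exp(-z * z / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def normal_pdf_derivative(z, sigma):
    """Derivative of the N(0, sigma^2) density at z."""
    return -z / sigma ** 2 * normal_pdf(z, sigma)
```

Both step sizes shrink quadratically in σβ, so a very small β leads to very conservative updates.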
We remark that the results of (78) and (81) do not change when the boundaries \(\omega _{l}^{*}, \forall l \in [W-1]\) are known to TAPGD, since \(\omega _{l}^{t=1}, \forall l \in [W-1]\) are fixed values in (78) and (81).
Appendix 6. Proof of Theorem 4
Proof
As described in Section 4.1 of the paper, \(\Psi _{1}(\mathcal {X})\) corresponds to the operations of setting \(\mathcal {X}_{i_{1},i_{2},...,i_{K}}\) to α if \(\mathcal {X}_{i_{1},i_{2},...,i_{K}} > \alpha \), and setting \(\mathcal {X}_{i_{1},i_{2},...,i_{K}}\) to −α if \(\mathcal {X}_{i_{1},i_{2},...,i_{K}} < -\alpha,\forall i_{k} \in [n_{k}], k \in [K]\). Ψ_{2}(ω_{l}) corresponds to the operations of setting ω_{l}= min(ω_{l+1}−κ_{l+1},α_{upper}) if ω_{l}> min(ω_{l+1}−κ_{l+1},α_{upper}), and setting ω_{l}= max(ω_{l−1}+κ_{l},α_{low}) if ω_{l}< max(ω_{l−1}+κ_{l},α_{low}),∀l∈[W−1]. TAPGD is a special case of the proximal alternating linearized minimization (PALM) algorithm of Bolte et al. [54]. The global convergence of TAPGD to a critical point of (12) from any initial point can be proved in two steps: (1) \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}}, \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1})\) is Lipschitz differentiable; (2) \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}}, \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1}) + \Psi _{1}(\mathcal {X}) + \sum _{l=1}^{W-1}\Psi _{2}(\omega _{l})\) satisfies the Kurdyka–Łojasiewicz (KL) property.
The Lipschitz differential property of \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}}, \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1})\) has been shown in Appendix 5. Ψ_{1} and Ψ_{2} are semialgebraic functions. According to [54], a semialgebraic function satisfies the KL property. In addition, the function \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}}, \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1})\) is real analytic on its domain. Thus, \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}}, \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1})\) is a KL function according to Xu et al. [62]. Finally, \(H(\mathbf {A_{1}}, \mathbf {A_{2}}, \cdots, \mathbf {A_{K}},\ \mathcal {X}, \omega _{1}, \omega _{2},\cdots, \omega _{W-1}) + \Psi _{1}(\mathcal {X}) + \sum _{l=1}^{W-1}\Psi _{2}(\omega _{l})\) satisfies the KL property. The claim follows by Xu et al. [62]. By Remark 3.4 in the work of Bolte et al. [54], the convergence rate is at least \(O(t^{\frac {\theta -1}{2\theta -1}})\), for some \(\theta \in (\frac {1}{2},1)\). The proof is done. □
Availability of data and materials
The datasets analyzed during the current study are available at http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html and https://github.com/UCMGAIA/ContextDataset/wiki/Datasets.
Notes
We write g=O(n) (respectively, g=Θ(n)) if, as n goes to infinity, g≤c·n (respectively, c_{1}·n≤g≤c_{2}·n) eventually holds for some positive constants c, c_{1}, and c_{2}.
Abbreviations
CP: CANDECOMP/PARAFAC
TSVD: Tensor singular value decomposition
TAPGD: Tensor-based alternating proximal gradient descent
TSVDAPGD: TSVD-based alternating projected gradient descent
MNC1bitTR: M-norm constrained 1-bit tensor recovery
NORT: Nonconvex regularized tensor
References
M. A. Davenport, Y. Plan, E. van den Berg, M. Wootters, 1-bit matrix completion. Inf. Infer. 3(3), 189–223 (2014).
R. Wang, M. Wang, J. Xiong, Data recovery and subspace clustering from quantized and corrupted measurements. IEEE J. Sel. Top. Signal Process. 12(6), 1547–1560 (2018).
J. Choi, J. Mo, R. W. Heath, Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs. IEEE Trans. Commun. 64(5), 2005–2018 (2016).
P. Gao, R. Wang, M. Wang, J. H. Chow, Low-rank matrix recovery from noisy, quantized and erroneous measurements. IEEE Trans. Signal Process. 66(11), 2918–2932 (2018).
A. Reinhardt, F. Englert, D. Christin, in Proc. Sustainable Internet and ICT for Sustainability (SustainIT). Enhancing user privacy by preprocessing distributed smart meter data, (2013), pp. 1–7. https://doi.org/10.1109/SustainIT.2013.6685194.
Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, L. Liu, Channel estimation and performance analysis of one-bit massive MIMO systems. IEEE Trans. Signal Process. 65(15), 4075–4089 (2017).
S. Khobahi, N. Naimipour, M. Soltanalian, Y. C. Eldar, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep signal recovery with one-bit quantization (IEEE, 2019), pp. 2987–2991. https://doi.org/10.1109/ICASSP.2019.8683876.
Y. Plan, R. Vershynin, Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Trans. Inf. Theory. 59(1), 482–494 (2013).
M. Slawski, P. Li, in Advances in Neural Information Processing Systems. b-bit marginal regression (NIPS, Montreal, 2015), pp. 2062–2070.
L. Zhang, J. Yi, R. Jin, in International Conference on Machine Learning. Efficient algorithms for robust one-bit compressive sensing (JMLR, Beijing, 2014), pp. 820–828.
S. Bhojanapalli, B. Neyshabur, N. Srebro, in Advances in Neural Information Processing Systems. Global optimality of local search for low rank matrix recovery (NIPS, Barcelona, 2016), pp. 3873–3881.
T. Zhao, Z. Wang, H. Liu, in Advances in Neural Information Processing Systems. A nonconvex optimization framework for low rank matrix estimation (NIPS, Montreal, 2015), pp. 559–567.
S. A. Bhaskar, Probabilistic low-rank matrix completion from quantized measurements. J. Mach. Learn. Res. 17(60), 1–34 (2016).
T. Cai, W. X. Zhou, A max-norm constrained minimization approach to 1-bit matrix completion. J. Mach. Learn. Res. 14(1), 3619–3647 (2013).
Y. Fu, J. Gao, D. Tien, Z. Lin, in 2014 International Joint Conference on Neural Networks (IJCNN). Tensor LRR based subspace clustering (IEEE, 2014), pp. 1877–1884. https://doi.org/10.1109/IJCNN.2014.6889472.
L. Baltrunas, M. Kaminskas, B. Ludwig, O. Moling, F. Ricci, A. Aydin, K. H. Lüke, R. Schwaiger, in International Conference on Electronic Commerce and Web Technologies. InCarMusic: context-aware music recommendations in a car (Springer, 2011), pp. 89–100. https://doi.org/10.1007/978-3-642-23014-1_8.
H. S. Sahambi, K. Khorasani, A neural-network appearance-based 3D object recognition using independent component analysis. IEEE Trans. Neural Netw. 14(1), 138–149 (2003).
N. I. Bruce, B. Murthi, R. C. Rao, A dynamic model for digital advertising: the effects of creative format, message content, and targeting on engagement. J. Mark. Res. 54(2), 202–218 (2017).
R. Li, W. Zhang, Y. Zhao, Z. Zhu, S. Ji, Sparsity learning formulations for mining time-varying data. IEEE Trans. Knowl. Data Eng. 27(5), 1411–1423 (2015).
N. Cohen, A. Shashua, in International Conference on Machine Learning. Convolutional rectifier networks as generalized tensor decompositions (JMLR, New York, 2016), pp. 955–963.
K. Maruhashi, M. Todoriki, T. Ohwa, K. Goto, Y. Hasegawa, H. Inakoshi, H. Anai, in Thirty-Second AAAI Conference on Artificial Intelligence. Learning multi-way relations via tensor decomposition with neural networks (AAAI Press, New Orleans, Louisiana, 2018).
F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6(1–4), 164–189 (1927).
J. B. Kruskal, Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 18(2), 95–138 (1977).
L. R. Tucker, Some mathematical notes on three-mode factor analysis. Psychometrika. 31(3), 279–311 (1966).
J. Liu, P. Musialski, P. Wonka, J. Ye, Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern. Anal. Mach. Intell. 35(1), 208–220 (2012).
X. Zhang, Z. Zhou, D. Wang, Y. Ma, in Twenty-Eighth AAAI Conference on Artificial Intelligence. Hybrid singular value thresholding for tensor completion (AAAI Press, Québec City, 2014).
Q. Zhao, L. Zhang, A. Cichocki, Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Trans. Pattern. Anal. Mach. Intell. 37(9), 1751–1763 (2015).
T. Yokota, Q. Zhao, A. Cichocki, Smooth PARAFAC decomposition for tensor completion. IEEE Trans. Signal Process. 64(20), 5423–5436 (2016).
J. A. Bengua, H. N. Phien, H. D. Tuan, M. N. Do, Efficient tensor completion for color image and video recovery: low-rank tensor train. IEEE Trans. Image Process. 26(5), 2466–2479 (2017).
Q. Yao, J. T. Y. Kwok, B. Han, in International Conference on Machine Learning. Efficient nonconvex regularized tensor completion with structure-aware proximal iterations (JMLR, Long Beach, CA, 2019), pp. 7035–7044.
C. Mu, B. Huang, J. Wright, D. Goldfarb, in International Conference on Machine Learning. Square deal: lower bounds and improved relaxations for tensor recovery (JMLR, Beijing, 2014), pp. 73–81.
X. Zhang, D. Wang, Z. Zhou, Y. Ma, Robust low-rank tensor recovery with rectification and alignment. IEEE Trans. Pattern. Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2929043.
J. H. Yang, X. L. Zhao, T. Y. Ji, T. H. Ma, T. Z. Huang, Low-rank tensor train for tensor robust principal component analysis. Appl. Math. Comput. 367, 124783 (2020).
T. X. Jiang, T. Z. Huang, X. L. Zhao, L. J. Deng, Multi-dimensional imaging data recovery via minimizing the partial sum of tubal nuclear norm. J. Comput. Appl. Math. 372, 112680 (2020).
Y. B. Zheng, T. Z. Huang, X. L. Zhao, T. X. Jiang, T. H. Ma, T. Y. Ji, Mixed noise removal in hyperspectral image via low-fibered-rank regularization. IEEE Trans. Geosci. Remote Sens. 58(1), 734–749 (2019).
T. G. Kolda, B. W. Bader, Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009).
V. De Silva, L. H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30(3), 1084–1127 (2008).
J. Chen, Y. Saad, On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM J. Matrix Anal. Appl. 30(4), 1709–1734 (2009).
J. Li, F. Huang, Guaranteed simultaneous asymmetric tensor decomposition via orthogonalized alternating least squares. arXiv preprint arXiv:1805.10348 (2018).
W. P. Krijnen, T. K. Dijkstra, A. Stegeman, On the non-existence of optimal solutions and the occurrence of “degeneracy” in the CANDECOMP/PARAFAC model. Psychometrika. 73(3), 431–439 (2008).
A. Aidini, G. Tsagkatakis, P. Tsakalides, 1-bit tensor completion. Electron. Imaging. 2018(13), 261–1 (2018).
B. Li, X. Zhang, X. Li, H. Lu, Tensor completion from one-bit observations. IEEE Trans. Image Process. 28(1), 170–180 (2019).
N. Ghadermarzy, Y. Plan, O. Yilmaz, Learning tensors from partial binary measurements. IEEE Trans. Signal Process. 67(1), 29–40 (2019).
S. Zhe, K. Zhang, P. Wang, K. C. Lee, Z. Xu, Y. Qi, Z. Ghahramani, in Advances in Neural Information Processing Systems. Distributed flexible nonlinear tensor factorization (NIPS, Barcelona, 2016), pp. 928–936.
S. Chen, M. R. Lyu, I. King, Z. Xu, in Advances in Neural Information Processing Systems. Exact and stable recovery of pairwise interaction tensors (NIPS, Lake Tahoe, 2013), pp. 1691–1699.
E. Richard, A. Montanari, in Advances in Neural Information Processing Systems. A statistical model for tensor PCA (NIPS, Montreal, 2014), pp. 2897–2905.
X. Zhang, D. Wang, Z. Zhou, Y. Ma, in Advances in Neural Information Processing Systems. Simultaneous rectification and alignment via robust recovery of low-rank tensors (NIPS, Lake Tahoe, 2013), pp. 1637–1645.
A. Smilde, R. Bro, P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences (Wiley, Hoboken, 2005).
Y. Baig, E. M. Lai, J. Lewis, in 2010 17th International Conference on Telecommunications. Quantization effects on compressed sensing video (IEEE, 2010), pp. 935–940.
G. Zhang, J. Jia, T. T. Wong, H. Bao, Consistent depth maps recovery from a video sequence. IEEE Trans. Pattern. Anal. Mach. Intell. 31(6), 974–988 (2009).
R. A. Harshman, et al., Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics. UCLA, 1–84 (1970).
J. D. Carroll, J. J. Chang, Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika. 35(3), 283–319 (1970).
L. R. Tucker, Some mathematical notes on three-mode factor analysis. Psychometrika. 31(3), 279–311 (1966).
J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014).
G. H. Golub, C. F. Van Loan, Matrix computations (Johns Hopkins University Press, 1996).
A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern. Anal. Mach. Intell. 23(6), 643–660 (2001).
K. C. Lee, J. Ho, D. J. Kriegman, Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern. Anal. Mach. Intell. 27(5), 684–698 (2005).
E. J. Candes, Y. Plan, Matrix completion with noise. Proc. IEEE. 98(6), 925–936 (2010).
R. Tomioka, T. Suzuki, Spectral norm of random tensors. arXiv preprint arXiv:1407.1870 (2014).
S. Friedland, L. H. Lim, Nuclear norm of higher-order tensors. Math. Comput. 87(311), 1255–1281 (2018).
B. Jiang, F. Yang, S. Zhang, Tensor and its Tucker core: the invariance relationships. Numer. Linear Algebra Appl. 24(3), 2086 (2017).
Y. Xu, W. Yin, A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013).
Acknowledgements
This research is supported in part by ARO W911NF-17-1-0407 and the Rensselaer-IBM AI Research Collaboration (http://airc.rpi.edu), part of the IBM AI Horizons Network (http://ibm.biz/AIHorizons).
Authors’ contributions
Ren and Meng conceived and designed the method and the experiments. Ren performed the experiments and drafted the manuscript. Meng revised the manuscript. Jinjun provided many helpful suggestions. All authors read and approved the final manuscript.
Ethics declarations
Consent for publication
Informed consent was obtained from all authors included in the study.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, R., Wang, M. & Xiong, J. Tensor recovery from noisy and multilevel quantized measurements. EURASIP J. Adv. Signal Process. 2020, 41 (2020). https://doi.org/10.1186/s13634-020-00698-z
DOI: https://doi.org/10.1186/s13634-020-00698-z
Keywords
 Tensor recovery
 CP decomposition
 Low-rank
 Multilevel quantization
 Tensor singular value decomposition
 Nonconvex optimization