Achieve data privacy and clustering accuracy simultaneously through quantized data recovery

Wang, Ren; Wang, Meng; Xiong, Jinjun

doi:10.1186/s13634-020-00682-7

Research
Open access
Published: 07 May 2020

Ren Wang¹,
Meng Wang¹ &
Jinjun Xiong²

EURASIP Journal on Advances in Signal Processing volume 2020, Article number: 22 (2020)

Abstract

This paper develops a data collection and processing framework that achieves individual users’ data privacy and the operator’s information accuracy simultaneously. Data privacy is enhanced by adding noise and applying quantization to the data before transmission, and the privacy of an individual user is measured by information-theoretic analysis. This paper develops a data recovery and clustering method for the operator to extract features from the privacy-preserving, partially corrupted, and partially observed measurements of a large number of users. To prevent cyber intruders from accessing the data of many users, it also develops a decentralized algorithm such that multiple data owners can collaboratively recover and cluster the data without sharing the raw measurements directly. The recovery accuracy is characterized analytically and shown to be close to the fundamental limit of any recovery method. The proposed algorithm is proved to converge to a critical point from any initial point. The method is evaluated on recorded Irish smart meter data and UMass smart microgrid data.

1 Introduction

Smart meters provide fine-grained measurements of power consumption of industrial and residential customers and can enhance the distribution system visibility. Non-intrusive load monitoring (NILM) approaches [1, 2] can identify individual appliances from the high-time-resolution smart meter data of the aggregated power consumption. Intruders can thus extract user behavior, and user privacy is an increasing concern. One way to protect data privacy is by applying additive homomorphic encryption [3]. It requires the network to have tree-like connections and can only decrypt the sum of the load curves. The other way to enhance data privacy is data obfuscation whereby the actual power consumption of each household is masked by adding noise to the smart meter measurements either through signal processing approaches [4, 5] or by physically adding rechargeable batteries to the households [6, 7]. Moreover, the aggregated consumption of the load and the battery can be adjusted to a constant to obfuscate the information further [8, 9]. Then, when applying NILM to these noisy and quantized measurements, an intruder can no longer accurately identify the patterns of individual appliances and, in turn, the user behavior. The increase in user privacy is achieved, however, at the cost of data distortion and reduced data accuracy for the operating center [10–12]. Although the operating center does not need high-time-resolution information of every individual appliance in each household, it still requires accurate estimation of the aggregated power consumption and the common load patterns among households for forecasting, demand response, and planning. For example, the center clusters customers with similar load patterns and then employs the load pattern of each cluster to enhance the load forecasting accuracy [13] and determine the incentives for demand response [14, 15].
If noise and quantization are added to the data to enhance the privacy, the information accuracy for the operator is effectively reduced.

This paper shows that the data privacy can be protected for each individual user^{Footnote 1} and, at the same time, the information accuracy at the operating center about user power consumption and the major patterns among different users is maintained. To the best of our knowledge, this is the first work that achieves data privacy and information accuracy simultaneously. In our proposed framework, each user’s actual power consumption is masked by first adding noise to the measurements and then quantizing the output to one of a few levels. The privacy of an individual user can be enhanced in this way, from an information-theoretic perspective [16–19]. Once the data is quantized, the variation information is blurred and hence NILM methods fail to identify individual appliances. Although adding noise and quantization have been employed before to enhance privacy (e.g., [6, 20]), this paper, for the first time, shows that such privacy enhancement does not necessarily lead to a reduction in the information accuracy. The central technical contribution of this paper is the development of a data recovery and clustering method that works even when the measurements are highly noisy and quantized, contain significant errors, and are partially lost. Our method is proved to provide accurate data recovery and clustering results, as long as the center has measurements from a sufficient number of users. In contrast, a cyber intruder with access to the measurements of a small number of users cannot obtain accurate information even with the same approach. We develop a decentralized algorithm that allows multiple data owners to cooperatively recover and cluster the data without sharing their own raw measurements directly. Then, it is extremely difficult for an intruder to access large amounts of data. Thus, the data privacy of an individual user is enhanced while maintaining the information accuracy for the operating center.

Since the load profiles with similar load patterns can be represented by data points in a low-dimensional subspace in the high-dimensional ambient space, all the load profiles can be characterized by the Union of Subspaces (UoS) model [21], and the load clustering problem can be formulated as a subspace clustering problem. Various subspace clustering techniques have been developed, see e.g., [21–26]. None of these approaches, however, considers the case that the measurements are highly quantized. To the best of our knowledge, only one recent work considered subspace clustering and data recovery from highly noisy and quantized data [27]. This paper follows the mathematical setup of [27] but extends significantly in the following aspects. Ref. [27] does not consider data privacy, while this paper proposes a data collection framework to achieve data privacy and information accuracy simultaneously. We characterize the data privacy through mutual information, and such analysis does not exist in [27]. Ref. [27] assumes that all the measurements are available to the center, while this paper considers a more general setup that partial measurements are lost during the transmission and do not arrive at the center. This paper characterizes the data recovery error by our proposed method analytically as a function of data loss percentage. Moreover, this paper characterizes the fundamental limit of the recovery error by any possible recovery method and shows that our method is nearly optimal in reducing the recovery error. All these fundamental analyses do not exist in [27]. Furthermore, only a centralized algorithm Sparse-APA is discussed in [27]. This paper develops a Distributed Sparse Alternative Proximal Algorithm (DSAPA) for multiple data owners to collaboratively solve the subspace clustering and data recovery problem without sharing the measurements with others. Thus, the user data privacy can be further protected. 
This paper is also related to the quantized matrix recovery problem [28–36], in which the data matrix is assumed to be low rank. The low-rank matrix model is a special case of the UoS model by restricting to one subspace only. In fact, the data matrix of the load profiles can be high rank or even full rank in our setup. Finally, we remark that this paper considers smart meter measurements that measure the aggregated energy consumption in a house, and does not consider applying NILM on the operator side. Distributed smart metering can provide energy consumption of individual electrical appliances in a house [20].

The rest of the paper is organized as follows. Section 2 introduces our proposed framework, problem formulation, related works, and the data privacy enhancement analysis. The theoretical analyses of our recovery and clustering method are presented in Section 3. Section 4 introduces the details of the DSAPA with its convergence guarantee. Section 5 reports the numerical experiments of our method on real smart meter datasets. Section 6 concludes the paper. All the proofs are deferred to Appendices 1–7.

2 Our proposed framework of privacy-preserving data collection and information recovery

2.1 Our framework and problem formulation

Figure 1 visualizes our proposed framework of privacy-preserving smart meter data collection and information recovery. To enhance the user data privacy, the actual power consumption is mapped to a few fixed power levels at the output of the smart meter. One can achieve this through signal processing in the smart meter or connecting a rechargeable battery to each household. Thus, the actual consumption is masked in the noisy and quantized smart meter measurements. As shown in Fig. 1, the measurements are collected by W agents disjointly, and agents do not share measurements directly. The agents recover the data and cluster the users with similar consumption patterns collaboratively in a distributed fashion. When W=1, it reduces to the case of one single center.

We defer the discussion of user privacy enhancement through the proposed framework to Section 2.3. We first define the recovery and clustering problem from quantized data mathematically as follows. Let $L^{*} \in \mathbb {R}^{m \times n}$ denote the actual power usages of n users, with each column containing the power usage of one user at m time instants. We assume that users with similar consumption patterns belong to the same group and there are p groups in total. The corresponding columns of the same group belong to a d-dimensional subspace in $\mathbb {R}^{m}$ with d≤m. Let S_i (i∈[p]) denote the ith subspace, and these p subspaces are distinct^{Footnote 2}. Let r denote the rank of L^∗, then r≤pd. Let $L^{*}_{i}$ denote the submatrix of L^∗ that contains points in S_i, and let n_i denote the number of columns in $L^{*}_{i}$, i.e., the number of users in group i. We assume m≤n_i≤ξn/p for all i and some positive constant ξ. We further assume m=n/(κp) for some positive constant κ to simplify the representation of main results.

There exists a coefficient matrix $C^{*}\in \mathbb {R}^{n \times n}$ such that L^∗=L^∗C^∗ and $C^{*}_{i,i}=0$ for all i∈[n]. Moreover, $C^{*}_{i,j}$ is zero if the ith and jth columns of L^∗ do not belong to the same subspace [21]. These two properties, the self-expressive property and the subspace-preserving property, have been exploited in the subspace clustering literature and are summarized in Definition 1.

Definition 1

[27] A matrix $L \in \mathbb {R}^{m \times n}$ has the self-expressive property if L=LC for some $C \in \mathbb {R}^{n \times n}$, and $C_{i,i}=0$ for all i∈[n]. Moreover, C has the subspace-preserving property of L if $C_{i,j}=0$ for columns i and j of L belonging to different subspaces.
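As a small illustration of Definition 1, the following sketch builds a toy matrix whose columns lie in two one-dimensional subspaces of $\mathbb {R}^{2}$ and checks both properties numerically. The matrix values, the group labels, and the helper names (`matmul`, `is_self_expressive`, `is_subspace_preserving`) are hypothetical and chosen only for illustration; they are not taken from [27] or from the data used in this paper.

```python
# Toy check of the self-expressive and subspace-preserving properties.
# All matrices below are illustrative assumptions, not data from the paper.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_self_expressive(L, C, tol=1e-9):
    """True if L = L C and the diagonal of C is zero."""
    LC = matmul(L, C)
    same = all(abs(LC[i][j] - L[i][j]) <= tol
               for i in range(len(L)) for j in range(len(L[0])))
    zero_diag = all(abs(C[i][i]) <= tol for i in range(len(C)))
    return same and zero_diag

def is_subspace_preserving(C, labels, tol=1e-9):
    """True if C[i][j] = 0 whenever columns i and j have different labels."""
    n = len(C)
    return all(abs(C[i][j]) <= tol
               for i in range(n) for j in range(n) if labels[i] != labels[j])

# m = 2 time instants, n = 4 users.
# Columns 0 and 1 lie on span{(1, 0)}; columns 2 and 3 lie on span{(0, 1)}.
L = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 6.0]]
labels = [0, 0, 1, 1]

# Column j of C holds the coefficients that express column j of L
# through the other columns of the same subspace.
C = [[0.0, 2.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 0.0, 0.5, 0.0]]

print(is_self_expressive(L, C), is_subspace_preserving(C, labels))
```

Because the nonzero pattern of such a C links only users of the same group, it can serve as the affinity structure on which a clustering step operates.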

Let matrix $E^{*} \in \mathbb {R}^{m \times n}$ denote the additive errors in the measurements. We assume the number of nonzeros s in E^∗ is much smaller than mn. The partially corrupted measurements can be represented by X^∗=L^∗+E^∗. We assume the energy consumption and the errors are bounded, i.e., ∥L^∗∥_∞≤α₁ and ∥E^∗∥_∞≤α₂, for some positive constants α₁,α₂, and the infinity norm ∥·∥_∞ measures the maximum absolute value.

The quantization process in each household is modeled as follows. The measured energy consumption at each time step is mapped to one of K values in a probabilistic fashion. Figure 2 shows the quantization process. It can be modeled as adding random noise first and then quantizing to K levels. Let $N \in \mathbb {R}^{m \times n}$ denote the noise matrix, which is independent of X^∗. Entries of N are i.i.d. generated from a fixed cumulative distribution function (c.d.f.) Ψ(x). The quantization boundaries $\omega_{0}<\omega_{1}<\dots<\omega_{l-1}<\omega_{l}<\dots<\omega_{K}$ and the quantized value $\mathcal {Q}_{l}, l \in [K]$ for the bin $[\omega_{l-1},\omega_{l})$ are given. Then, the probability of mapping $X^{*}_{i,j}$ to $Y_{i,j} = \mathcal {Q}_{l}, \forall i,j$ is represented by

$$ \begin{aligned} {\varphi}_{l}(X^{*}_{i,j})&=P\left(Y_{i,j}=\mathcal{Q}_{l}|X^{*}_{i,j}\right) \\&= \Psi\left(\omega_{l}-X^{*}_{i,j}\right)-\Psi\left(\omega_{l-1}-X^{*}_{i,j}\right), \end{aligned} $$

(1)

and $\sum _{l=1}^{K}{\varphi }_{l}\left (X^{*}_{i,j}\right)=1$. The noise N is introduced to hide the user information. One choice of Ψ(x) is the probit model with Ψ(x)=Ψ_norm(x/σ), where Ψ_norm is the c.d.f. of the standard Gaussian distribution $\mathcal {N}(0,1)$, and σ>0 is the standard deviation. Note that $\Psi \left (\omega _{l}-X^{*}_{i,j}\right) \geq \Psi \left (\omega _{l-1}-X^{*}_{i,j}\right)+\beta $ for some positive β. Then, 1≥φ_l≥β>0.
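The quantization model in (1) can be sketched in a few lines for the probit choice of Ψ. This is a minimal illustration rather than the smart meter implementation: the function names, the example boundaries and levels, and the convention ω₀=−∞ and ω_K=+∞ (which makes the K probabilities sum to one) are assumptions made here.

```python
import math
import random

def probit_cdf(x, sigma):
    """Psi(x) = Psi_norm(x / sigma), the c.d.f. of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def level_probs(x, boundaries, sigma):
    """Return [phi_1(x), ..., phi_K(x)] from Eq. (1).

    boundaries = [omega_0, ..., omega_K]; omega_0 = -inf and omega_K = +inf
    are assumed here so that the probabilities sum to one.
    """
    cdf = [probit_cdf(w - x, sigma) for w in boundaries]
    return [cdf[l] - cdf[l - 1] for l in range(1, len(boundaries))]

def quantize(x, boundaries, levels, sigma, rng):
    """Add Gaussian noise to x, then map the result to the level of its bin."""
    noisy = x + rng.gauss(0.0, sigma)
    for l in range(1, len(boundaries)):
        if boundaries[l - 1] <= noisy < boundaries[l]:
            return levels[l - 1]
    return levels[-1]  # only reached if the last boundary is finite

# Hypothetical example: K = 3 levels, one true reading x = 1.5.
boundaries = [-math.inf, 1.0, 2.0, math.inf]
levels = [0.5, 1.5, 2.5]
probs = level_probs(1.5, boundaries, sigma=0.5)
sample = quantize(1.5, boundaries, levels, sigma=0.5, rng=random.Random(0))
```

A larger σ flattens `probs` across the levels, which hides more of the true reading in any single quantized sample.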

The quantized measurements Y are sent to the center. Data losses can happen during the communication, visualized by the question marks in Fig. 2. Let set Ω denote the indices of measurements that are not lost. In the general case that the measurements are collected by W agents/nodes separately, we assume for simplicity that each node collects the data from q=n/W users. Node 1 collects the data from the first q users; node 2 collects the data from the next q users, and so on. Let Φ_i={q(i−1)+1,q(i−1)+2,...,qi}, then $L^{*}_{\Phi _{i}}$ denotes the submatrix of L^∗ with column indices in Φ_i. L^∗ can also be written as $\left [L^{*}_{\Phi _{1}},L^{*}_{\Phi _{2}},...,L^{*}_{\Phi _{W}}\right ]$. Similarly, $E^{*}=\left [E^{*}_{\Phi _{1}},E^{*}_{\Phi _{2}},...,E^{*}_{\Phi _{W}}\right ]$. Node i collects $Y_{\Phi _{i}}$.
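The column partition among the W nodes can be written out explicitly. The sketch below is only an illustration of the index sets Φ_i; the function names are hypothetical, and it assumes, as in the text, that n is divisible by W.

```python
def node_index_sets(n, W):
    """Return [Phi_1, ..., Phi_W] with Phi_i = {q(i-1)+1, ..., qi}, q = n / W.

    Indices are 1-based to match the text; n is assumed divisible by W.
    """
    if n % W != 0:
        raise ValueError("n must be divisible by W in this simplified setup")
    q = n // W
    return [list(range(q * (i - 1) + 1, q * i + 1)) for i in range(1, W + 1)]

def columns_for_node(M, phi):
    """Extract the submatrix of M (a list of rows) with 1-based column indices phi."""
    return [[row[j - 1] for j in phi] for row in M]

# Hypothetical example: n = 6 users split across W = 3 nodes.
phis = node_index_sets(6, 3)
Y = [[10, 20, 30, 40, 50, 60]]
Y_node2 = columns_for_node(Y, phis[1])  # the columns held by node 2
```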

The data recovery and pattern extraction problem for one center can be stated as follows.

(P1) Given quantized measurements Y_Ω, known boundaries ω₀<ω₁<...<ω_K, and noise distribution Ψ, can we recover the real power usages L^∗ and cluster the users through estimating C^∗ simultaneously?

Moreover, if measurements Y_i,j’s are not shared among W nodes to protect the user privacy,

(P2) Can we estimate L^∗ and C^∗ with W nodes in a decentralized fashion?

Some notations in this paper are summarized in Table 1.

Table 1 Notations

2.2 Related work

When p=1, i.e., all the users share the same pattern, L^∗ is approximately a low-rank matrix. Then, (P1) reduces to the problem of low-rank matrix recovery from quantized measurements [28–37], with motivating applications in image processing [38], collaborative filtering [31], and sensor networks [39]. Note that since there is only one subspace in this case, these works do not consider data clustering and only focus on data recovery.

When the quantization process does not exist, the problem (P1) reduces to the conventional subspace clustering problem [21–26,40]. If the subspace-preserving C^∗ is estimated, one can apply the spectral clustering [41] method to obtain the clustering of the data points. For example, Sparse Subspace Clustering (SSC) [21] is a common choice for subspace clustering, and SSC estimates C^∗ by solving a convex optimization problem. Other clustering methods exist that cluster data points based on the Euclidean distance. For instance, Lin et al. [42] and Keogh et al. [43] leverage a linear combination of box basis functions to approximate the original data while retaining the features of interest.

Reference [27] is the first paper that studies the subspace clustering from quantized measurements when p≥1. Wang et al. [27] do not consider missing data and develop a centralized data recovery method from full observations. This paper follows the same problem formulation as [27] and extends to the general case of partial observations. We provide both the recovery guarantee of our approach and the fundamental limit of the recovery accuracy by any method. Moreover, a framework of privacy-preserving smart meter data collection is proposed in this paper, and we further enhance the data privacy by developing a decentralized data recovery method.

Our problem formulation and methods apply to other domains such as image and video processing and phasor measurement unit (PMU) data analytics for power systems. In image recovery and image clustering [27], images of the same person with varying illumination belong to the same low-dimensional subspace [44]. Columns of L^∗ correspond to images of multiple people. The goal is to enhance the image quality and cluster the data using low-resolution images. Similarly, in motion segmentation, each column of L^∗ represents the trajectory of a reference point. The reference points in the same rigid object belong to the same subspace. The motion segmentation becomes a subspace clustering problem from the observed measurements. In PMU data analytics, the time series of PMUs affected by the same event belong to the same subspace [32,45]. The event location problem can be solved by subspace clustering.

2.3 Data privacy enhancement in the proposed framework

Various methods have been developed to enhance the privacy of power consumption data. For example, one can use pre-processing techniques like temporal averaging, adding additional noise, and quantization [4,5,20] to alter the data. However, directly altering data might affect the accuracy of some applications, e.g., billing and profiling [46]. Alternatively, rechargeable batteries and PV converters can be leveraged to mask the actual power consumption [6,7,47]. The noise addition and quantization process in this paper can be achieved by either signal processing or rechargeable batteries.

In general, privacy guarantee can be achieved through either computational hardness [48–50] or information-theoretic analysis [16–19]. The existing analytical results of data privacy only work for specific or simple models and do not easily generalize. For instance, under the setup of communication between two nodes, ref. [17] analyzes the trade-off between data sharing and privacy. Under the assumptions of i.i.d. input load sequence and an i.i.d. energy harvesting process, the minimum information leakage rate is provided with a certain energy management policy in [51]. Some other methods try to analyze data privacy numerically. In [52], the information leakage rate is measured by the relative entropy of the probability measures of the original load data and the modified load data and is calculated by Monte-Carlo method. Refs. [7] and [12] consider measuring the information leakage through mutual information of the original load data and the modified load data. Following the existing work on smart meter data privacy, see, e.g., [19,52–54], this paper analyzes the data privacy from an information-theoretic perspective. The data privacy of an individual user is analyzed by comparing the original data and the data after privacy enhancement through quantities like the Kullback-Leibler (KL) divergence [52], mutual information, and normalized mutual information [18]. In our framework, the actual energy consumption of user i, denoted by $L^{*}_{\star i}$, is masked by additive Gaussian noise and quantization, resulting in Y_⋆i. Let $P_{L^{*}_{\star i}}$ and $P_{Y_{\star i}}$ denote the probability distribution of $L^{*}_{\star i}$ and Y_⋆i, respectively. The privacy can be measured through the normalized mutual information (NI) between $L^{*}_{\star i}$ and Y_⋆i [18], defined as follows:

Definition 2

$$ \begin{aligned} & NI\left(L^{*}_{\star i}, Y_{\star i}\right) \\&= \frac{\sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} P_{\left(L^{*}_{\star i},Y_{\star i}\right)}(x,y) \log \frac{P_{\left(L^{*}_{\star i},Y_{\star i}\right)}(x,y)}{P_{L^{*}_{\star i}}(x)P_{Y_{\star i}}(y)}}{\sum_{x \in \mathcal{X}} P_{L^{*}_{\star i}}(x) \log \frac{1}{P_{L^{*}_{\star i}}(x)}} \end{aligned} $$

(2)

where spaces $\mathcal {X}$ and $\mathcal {Y}$ are the feasible sets of $L^{*}_{\star i}$ and Y_⋆i, respectively. $P_{\left (L^{*}_{\star i},Y_{\star i}\right)}$ is the joint distribution of $L^{*}_{\star i}$ and Y_⋆i. $P_{L^{*}_{\star i}}$ and $P_{Y_{\star i}}$ are the marginal distributions of $L^{*}_{\star i}$ and Y_⋆i, respectively.

The numerator of (2) is the mutual information between $L^{*}_{\star i}$ and Y_⋆i, and the denominator is the entropy of $L^{*}_{\star i}$. When $L^{*}_{\star i}$ and Y_⋆i are independent of each other, $NI\left (L^{*}_{\star i}, Y_{\star i}\right)$ reaches its minimum value 0. When Y_⋆i is exactly the same as $L^{*}_{\star i}$, $NI\left (L^{*}_{\star i}, Y_{\star i}\right)$ equals its maximum value 1. A smaller NI corresponds to a higher level of data privacy of $L^{*}_{\star i}$ and also indicates a more significant difference between $L^{*}_{\star i}$ and Y_⋆i. Strictly speaking, $L^{*}_{\star i}$ takes values in a continuous space. However, since all measuring devices have a finite resolution, $L^{*}_{\star i}$ can be viewed as a discrete random variable. When computing NI in practice, one can divide the range of the values into small regions to compute the sample probability distribution.
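For a discretized joint distribution, (2) can be evaluated directly. The following sketch assumes the joint probability mass function is supplied as a table `joint[x][y]` (for example, a normalized two-dimensional histogram of original and quantized readings); the function name and the toy tables are hypothetical.

```python
import math

def normalized_mutual_information(joint):
    """NI from Eq. (2) for a joint p.m.f. given as joint[x][y] (rows: x, columns: y).

    The entropy of x must be positive, i.e., x must not be deterministic.
    """
    px = [sum(row) for row in joint]
    py = [sum(joint[x][y] for x in range(len(joint)))
          for y in range(len(joint[0]))]
    # Numerator: mutual information, skipping zero-probability cells.
    mutual_info = sum(
        p * math.log(p / (px[x] * py[y]))
        for x, row in enumerate(joint) for y, p in enumerate(row) if p > 0
    )
    # Denominator: entropy of the original signal.
    entropy_x = -sum(p * math.log(p) for p in px if p > 0)
    return mutual_info / entropy_x

# Two extreme toy cases: independent variables and identical variables.
ni_independent = normalized_mutual_information([[0.25, 0.25], [0.25, 0.25]])
ni_identical = normalized_mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

The two toy tables reproduce the extreme cases described above: the independent table gives an NI of 0 and the identical table gives an NI of 1, up to floating-point rounding.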

The above information-theoretic measures show that when the data of individual users are processed separately, a user’s data privacy is enhanced at the cost of reduced information accuracy. We need to emphasize that measures like NI or KL divergence focus on an individual signal and do not characterize the information recovery when multiple signals are processed together. In fact, when the data of multiple users are available, and strong correlations exist among different users’ data, such correlation can be leveraged to enhance the data accuracy. As stated in problems (P1) and (P2), the major technical objective of this paper is to develop data recovery and clustering methods from quantized measurements of multiple users, where the data correlations are characterized by data points belonging to the same subspace. As we will show in Section 3 (Theorem 1 and Proposition 1), the asymptotic information accuracy from quantized measurements can be achieved when the number of users increases to infinity. We need to emphasize that this result does not contradict the data privacy enhancement by adding noise and applying quantization. This is because the asymptotic information accuracy is only achieved when processing the correlated data of a large number of users, while a cyber intruder is very unlikely to have access to the data of so many users. In our proposed decentralized data collection and processing framework (Fig. 1), each agent collects the measurements of a subset of users, and the measurements are not directly shared among the agents. A cyber intruder needs to hack either all these agents or the smart meters of all users to be able to access all the data. Since such an attack is very unlikely to happen, the user’s data privacy is still protected. Privacy from the recovery perspective will be discussed in detail in Section 3.3.

3 Results: theoretical

Here, we consider solving (P1) at a single center and defer the discussion of solving (P2) in a decentralized way through distributed nodes to Section 4. We propose to estimate L^∗, E^∗, and C^∗ by the solution $\left (\hat {L},\hat {E},\hat {C}\right)$ to the following optimization problem,

$$ \min_{L,E \in \mathbb{R}^{m\times n},C\in \mathbb{R}^{n \times n}} F(L,E) \: \:\: \: \textrm{s.t.} (L,E,C) \in \mathcal{S}_{f}, $$

(3)

where

$$ F(L,E)= -{\sum_{(i,j) \in \Omega}}\sum_{l=1}^{K}\boldsymbol{1}_{[Y_{i,j}=\mathcal{Q}_{l}]}\log\left({\varphi}_{l}\left(L_{i,j}+E_{i,j}\right)\right), $$

(4)

$$ \begin{aligned} &\mathcal{S}_{f} = \left\{(L,E,C): \|L\|_{\infty}\le\alpha_{1}, \|E\|_{\infty}\le \alpha_{2}, \|E\|_{0}\le s,\right. \\ & \text{rank}(L) \le r, L=LC, \|C_{\star i}\|_{0}\le d, C_{i,i}=0, \forall i \in [n]\}. \end{aligned} $$

(5)

1_[A] is the indicator function that takes value 1 if A is true and value 0 otherwise. ∥·∥₀ measures the number of nonzero entries in a vector or matrix. Data recovery and subspace clustering are achieved simultaneously by solving (3)–(5).

Equations (3)–(5) are a constrained maximum log-likelihood estimation problem that maximizes the likelihood of obtaining Y_Ω when the underlying data matrix is $\hat {L}$, and the error matrix is $\hat {E}$. The formulation follows (8) of [27] by extending from full observations to partial observations in Ω. After obtaining $\hat {C}$, spectral clustering [41] is applied to $\hat {C}$ to obtain group labels.
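To make the objective in (4) concrete, the sketch below evaluates F(L,E) over the observed set Ω under the probit model. It only evaluates the objective and does not enforce the constraints in (5) or perform any optimization. The function names, the 0-based index convention, and the example boundaries and levels are assumptions made for illustration.

```python
import math

def probit_cdf(x, sigma):
    """Psi(x) for the probit model, i.e., the c.d.f. of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(L, E, Y, observed, boundaries, levels, sigma):
    """Evaluate F(L, E) of Eq. (4) under the probit model.

    L, E, Y are m x n lists of rows; observed is the set Omega of (i, j)
    pairs (0-based); boundaries = [omega_0, ..., omega_K];
    levels = [Q_1, ..., Q_K]. Entries outside Omega are ignored.
    """
    total = 0.0
    for (i, j) in observed:
        x = L[i][j] + E[i][j]
        l = levels.index(Y[i][j]) + 1  # bin index l such that Y_ij = Q_l
        phi = (probit_cdf(boundaries[l] - x, sigma)
               - probit_cdf(boundaries[l - 1] - x, sigma))
        total -= math.log(phi)
    return total

# Hypothetical example: one row, two users, only entry (0, 0) observed.
boundaries = [-math.inf, 1.0, 2.0, math.inf]
levels = [0.5, 1.5, 2.5]
Y = [[1.5, 0.5]]
zero_E = [[0.0, 0.0]]
f_close = neg_log_likelihood([[1.5, 0.0]], zero_E, Y, {(0, 0)}, boundaries, levels, 0.5)
f_far = neg_log_likelihood([[0.0, 0.0]], zero_E, Y, {(0, 0)}, boundaries, levels, 0.5)
```

In this toy case the candidate whose observed entry falls in the same bin as the quantized measurement attains the smaller objective value, which is the behavior the maximum-likelihood formulation relies on.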

Equations (3)–(5) are nonconvex due to the nonconvexity of the feasible set $\mathcal {S}_{f}$ in (5). We first analyze the recovery and clustering performance, assuming that a solution exists. We defer the algorithm to Section 4.

3.1 Data recovery guarantee

Two constants γ_α and L_α are needed for the recovery analysis,

$$ \gamma_{\alpha} = \min_{l\in[K]}\inf_{|x|\le\alpha_{1}+\alpha_{2}}\left\{\frac{\dot{{\varphi}}_{l}^{2}(x)}{{\varphi}_{l}^{2}(x)}-\frac{\ddot{{\varphi}}_{l}(x)}{{\varphi}_{l}(x)}\right\}, $$

(6)

$$ L_{\alpha} = \max_{l\in[K]}\sup_{|x|\le\alpha_{1}+\alpha_{2}}\{|\dot{{\varphi}}_{l}(x)|/{\varphi}_{l}(x)\}, $$

(7)

where $\dot {{\varphi }}_{l}(x)$ and $\ddot {{\varphi }}_{l}(x)$ are the first- and second-order derivatives with respect to x. Note that $\dot {{\varphi }}_{l}(x)^{2} - \ddot {{\varphi }}_{l}(x)\varphi _{l}(x) > 0$ if φ_l is strictly log-concave. One can check that φ_l is strictly log-concave if Ψ is log-concave, which holds true for Gaussian and logistic distributions [28]. L_α and γ_α are bounded by some fixed constants when α₁, α₂, and φ_l are given.
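As a worked example (an illustration added here, not part of the original analysis), consider the one-bit logistic model with K=2, ω₀=−∞, ω₁=0, ω₂=+∞, and $\Psi (x)=1/(1+e^{-x})$. Then, (1) gives

$$ \varphi_{1}(x)=\Psi(-x), \qquad \varphi_{2}(x)=1-\Psi(-x)=\Psi(x). $$

Using $\dot {\Psi }(x)=\Psi (x)(1-\Psi (x))$, the ratio $|\dot {\varphi }_{l}(x)|/\varphi _{l}(x)$ equals Ψ(x) for l=1 and Ψ(−x) for l=2. Moreover, the bracket in (6) is the negative second derivative of $\log \varphi _{l}(x)$, which equals Ψ(x)(1−Ψ(x)) for both l. Writing α=α₁+α₂ and optimizing over |x|≤α yields

$$ L_{\alpha} = \Psi(\alpha) = \frac{1}{1+e^{-\alpha}}, \qquad \gamma_{\alpha} = \Psi(\alpha)\left(1-\Psi(\alpha)\right) = \frac{e^{\alpha}}{\left(1+e^{\alpha}\right)^{2}}. $$

Both constants are finite and positive for any fixed α, and γ_α shrinks as α grows, i.e., a wider signal range makes the likelihood flatter at its extremes.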

Since the data recovery performance and the clustering performance are coupled together, we first analyze the recovery performance, assuming that the clustering results are not “arbitrarily bad.” We follow the same assumption as [27], which essentially requires that in the estimated clustering results, every cluster contains data points belonging to at most a constant number of the p original subspaces. Formally, we have

Assumption 1

[27]: Columns of $\hat {L}$ belong to $\hat {p}$ subspaces, each of which has a dimension smaller than or equal to d. Columns in $\hat {L}$ with indices corresponding to columns of L^∗ in S_i (i∈[p]) belong to at most (g−1) subspaces, where g is a constant larger than 1.

We follow the assumption in [28] about the location of the observed entries. We make a minor change to handle multiple subspaces instead of one subspace in [28]. Assumption 2 is a generalization of the uniform sampling and includes the uniform sampling as a special case. We define a binary matrix G with G_i,j=1 if and only if (i,j)∈Ω, i.e., Y_i,j is observed. G_i,j=0 otherwise. Let $G_{i} \in \mathbb {R}^{m\times n_{i}}$ denote the submatrix of G with columns corresponding to subspace i.

Assumption 2

Assume each column of G_i has h nonzero entries. Let σ₁(G_i) and σ₂(G_i) denote the largest and the second largest singular values of G_i, respectively. Assume σ₁(G_i)≥h and $\sigma _{2}(G_{i}) \le \mathcal {C} \sqrt {h}$ for i∈[p], where $\mathcal {C}$ is a positive constant.

Assumption 2 is similar to the sampling assumption in [28]. The difference is that we make the assumption on columns belonging to each subspace instead of the whole matrix. The above assumption is more general than the uniform sampling assumption [28].

Theorem 1

Suppose that φ_l(x) is strictly log-concave in x, ∀l∈[K]. Then, under Assumptions 1 and 2, with probability at least $1-pC_{1}e^{-C_{2}\xi n/p}$, any global minimizer $\hat {L}$ to (3)–(5) satisfies

$$ \begin{aligned} & \left\|\hat{L}-L^{*}\right\|_{F}/\sqrt{mn} \le \min \left(2\alpha_{1}+2\alpha_{2}\sqrt{\frac{s}{mn}}, U_{1}\right), \end{aligned} $$

(8)

where

$$ \begin{aligned} U_{1} =& C_{1}' \frac{\kappa d\sqrt{d}}{f^{2} \sqrt{m}} + C_{2}' \frac{d\kappa^{3/4}}{f^{3/2}m^{1/4}}\left(\frac{s}{mn}\right)^{1/4} \\&+ C_{3}' \frac{\sqrt{\kappa d}}{f}\left(\frac{s}{mn}\right)^{1/2} \end{aligned} $$

(9)

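for some positive constants C₁, C₂, $C_{1}'(L_{\alpha},g,\xi)$, $C_{2}'(L_{\alpha},g,\xi,\alpha_{2})$, and $C_{3}'(L_{\alpha},g,\xi,\alpha_{2})$. Here $f=\frac {|\Omega |}{mn}=\frac {h}{m}$ is the fraction of measurements that are observed, so the data loss rate is 1−f.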

Theorem 1 characterizes the recovery error from partially observed, partially corrupted, and quantized measurements. It can be interpreted from the following aspects.

(1) Correction of corrupted measurements. We first fix the fraction of observed measurements f and consider the recovery performance with corrupted measurements. Suppose f is a constant, i.e., a constant fraction of the measurements are available. Then, (8) indicates that as long as the number of corrupted measurements s is at most Θ(md²p), we have^{Footnote 3}

$$ \left\|\hat{L}-L^{*}\right\|_{F}/\sqrt{mn} \le \mathcal{O}\left(\sqrt{\frac{d^{3}}{m}}\right). $$

(10)

Thus, the recovery method tolerates a constant number of corrupted measurements per column without degrading the recovery performance.

(2) Asymptotic recovery of the actual data. Since $\mathcal {O}\left (\sqrt {\frac {d^{3}}{m}}\right)$ decreases to 0 when m increases to infinity, and ∥L^∗∥_F is on the order of $\sqrt {mn}$, (10) indicates that the relative error between $\hat {L}$ and L^∗ diminishes asymptotically. Moreover, as long as p is o(n), the failure probability $pC_{1}e^{-C_{2}\xi n/p}$ also decays to zero as n increases to infinity. The asymptotic recovery differentiates the operating center and cyber intruders. An operating center with a sufficient number of measurements can recover L^∗ accurately. In contrast, a cyber intruder with access to a small number of users cannot recover the data even using the same approach (3)–(5).

(3) Tolerance of the missing data. To the best of our knowledge, only refs. [28] and [31] provided the theoretical analysis of low-rank matrix recovery from quantized observations with data losses. No corruptions are considered in [28,31]. The relative recovery error by [28] is $\mathcal {O}\left (\sqrt {\frac {r^{3}}{m}}\right)$ under the partial observation case when f is a fixed constant, where r is the rank of the matrix. The relative recovery error by [31] is $\mathcal {O}\left (\frac {r^{1/4}}{m^{1/4}}\right)$ under the partial observation case. Our result in (10) indicates that when f is a constant, the error is at most $\mathcal {O}\left (\sqrt {\frac {d^{3}}{m}}\right)$ even with corrupted measurements. Note that the rank of L^∗ can be as large as pd when the subspaces are all orthogonal to each other. If one directly applies the approach in [37] to our setup, the relative recovery error can be as large as $\mathcal {O}\left (\sqrt {\frac {p^{3}d^{3}}{m}}\right)$, which is $\sqrt {p^{3}}$ times our recovery error. Thus, our approach outperforms the existing one by recovering and clustering data simultaneously even in the special case of no corruptions.

When there is no missing data, the recovery error by [27] is $\mathcal {O}\left (\sqrt {\frac {d}{m}}\right)$, which is slightly tighter than our error bound in (10). This is due to our techniques to handle the missing data.
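The scaling behavior discussed in interpretations (1)–(3) can be explored numerically by evaluating the right-hand sides of (8) and (9). In the sketch below, the function names are hypothetical, and the constants $C_{1}'$, $C_{2}'$, $C_{3}'$ are not specified in Theorem 1; they are set to 1 purely as placeholders, so only the dependence on m, d, s, f, and κ is meaningful, not the absolute values.

```python
import math

def u1_bound(m, n, d, s, f, kappa, c1=1.0, c2=1.0, c3=1.0):
    """Evaluate the right-hand side of Eq. (9).

    c1, c2, c3 stand in for the unspecified constants C_1', C_2', C_3';
    the default value 1 is a placeholder, so only the scaling is meaningful.
    """
    corrupt = s / (m * n)  # fraction of corrupted entries
    term1 = c1 * kappa * d * math.sqrt(d) / (f ** 2 * math.sqrt(m))
    term2 = c2 * d * kappa ** 0.75 / (f ** 1.5 * m ** 0.25) * corrupt ** 0.25
    term3 = c3 * math.sqrt(kappa * d) / f * math.sqrt(corrupt)
    return term1 + term2 + term3

def recovery_bound(m, n, d, s, f, kappa, alpha1, alpha2):
    """Right-hand side of Eq. (8): the smaller of the trivial bound and U_1."""
    trivial = 2.0 * alpha1 + 2.0 * alpha2 * math.sqrt(s / (m * n))
    return min(trivial, u1_bound(m, n, d, s, f, kappa))

# With d^2 corrupted entries per column (s = n * d^2), quadrupling m
# should halve U_1, matching the sqrt(d^3 / m) rate in Eq. (10).
small = u1_bound(m=100, n=1000, d=2, s=1000 * 4, f=1.0, kappa=1.0)
large = u1_bound(m=400, n=4000, d=2, s=4000 * 4, f=1.0, kappa=1.0)
ratio = large / small
```

Lowering `f` in this sketch shows how quickly the bound loosens as more measurements are lost, since the three terms scale as 1/f², 1/f^{3/2}, and 1/f, respectively.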

3.2 Fundamental limit of any recovery method

The following theorem establishes the minimum possible error by any method from unquantized measurements. We consider the case that the number of corruptions is at most a constant fraction of the measurements. To simplify the analysis, we assume

$$ s \le \min\left(C_{0}mn, mn-\frac{64m}{d}\right) $$

(11)

where C₀ is a constant smaller than 1/2. Let

$$ \begin{aligned} &\mathcal{S}_{fX} = \left\{X: X=L+E, (L,E,C) \in \mathcal{S}_{f}\right\}. \end{aligned} $$

(12)

Theorem 2

Let $N \in \mathbb {R}^{m \times n}$ contain i.i.d. entries from $\mathcal {N}\left (0, \sigma ^{2}\right)$. Assume (11) holds. Consider any algorithm that, for any $X \in \mathcal {S}_{fX}$, takes M_ij=X_ij+N_ij,(i,j)∈Ω as the input and returns an estimate $\hat {X}$ of X. Then, there always exists some $X \in \mathcal {S}_{fX}$ such that with probability at least $\frac {3}{4}$,

$$ \frac{\left\|\hat{X}-X\right\|_{F}}{\sqrt{mn}} \geq \min \left(C_{3}, C_{4}\sigma\sqrt{\frac{d-\frac{d}{n}\left\lfloor \frac{s}{m} \right\rfloor-\frac{64}{n}}{fm-\frac{s_{\Omega}}{n}}}\right) $$

(13)

holds for some fixed constants C₃ and C₄, where $C_{3} = \sqrt {\frac {1-2C_{0}}{8}}\min (\alpha _{1}, \alpha _{2})$ and $C_{4} < \sqrt {\frac {1-2C_{0}}{256}}$. s_Ω is the number of errors in X_Ω.

Note that C₃ is a constant. When f is a constant, (13) indicates that

$$ \|\hat{X}-X\|_{F}/\sqrt{mn} \geq \Theta(\sqrt{d/m}). $$

(14)

The recovery error from unquantized measurements is at least $\Theta \left (\sqrt {\frac {d}{m}}\right)$. Comparing it with our error bound $\sqrt {\frac {d^{3}}{m}}$ in (10), one can see that our method is close to optimal. If the corrupted entries are randomly distributed, s_Ω is approximately Θ(fs). Then, the second term inside the minimization of (13) scales as $\Theta \left (\frac {1}{\sqrt {f}}\sqrt {\frac {d}{m}}\right)$.
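The scaling in (13) and (14) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `lower_bound` and the default choice of C₄ (99% of its stated ceiling) are assumptions made only for illustration, and the parameter values in the usage line are hypothetical.

```python
import math


def lower_bound(m, n, d, s, s_omega, f, sigma, c0, alpha1, alpha2, c4=None):
    """Evaluate the right-hand side of (13) for the given problem sizes.

    c4 must lie strictly below sqrt((1 - 2*c0) / 256). When it is omitted,
    99% of that ceiling is used, which is an arbitrary illustrative choice.
    """
    if not 0 <= c0 < 0.5:
        raise ValueError("C0 must be a constant in [0, 1/2)")
    if s > min(c0 * m * n, m * n - 64 * m / d):
        raise ValueError("assumption (11) is violated")
    c3 = math.sqrt((1 - 2 * c0) / 8) * min(alpha1, alpha2)
    c4_ceiling = math.sqrt((1 - 2 * c0) / 256)
    if c4 is None:
        c4 = 0.99 * c4_ceiling
    numerator = d - (d / n) * math.floor(s / m) - 64 / n
    denominator = f * m - s_omega / n
    return min(c3, c4 * sigma * math.sqrt(numerator / denominator))


# Hypothetical sizes: quadrupling m should halve the bound when f is fixed.
b_small = lower_bound(1000, 4000, 50, 0, 0, 0.85, 0.3, 0.25, 6.0, 4.0)
b_large = lower_bound(4000, 4000, 50, 0, 0, 0.85, 0.3, 0.25, 6.0, 4.0)
```

With no corruptions and a fixed observation rate f, the second term inside the minimum is proportional to $\sqrt{d/(fm)}$, so `b_small / b_large` equals 2 up to rounding.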

3.3 Privacy from the recovery perspective

3.3.1 Recovery of a single user from its own data only

An intruder is often interested in the data of a certain user. If the adversary only has access to one user’s data, then problems (3)–(5) are reduced to

$$ \begin{aligned} &\min_{L,E \in \mathbb{R}^{m}} F(L,E) \\& \textrm{s.t.} \|L\|_{\infty}\le\alpha_{1}, \|E\|_{\infty}\le \alpha_{2}, \|E\|_{0}\le s. \end{aligned} $$

(15)

Note that since n=1, there is no constraint on C. Problem (15) maximizes the log-likelihood of one user given the information about the quantized measurements. It can be viewed as a special case of the low-rank matrix recovery from quantized measurements considered in [37]. One can check that the average recovery error is upper bounded by $\mathcal {O}\left (\sqrt {d^{3}}\right)$ by setting n=1 in Theorem 5 of [37]. Similarly, the recovery error of any method is at least on the order of $\Theta (\sqrt {d})$ by setting n=1 in Theorem 4 of [37]. These bounds do not depend on m, the number of measurements of this user. Therefore, if an intruder only has one user’s data, the average recovery error is nonzero and does not diminish as m increases, even if m is very large. The privacy of the energy consumption behavior of this user is thus protected.

3.3.2 Recovery of a single user by leveraging other users in the same group

One can exploit the measurements from other users to increase the estimation accuracy of one target user. Suppose one can access n users’ data in m time steps and these users all share load patterns similar to that of the target user. Then, from either Theorem 1 of this paper or Theorem 5 of [37], the average recovery error is at most $\mathcal {O}\left (\sqrt {\frac {d^{3}}{\min (m, n)}}\right)$. Compared with the previous case of accessing the data of a single user only, the recovery error is significantly reduced. We emphasize that the decrease in the recovery error results from exploiting correlations among users.

The number of quantization levels K also affects privacy. Intuitively, a smaller value of K corresponds to a higher level of privacy. However, the privacy level also depends on the selection of bin boundaries, and decreasing K does not necessarily increase privacy. For instance, if a pair of boundaries are chosen so close to each other that no measurements are located within the interval, then K=3 could reach the same privacy and recovery error as K=2. Therefore, K does not directly appear in Theorem 1 but rather affects the privacy indirectly through γ_α and L_α. The bin boundaries usually tend to be closer in the region where the measurements concentrate.
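The K=3 versus K=2 remark can be made concrete with a small sketch. The readings and boundaries below are hypothetical, and the helper `quantize` is an illustrative stand-in for the quantizer, using the convention that label l covers the interval (ω_{l−1}, ω_l].

```python
import bisect


def quantize(values, boundaries):
    """Map each value to a bin label in 1..K, where K = len(boundaries) + 1.

    A value equal to a boundary falls in the lower bin.
    """
    return [bisect.bisect_left(boundaries, v) + 1 for v in values]


# Hypothetical noisy readings (kW); none falls between 1.0 and 1.001.
readings = [0.21, 0.48, 0.95, 1.30, 2.75, 0.66]

labels_k2 = quantize(readings, [1.0])          # K = 2
labels_k3 = quantize(readings, [1.0, 1.001])   # K = 3 with an empty middle bin
```

Here `labels_k3` uses only the labels 1 and 3, and it groups the readings exactly as `labels_k2` does, so the extra quantization level reveals nothing more about the user.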

For smart meter data, the bin boundaries can be selected in the range of a typical household consumption level. If a certain house has some electrical appliances with an energy consumption level significantly higher than that of normal households, this abnormal pattern of high energy consumption can in fact be masked in the noisy and quantized measurements because of the way the bin boundaries are selected. However, since this house has a different load pattern from other households, one cannot exploit other users’ data to enhance the recovery accuracy of this user. The recovered data of this user will have a nonzero error as discussed in the first paragraph of Section 3.3.

3.4 Clustering guarantee

The clustering performance is evaluated through the subspace-preserving property of $\hat {C}$. A sufficient condition for $\hat {C}$ to be subspace-preserving is stated as follows.

Proposition 1

Suppose the columns of $\hat {L}$ are drawn i.i.d. from an unknown continuous distribution supported on $\hat {p}$ distinct d-dimensional subspaces. Then the global minimizer $\hat {C}$ of (3) has the subspace-preserving property for $\hat {L}$.

Ref. [27] also provides a sufficient condition for $\hat {C}$ to be subspace-preserving. The subspaces are required to be independent of each other in [27]. Two independent subspaces intersect only at zero. Here, we only require the subspaces to be distinct from each other. Two subspaces are distinct if, for each subspace, there exists one point that belongs to this subspace but not the other. The data points are generated based on some continuous distribution supported on these distinct subspaces.

4 Distributed sparse alternative proximal algorithm for data recovery and clustering

We next propose a distributed algorithm in which W nodes solve (3) collaboratively, such that node i can estimate $L^{*}_{\Phi _{i}}$ from its acquired measurements $Y_{\Phi _{i}}$, while it does not know $Y_{\Phi _{j}}$ or $L^{*}_{\Phi _{j}}$ of any other node j. This further enhances user privacy.

We first follow [27] and move some constraints to the objective function to simplify the algorithm design. Since the rank of L is at most r, we factorize L as L=UV^T, where $V \in \mathbb {R}^{n \times r}$ and $U \in \mathbb {R}^{m \times r}$. We replace the equality constraints L=LC and L=UV^T by adding $\frac {\lambda _{1}}{2}\left \|V^{T}-V^{T}C\right \|_{F}^{2}$ and $\frac {\lambda _{2}}{2}\left \|UV^{T}-L\right \|_{F}^{2}$ to the objective function. The parameters λ₁ and λ₂ affect the tightness of the original constraints. Note that V^T=V^TC is a sufficient but not necessary condition for L=LC. Then, (3) is changed into

$$ \begin{aligned} &\left(\hat{U},\hat{V},\hat{L}, \hat{E}, \hat{C}\right) \\&= \underset{\substack{U\in \mathbb{R}^{m \times r},V\in \mathbb{R}^{n \times r}\\ L,E,C\in \mathbb{R}^{m \times n}}}{\arg\min} H(U,V,L,E,C) ~ \text{s.t.} (L,E,C) \in \mathcal{S}\mathcal{F}, \end{aligned} $$

(16)

where

$$ \begin{aligned} H(U,V,L,E,C)=&F(L,E) + \frac{\lambda_{1}}{2}\left\|V^{T}-V^{T}C\right\|_{F}^{2} \\&+ \frac{\lambda_{2}}{2}\left\|UV^{T}-L\right\|_{F}^{2}, \end{aligned} $$

(17)

$$ \begin{aligned} \mathcal{S}\mathcal{F} = &\{(L,E,C): \|L\|_{\infty}\le\alpha_{1}, \|E\|_{\infty}\le\alpha_{2},\\&\|E\|_{0}\le s,\|C_{\star i}\|_{0}\le d, C_{i,i}=0, \forall i \in [n]\}, \end{aligned} $$

(18)

The solution of (16) is the same as that of (3) when λ₁ and λ₂ approach infinity.

We next decompose V into W parts, and let $V_{\Phi _{i} \star } \in \mathbb {R}^{q \times r}, i \in [W]$ denote the rows of V with row indices Φ_i. Then, the objective in (17) can be decomposed as follows:

$$ \begin{aligned} H (U,V, L, E,C) = \sum_{i=1}^{W} \mathcal{H}\left(U,V,L_{\Phi_{i}},E_{\Phi_{i}},C_{\Phi_{i}}\right) \end{aligned} $$

(19)

where

$$ \begin{aligned} &\mathcal{H}\left(U,V,L_{\Phi_{i}},E_{\Phi_{i}},C_{\Phi_{i}}\right)\\&= F\left(L_{\Phi_{i}},E_{\Phi_{i}}\right) + \frac{\lambda_{1}}{2}\left\|V_{\Phi_{i} \star}^{T}-V^{T}C_{\Phi_{i}}\right\|_{F}^{2} \\&~~~+ \frac{\lambda_{2}}{2}\left\|{UV}_{\Phi_{i} \star}^{T}-L_{\Phi_{i}}\right\|_{F}^{2}, \end{aligned} $$

(20)

$$ \begin{aligned} &F\left(L_{\Phi_{i}},E_{\Phi_{i}}\right)=-{\underset{\substack{(k,j + iq -q)\\ \in \Omega,\\ \forall k \in [m], j \in [q]}}{\sum}}\sum_{l=1}^{K}\\&~~~~~~~~~~ \boldsymbol{1}_{[(Y_{\Phi_{i}})_{k,j}=l]}\log\left({\varphi}_{l}\left((L_{\Phi_{i}})_{k,j}+\left(E_{\Phi_{i}}\right)_{k,j}\right)\right). \end{aligned} $$

(21)

and V contains $V_{\Phi _{1} \star }$ to $V_{\Phi _{W} \star }$, i.e., $V = \left [\begin {array}{c} V_{\Phi _{1} \star } \\ V_{\Phi _{2} \star } \\ \vdots \\ V_{\Phi _{W} \star }\end {array}\right ]$. The constraint set $\mathcal {S}\mathcal {F}$ in (18) is equivalent to the intersection of ${\mathcal {S}\mathcal {F}}_{i}$’s (∀j∈[q]), with^{Footnote 4}

$$ \begin{aligned} &{\mathcal{S}\mathcal{F}}_{i} = \left\{\left(L_{\Phi_{i}},E_{\Phi_{i}},C_{\Phi_{i}}\right): \left\|L_{\Phi_{i}}\right\|_{\infty}\le\alpha_{1}, \left\|E_{\Phi_{i}}\right\|_{\infty}\le{\vphantom{\frac{s}{W}}}\right. \\&\left.\alpha_{2}, \left\|E_{\Phi_{i}}\right\|_{0}\le \frac{s}{W}, \left\|(C_{\Phi_{i}})_{\star j}\right\|_{0}\le d, (C_{\Phi_{i}})_{iq -q+j,j}=0\right\}. \end{aligned} $$

(22)

Then, (16) can be equivalently written as

$$ \begin{aligned} &\left(\hat{U},\hat{V}_{\Phi_{i} \star},\hat{L}_{\Phi_{i}}, \hat{E}_{\Phi_{i}}, \hat{C}_{\Phi_{i}}\right) \\&= \underset{\substack{C_{\Phi_{i}}\in \mathbb{R}^{n \times q}, U\in \mathbb{R}^{m \times r}\\V_{\Phi_{i} \star}\in \mathbb{R}^{q \times r}\\ L_{\Phi_{i}},E_{\Phi_{i}}\in \mathbb{R}^{m \times q}, \forall i \in [W]}}{\arg\min} \sum_{i=1}^{W} \mathcal{H}\left(U,V,L_{\Phi_{i}},E_{\Phi_{i}},C_{\Phi_{i}}\right) \\& ~~~~~~~~~~~~~~~ \text{s.t.} \left(L_{\Phi_{i}},E_{\Phi_{i}},C_{\Phi_{i}}\right) \in {\mathcal{S}\mathcal{F}}_{i}, \forall i \in [W]. \end{aligned} $$

(23)

where the estimated variables are U and the W components of each of L, E, C, and V.

The constraints in (23) can be decomposed for W nodes, while the objective function cannot, due to the coupling of U and V. Here, we develop a synchronized Distributed Sparse Alternative Proximal Algorithm (DSAPA) to solve (23) with a convergence guarantee. Node i owns $Y_{\Phi _{i}}$ and estimates $V_{\Phi _{i} \star }$, $L_{\Phi _{i}}$, $E_{\Phi _{i}}$, $C_{\Phi _{i}}$, and U. Since all nodes have the estimates of U, and $L_{\Phi _{i}}={UV}_{\Phi _{i} \star }^{T}$, the key to protecting the user privacy of node i is not to share the estimate of $V_{\Phi _{i} \star }$, or $Y_{\Phi _{i}}$, with any other node.

In the (t+1)th iteration, node i sequentially updates $C_{\Phi _{i}}^{t+1}$, $V_{\Phi _{i} \star }^{t+1}$, $L_{\Phi _{i}}^{t+1}$, $E_{\Phi _{i}}^{t+1}$, and $U^{t+1}$ in Subroutines 1–5. Each subroutine essentially follows the projected gradient method. The gradients of H with respect to $C_{\Phi _{i}}$, $V_{\Phi _{i} \star }$, $L_{\Phi _{i}}$, $E_{\Phi _{i}}$, and U are

$$ \begin{aligned} &\nabla_{C_{\Phi_{i}}} H = - \lambda_{1} V\left(V_{\Phi_{i} \star }^{T}-V^{T}C_{\Phi_{i}}\right) \\&= - \lambda_{1} V\left(V_{\Phi_{i} \star}^{T}-V_{\Phi_{i} \star}^{T}\left(C_{\Phi_{i}}\right)_{\Phi_{i} \star}-\sum_{j=1,j\not = i}^{W}V_{\Phi_{j} \star}^{T}\left(C_{\Phi_{i}}\right)_{\Phi_{j} \star}\right) \\&:= - \lambda_{1} V M_{\Phi_{i}}, \end{aligned} $$

(24)

$$ \begin{aligned} &\nabla_{{V_{\Phi_{i} \star}}} H = \lambda_{2} \left(V_{\Phi_{i} \star}U^{T}-L_{\Phi_{i}}^{T}\right)U \\&~~~ +\lambda_{1} \left[ \left(V_{\Phi_{i} \star}-C_{\Phi_{i}}^{T}V\right)-\sum_{j=1}^{W}\left(C_{{\Phi}_{j}}\right)_{{\Phi}_{i} \star}\left(V_{{\Phi}_{j} \star}-C_{{\Phi}_{j}}^{T}V\right) \right]\\ & =\lambda_{1} \left(M_{\Phi_{i}}^{T}-\sum_{j=1}^{W}\left(C_{{\Phi}_{j}}\right)_{{\Phi}_{i} \star}M_{{\Phi}_{j}}^{T}\right)+ \lambda_{2} \left(V_{\Phi_{i} \star}U^{T}-L_{\Phi_{i}}^{T}\right)U, \end{aligned} $$

(25)

$$ \begin{aligned} \nabla_{L_{\Phi_{i}}} H = \nabla F\left(L_{\Phi_{i}},E_{\Phi_{i}}\right) - \lambda_{2} \left({UV}_{\Phi_{i} \star}^{T}-L_{\Phi_{i}}\right), \end{aligned} $$

(26)

$$ \begin{aligned} \nabla_{E_{\Phi_{i}}} H = \nabla F\left(L_{\Phi_{i}},E_{\Phi_{i}}\right), \end{aligned} $$

(27)

$$ \begin{aligned} \nabla_{U} H = \lambda_{2} \left(UV^{T}-L\right)V := \lambda_{2} \left(U\sum_{i=1}^{W} \iota_{i} - \sum_{i=1}^{W} \zeta_{i}\right), \end{aligned} $$

(28)

where M=V^T−V^TC, $\iota _{i} = V_{\Phi _{i} \star }^{T}V_{\Phi _{i} \star }$, $\zeta _{i} = L_{\Phi _{i}}V_{\Phi _{i} \star }$, and

$$ \begin{aligned} &[\nabla F(L_{\Phi_{i}},E_{\Phi_{i}})]_{k,j} =\\& \frac{\dot{\Psi}\left(\omega_{\left(Y_{\Phi_{i}}\right)_{k,j}}-\left(X_{\Phi_{i}}\right)_{k,j}\right)-\dot{\Psi}\left(\omega_{\left(Y_{\Phi_{i}}\right)_{k,j}-1}-\left(X_{\Phi_{i}}\right)_{k,j}\right)}{\Psi\left(\omega_{\left(Y_{\Phi_{i}}\right)_{k,j}}-\left(X_{\Phi_{i}}\right)_{k,j}\right)-\Psi\left(\omega_{\left(Y_{\Phi_{i}}\right)_{k,j}-1}-\left(X_{\Phi_{i}}\right)_{k,j}\right)},\\& \forall k \in [m], j \in [q]. \end{aligned} $$

(29)
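The entry-wise expression in (29) can be sketched in code as follows. This is an illustration under stated assumptions rather than the authors' implementation: Ψ is taken to be the CDF of $\mathcal{N}(0,\sigma^{2})$, the outermost boundaries are taken as ω₀=−∞ and ω_K=+∞, and the helper names are hypothetical.

```python
import math


def gauss_cdf(z, sigma):
    """CDF of N(0, sigma^2); stands in for Psi (an assumption)."""
    return 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))


def gauss_pdf(z, sigma):
    """Density of N(0, sigma^2); stands in for the derivative of Psi."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bin_edges(label, omega):
    """Return (w_{l-1}, w_l) for label l in 1..K.

    omega lists the interior boundaries w_1..w_{K-1}.
    """
    lower = -math.inf if label == 1 else omega[label - 2]
    upper = math.inf if label == len(omega) + 1 else omega[label - 1]
    return lower, upper


def neg_log_lik(x, label, omega, sigma):
    """One summand of F: -log(Psi(w_l - x) - Psi(w_{l-1} - x))."""
    lower, upper = bin_edges(label, omega)
    return -math.log(gauss_cdf(upper - x, sigma) - gauss_cdf(lower - x, sigma))


def grad_entry(x, label, omega, sigma):
    """Entry-wise gradient following (29)."""
    lower, upper = bin_edges(label, omega)
    numerator = gauss_pdf(upper - x, sigma) - gauss_pdf(lower - x, sigma)
    denominator = gauss_cdf(upper - x, sigma) - gauss_cdf(lower - x, sigma)
    return numerator / denominator
```

A central finite difference of `neg_log_lik` in `x` should agree with `grad_entry`, which gives a quick sanity check on the sign convention in (29).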

The step sizes in the (t+1)th iteration are selected as

$$ \tau_{C} = \frac{1}{\lambda_{1} \left\|V^{t}(V^{t})^{T}\right\|_{F}} = \frac{1}{\lambda_{1} \left\| \sum_{i=1}^{W} \iota_{\Phi_{i}} \right\|_{F}}, $$

(30)

$$ \begin{aligned} &\tau_{V_{\Phi_{i} \star}} = \frac{1}{e_{U}^{t} + \lambda_{1} \max_{i\in[W]} \digamma_{i}^{t}}, \end{aligned} $$

(31)

$$ \tau_{L_{\Phi_{i}}} = \frac{1}{\frac{1}{\sigma^{2}\beta^{2}}+\lambda_{2}}, $$

(32)

$$ \tau_{E_{\Phi_{i}}} = \sigma^{2}\beta^{2}, $$

(33)

and

$$ \tau_{U} = \frac{1}{\lambda_{2} \left\| \left(V^{t+1}\right)^{T}V^{t+1} \right\|_{F}} = \frac{1}{\lambda_{2} \left\| \sum_{i=1}^{W} \iota_{\Phi_{i}} \right\|_{F}}, $$

(34)

where $e_{U}^{t} = \lambda _{2}\left \|(U^{t})^{T}U^{t}\right \|_{F}$ and $\digamma _{i}^{t} = \left \|I_{q \times q}+\left (C_{\Phi _{i} \star }^{t+1}\right)\cdot \left (C_{\Phi _{i} \star }^{t+1}\right)^{T}-\left (C_{\Phi _{i}}\right)_{\Phi _{i} \star }^{t+1}- \left ((C_{\Phi _{i}})_{\Phi _{i} \star }^{t+1}\right)^{T}\right \|_{F}$. These step sizes are no greater than the reciprocals of the smallest Lipschitz constants of $\nabla _{C_{\Phi _{i}}} H$, $\nabla _{V_{\Phi _{i} \star }} H$, $\nabla _{L_{\Phi _{i}}} H$, $\nabla _{E_{\Phi _{i}}} H$, and ∇_UH in the tth iteration, respectively. Details of the calculations are shown in Appendix 6. This property is useful for the convergence analysis of the DSAPA.

The constraints in (22) are met by projecting the updated estimates onto ${\mathcal {S}\mathcal {F}}_{i}$. For the constraints on $C_{\Phi _{i}}$, in steps 10–15 of Subroutine 1, we first set the diagonal entries of $(C_{\Phi _{i}})_{\Phi _{i} \star }^{t+1}$ to zero. Then, for any j∈[q], we keep the d entries of $(C_{\Phi _{i}})_{\star j}^{t+1}$ with the largest absolute values and set all other entries to zero. The infinity-norm constraint on $L_{\Phi _{i}}$ is met by setting all entries larger than α₁ to α₁ and all entries smaller than −α₁ to −α₁ (step 4 in Subroutine 3). A similar approach applies to $E_{\Phi _{i}}$, for which we also keep the $\frac {s}{W}$ entries with the largest absolute values and set the other nonzero entries to zero (steps 3–6 in Subroutine 4).
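The projection steps described above can be sketched as follows. This is a simplified illustration and not a transcription of Subroutines 1, 3, and 4: the function names are hypothetical, matrices are plain nested lists, and `row_offset` stands for (i−1)q in zero-based indexing.

```python
def clip(mat, alpha):
    """Clamp every entry to [-alpha, alpha] (infinity-norm constraint)."""
    return [[max(-alpha, min(alpha, v)) for v in row] for row in mat]


def keep_top_entries(mat, k):
    """Keep the k largest-magnitude entries of the matrix; zero the rest."""
    ranked = sorted(
        ((abs(v), r, c) for r, row in enumerate(mat) for c, v in enumerate(row)),
        reverse=True,
    )
    keep = {(r, c) for _, r, c in ranked[:k]}
    return [
        [v if (r, c) in keep else 0.0 for c, v in enumerate(row)]
        for r, row in enumerate(mat)
    ]


def project_c(c_block, d, row_offset):
    """Project an n x q block of C.

    Entry (row_offset + j, j) is zeroed first; then each column keeps only
    its d largest-magnitude entries.
    """
    n, q = len(c_block), len(c_block[0])
    out = [row[:] for row in c_block]
    for j in range(q):
        out[row_offset + j][j] = 0.0
        order = sorted(range(n), key=lambda r: abs(out[r][j]), reverse=True)
        for r in order[d:]:
            out[r][j] = 0.0
    return out


# Hypothetical example: node 2 of W = 2 nodes, q = 2, n = 4, d = 1.
c_proj = project_c([[0.9, 0.1], [-0.2, 0.8], [0.7, -0.5], [0.3, 0.6]], 1, 2)
e_proj = keep_top_entries(clip([[5.0, -0.1], [0.3, -7.0]], 4.0), 2)
```

In the example, `c_proj` keeps one entry per column after zeroing the node's own self-representation entries, and `e_proj` retains the two largest corruptions after clipping them to magnitude 4.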

Note that $L_{\Phi _{i}}$ and $E_{\Phi _{i}}$ can be updated by node i independently and are not shared with other nodes. Updating $C_{\Phi _{i}}$, $V_{\Phi _{i} \star }$, and U needs communication from other nodes due to the coupling in the objective function. $V_{\Phi _{i} \star }$ cannot be shared with other nodes, since otherwise other nodes can estimate $L_{\Phi _{i}}$ by multiplying U and $V_{\Phi _{i} \star }^{T}$. Thus, node i computes the intermediate terms that depend on $V_{\Phi _{i} \star }$ and sends them to other nodes instead of sending $V_{\Phi _{i} \star }$ itself, as illustrated in Fig. 3.
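One reading of (28) is that the gradient with respect to U can be assembled from the per-node terms ι_i and ζ_i alone. The sketch below checks this identity on a tiny hypothetical example; the helper names and the numbers are assumptions, and the exact messages exchanged in Fig. 3 may differ.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def combine(a, b, sign):
    """Return a + sign * b entry-wise."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def local_terms(l_block, v_block):
    """Node-side terms: iota_i = V_i^T V_i (r x r), zeta_i = L_i V_i (m x r)."""
    return matmul(transpose(v_block), v_block), matmul(l_block, v_block)


def grad_u_from_terms(u, terms, lam2):
    """Assemble lam2 * (U * sum_i iota_i - sum_i zeta_i) as in (28)."""
    iota_sum, zeta_sum = terms[0]
    for iota, zeta in terms[1:]:
        iota_sum = combine(iota_sum, iota, 1)
        zeta_sum = combine(zeta_sum, zeta, 1)
    diff = combine(matmul(u, iota_sum), zeta_sum, -1)
    return [[lam2 * v for v in row] for row in diff]


# Hypothetical sizes: m = 2, n = 4, r = 2, W = 2 nodes with q = 2 users each.
U = [[1.0, 2.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
L = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
lam2 = 0.5

terms = [local_terms([row[lo:lo + 2] for row in L], V[lo:lo + 2]) for lo in (0, 2)]
grad_distributed = grad_u_from_terms(U, terms, lam2)

# Centralized reference: lam2 * (U V^T - L) V.
residual = combine(matmul(U, transpose(V)), L, -1)
grad_central = [[lam2 * v for v in row] for row in matmul(residual, V)]
```

The two results agree, and each ι_i is only an r×r matrix, so it does not expose the rows of $V_{\Phi_i \star}$ directly.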

The algorithm is initialized as follows. $L_{\Phi _{i}}^{0}$ in node i is defined as,

$$ (L^{0}_{\Phi_{i}})_{k,j}= \left \{ \begin{array}{rcl} \frac{\omega_{l} - \omega_{l-1}}{2} & \text{if} & (Y_{\Phi_{i}})_{k,j}=l, 0< l<K \\ \frac{\alpha_{1} - \omega_{K-1}}{2} & \text{if} & (Y_{\Phi_{i}})_{k,j}=K \\ \frac{\alpha_{1}-\omega_{1}}{2} & \text{if} & (Y_{\Phi_{i}})_{k,j}=0 \end{array} \right. $$

(35)

Then, node i performs the truncated singular value decomposition on $L^{0}_{\Phi _{i}}$ and lets $U_{i}^{(r)}\Sigma _{i}^{(r)} (V_{\Phi _{i} \star }^{(r)})^{T}$ denote the rank-r approximation to $L^{0}_{\Phi _{i}}$. Then, node i transmits $U_{i}^{(r)}$ to all other nodes. Each node initializes at

$$ U^{0}=\frac{1}{W}\sum_{i=1}^{W}U_{i}^{(r)}\left(\Sigma_{i}^{(r)}\right)^{1/2}, $$

(36)

$$ V_{\Phi_{i} \star}^{0} = \left(L^{0}_{\Phi_{i}}\right)^{T}U^{0}\left(\left(U^{0}\right)^{T}U^{0}\right)^{-1}, \textrm{ and } $$

(37)

$$ E_{\Phi_{i}}^{0}=C_{\Phi_{i}}^{0}=0. $$

(38)

The convergence of DSAPA is summarized as follows.

Theorem 3

From any initial point, DSAPA always converges to a critical point of (23).

The computational complexities of Subroutines 1–5 are $\mathcal {O}(nqr)$, $\mathcal {O}(mqr)$, $\mathcal {O}(mq)$, $\mathcal {O}(mq)$, and $\mathcal {O}(mqr)$, respectively. The per-node per-iteration complexity of DSAPA is $\mathcal {O}(nqr)$. In contrast, the complexity of the centralized algorithm in [27] is $\mathcal {O}(nmr)$. The communication costs of Subroutines 1, 2, and 5 are $\mathcal {O}(n^{2})$, $\mathcal {O}(nWr)$, and $\mathcal {O}(mWr)$, respectively.

For data clustering, a central node collects $\hat {C}_{\Phi _{i}}$ from all the nodes and applies spectral clustering [41] to obtain the clustering results.

When λ₁ and λ₂ are large enough, (23) approximates (3), but the step sizes in (30)–(32) and (34) become small, which slows convergence. One practical solution is to increase λ₁ and λ₂ dynamically [55]. We suggest the following selection. Initialize λ₁ and λ₂ with small values, and replace λ₂ with ρλ₂ (ρ>1) in each of the first T₀ iterations. Then, reset λ₂ to its initial value and replace λ₁ and λ₂ with ρλ₁ and ρλ₂ simultaneously in each subsequent iteration. The algorithm terminates after T iterations.
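The penalty schedule can be sketched as below. This is one plausible reading of the description, with assumptions stated explicitly: λ₁ stays fixed during the first T₀ iterations, each recorded pair is the value used in that iteration, and the multiplication by ρ happens at the end of the iteration. The function name is hypothetical.

```python
def lambda_schedule(lam1_init, lam2_init, rho, t0, total):
    """Return the (lambda1, lambda2) pair used in each of `total` iterations."""
    lam1, lam2 = lam1_init, lam2_init
    schedule = []
    for t in range(total):
        if t == t0:
            lam2 = lam2_init      # reset lambda2 after the warm-up phase
        schedule.append((lam1, lam2))
        if t < t0:
            lam2 *= rho           # warm-up: only lambda2 grows
        else:
            lam1 *= rho           # afterwards both grow together
            lam2 *= rho
    return schedule


# Values reported in Section 5: initial 0.5, rho = 1.05, T0 = 40, T = 200.
schedule = lambda_schedule(0.5, 0.5, 1.05, 40, 200)
```

Under this reading, λ₂ grows geometrically for 40 iterations while λ₁ stays at 0.5, after which both restart from 0.5 and increase at the same rate.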

5 Results: numerical experiments

We evaluate the performance on the Irish smart meter dataset (ISMD) [56] and the UMass Smart* microgrid dataset (USMD) [57]. The ISMD consists of more than 5000 residential customers. The measurements are obtained every 30 min and are in kilowatts (kW). The USMD contains 443 users in 24 h, and the power consumption is measured every minute. Some users have long sequences of zero power consumption, and some users occasionally have significantly high power consumption. We suspect these measurements have data quality issues resulting from devices or communication and remove these users from the datasets. We use 4780 customers in 30 days for ISMD and 438 customers in 6 h for USMD. Thus, the size of the data matrix L is 1440×4780 for ISMD and 360×438 for USMD. The power consumption is at most 6 kW and 99 kW, respectively. Since the raw measurements are noisy, L is approximated by a rank-r matrix $L^{*}_{\textrm {rank-}r}$ by keeping only the largest r singular values. The recovery error is measured by $\|L^{*}_{\textrm {rank-}r}-\tilde {L}\|_{F}^{2}/\|L^{*}_{\textrm {rank-}r}\|_{F}^{2}$, where $\tilde {L}$ is the recovered matrix. We choose r to be about 10% of the total number of singular values. Then, r is set to 150 for ISMD and 40 for USMD. The following experiments are conducted on ISMD unless otherwise specified.
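The recovery error defined above is a ratio of squared Frobenius norms. A minimal sketch of this metric follows; the function name and the toy matrices are hypothetical, and the rank-r truncation is assumed to have been computed beforehand.

```python
def relative_recovery_error(l_ref, l_rec):
    """Return ||l_ref - l_rec||_F^2 / ||l_ref||_F^2 for nested-list matrices."""
    err = sum(
        (a - b) ** 2 for ra, rb in zip(l_ref, l_rec) for a, b in zip(ra, rb)
    )
    ref = sum(a ** 2 for row in l_ref for a in row)
    return err / ref


# Toy example: the reference has squared norm 25 and the error has squared norm 16.
toy_error = relative_recovery_error([[3.0, 4.0]], [[3.0, 0.0]])
```

Because both norms are squared, a reported value of 0.2 corresponds to an unsquared relative Frobenius error of about 0.45.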

As described in Section 2.3, normalized mutual information is used to measure the data privacy. We now calculate the average normalized mutual information of 4780 users $\hat {NI} = \frac {1}{4780}\sum _{i=1}^{4780}NI(L_{\star i}, Y_{\star i})$. As a comparison, we also calculate the normalized mutual information between the noisy data (before quantization) and the actual data. The quantization level K is chosen as 2 or 5. The quantization boundaries and quantized values are summarized in Table 2 (K=2,5). We place more boundaries in the region where the data concentrate. Selecting the optimal quantization boundaries is beyond the scope of this paper and is left for future work. We believe these parameters can be optimized if a small portion of ground-truth data is available for training. The noise level σ varies from 0.1 to 0.4 with a step size of 0.02. To compute the probability distribution of L_⋆i, we divide the range 0–6 kW into 100 or 300 equal intervals and compute the empirical distributions. As shown in Fig. 4, the normalized mutual information between the power after quantization and the actual power consumption is always smaller than that between the noisy value before quantization and the actual power consumption. This indicates that the proposed quantization process enhances the data privacy. In addition, the normalized mutual information $\hat {NI}$ decreases when either K decreases or σ increases, which is consistent with intuition.

Table 2 Quantization boundaries and quantized values

Since no ground-truth clustering result exists for this dataset, we define an index CI to evaluate the clustering performance. Let a_j denote the maximum angle of all the data points in group j to the estimated subspace of this group. Let b_j denote the minimum angle of any point in group j to the other subspaces. The clustering index CI measures the clustering accuracy and is defined as

$$ CI = \frac{1}{N} \sum_{j=1}^{N}\frac{b_{j}-a_{j}}{\max\{a_{j},b_{j}\}}. $$

(39)

CI is large if a_j’s are small and b_j’s are large, which means that points in the same group are close to the subspace of that group and away from other groups. A larger CI corresponds to a better clustering result. We apply Sparse Subspace Clustering (SSC) [21] to this dataset with different cluster numbers and compare the resulting CI’s. We use the Alternating Direction Method of Multipliers (ADMM) [58] to solve SSC. When the number of clusters is p=4, we obtain the maximum CI=0.085. Thus, we set the number of clusters to be 4 in the following experiments.
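The index in (39) can be computed directly once the angles a_j and b_j are available. The sketch below assumes they have already been measured (in any consistent unit) and uses hypothetical values; the function name is illustrative.

```python
def clustering_index(a, b):
    """Average of (b_j - a_j) / max(a_j, b_j) over the groups, as in (39).

    a[j]: largest angle from a point of group j to its own subspace.
    b[j]: smallest angle from a point of group j to any other subspace.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    return sum((bj - aj) / max(aj, bj) for aj, bj in zip(a, b)) / len(a)


# Hypothetical angles (degrees) for two groups.
ci_example = clustering_index([10.0, 20.0], [20.0, 20.0])
```

Each summand lies in [−1, 1], so CI is bounded in the same range, and the example evaluates to (0.5 + 0)/2 = 0.25.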

We generate corruptions E^∗ and noise N randomly. The nonzero entries of E^∗ are selected from [−4,−0.5] and [0.5,4] uniformly. Every entry of N is drawn from $\mathcal {N}(0,0.3^{2})$. The quantization level K is set to 5. The locations of the missing data are selected randomly. The simulations run in MATLAB on a computer with a 3.4 GHz Intel Core i7.

We evaluate DSAPA on the quantized measurements. We choose W=5 agents. We assume the upper bounds of the magnitudes of the sparse error and the power consumption are known. For simplicity, we use the largest value of the given error and set α₂=4. Similarly, we set α₁=6. We set d=50. λ₁ and λ₂ are initialized to 0.5, and ρ=1.05. The maximum iteration number T is set to 200, and T₀ is set to 40.

Here, d is selected to be approximately r/(p−1). We use p−1 considering the overlap between subspaces. We remark that varying d around the selected value does not affect the result. λ₁ and λ₂ are self-adjusted in our algorithm as discussed in the last paragraph of Section 4.

Figure 5 shows the energy consumption of a single user in 24 h. It compares the actual data, the rank-150 approximation of the actual data, the quantized observations, the recovered data by DSAPA, and the average quantized data of the users in the same group. One can see that the rank-150 approximation of the actual data has a similar pattern to the actual data. Clearly, the details of power consumption are hidden in the quantized measurements. For instance, the two peak consumptions are no longer visible in the quantized measurements. Thus, an intruder cannot learn the user pattern when accessing only the quantized measurements of that user. On the other hand, DSAPA recovers the power consumption trend accurately from the quantized data. The two peak loads are accurately identified in the recovered data as shown in Fig. 5. The recovered data can be used for grid planning.

After obtaining $\hat {C}$ using DSAPA, we apply spectral clustering [41] to cluster the data points. To visualize the recovered consumption pattern of users in each group, we normalize the power consumptions and compute the average of users in the same group. Figure 6 shows the average profile obtained by our method in 1 day (no missing data and with 15% missing data). For comparison, the mean daily profile of the ground-truth data clustered by SSC is also shown in Fig. 6. One can see that the data losses do not affect the recovery performance of DSAPA. The recovered patterns are close to the actual patterns obtained by SSC, considering that the measurements are highly noisy and quantized. Now we pick some users in the same group and average the quantized values (K=5) of these users. We calculate the normalized mutual information between one user and the averaged quantized value of the selected users. Figure 7 shows the normalized mutual information when the number of selected users varies. The value does not decrease much when the number of users increases. Compared with Fig. 4b, one can see that the averaged quantized value of the same group does not provide much information about a single user.

We compare DSAPA with the Approximate Projected Gradient Method (APGM) [28] and Quantized Robust Principal Component Analysis (QRPCA) [35] for data recovery in Fig. 8a. We apply SSC on the recovered data by APGM (or QRPCA) to obtain the clustering result, labeled by “APGM + SSC” (“QRPCA + SSC”) in Fig. 8b. If we simply use the quantized values Q₁,Q₂,⋯,Q₅ to estimate the actual power consumption, the relative recovery error is 0.869, which is much larger than the results in Fig. 8a. When the missing data rate changes from 0 to 0.4, our method always outperforms the other methods in both data recovery and data clustering. For comparison, CI=0.085 for SSC on the ground-truth data, and CI=0.05 for a random clustering. Our method achieves CI=0.08 using quantized measurements with 5% corruptions and no data losses.

We vary the number of users by randomly selecting a subset of the 4780 users. Under the 15% missing rate and no corruption, Fig. 9 shows the recovery error when the number of users varies. The recovery error is 0.35 when the number of users is 500 and decreases to 0.2 when there are 2500 users.

We test the case when no additional noise is added before quantization. We vary the estimated noise level when implementing DSAPA since the measurements usually contain observation noise. As shown in Fig. 10, DSAPA can recover the data with no additional noise. However, adding no noise can lead to a low privacy level. The normalized mutual information values for K=2 and K=5 are 0.2862 and 0.9579, respectively (0.02 kW per interval). These values are much higher than those shown in Fig. 4, indicating a lower level of privacy when no noise is added.

In Fig. 11, we compare the relative recovery error and the clustering index CI of DSAPA and the centralized algorithm Sparse-APA in [27]. Since Sparse-APA does not consider missing data, we study the case with full observations. The corruption rate is set as s/mn=5%. The recovery error of Sparse-APA is smaller than that of our method when the algorithm initializes, because Sparse-APA can compute a better initialization in a centralized fashion. However, the difference decreases as the iteration number increases. After 200 iterations, both algorithms perform similarly.

We next show the performance of DSAPA on USMD. Since the measurements vary from 0 to 100 kW, we set K=7 and α₁=50. The quantization boundaries and quantized values are in Table 2 (K=7). p and d are set to 4 and 15, respectively, using the same technique as discussed in the previous experiments. We generate the corruptions E^∗ and the noise N randomly. The nonzero entries of E^∗ are selected from [−10,10] uniformly, and the corruption rate is 5%. Every entry of N is drawn from $\mathcal {N}(0,0.3^{2})$. Similar to Figs. 5 and 8a, we show the results on USMD in Fig. 12.

6 Conclusion and discussions

This paper shows for the first time that the two seemingly contradictory objectives of data privacy and information accuracy of smart meter data can be achieved simultaneously. The central technical contribution is the development of a decentralized data recovery and clustering method from highly quantized, partially lost, and partially corrupted measurements. Distributed nodes do not share raw data with each other and cannot estimate the actual data of other nodes. We propose a Distributed Sparse Alternative Proximal Algorithm (DSAPA) with a convergence guarantee to solve the nonconvex problem. The recovery error of our method is nearly optimal. The method is evaluated on actual smart meter datasets. Future work includes leveraging the time correlation within each user to further improve the method and developing unsynchronized decentralized data recovery algorithms.

7 Appendix 1

7.1 Supporting lemmas used in the Proof of Theorem 1

Lemma 1

Under Assumptions 1 and 2, the following inequalities hold

$$ \begin{aligned} \left\|\hat{L}_{i}-L^{*}_{i}\right\|_{F} \le \sqrt{2gd}a\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} + 2\sqrt{2gd}\alpha_{1}b, \end{aligned} $$

(40)

$$ \begin{aligned} \left\|\hat{L}-L^{*}\right\|_{F} \le \sqrt{2gd}a\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F} + 2\sqrt{2gdp}\alpha_{1} b. \end{aligned} $$

(41)

where $a = \frac {\sqrt {\xi mn}}{h\sqrt {p}}$, $b = \frac {\sqrt {\xi gdmn}\mathcal {C}}{\sqrt {hp}}$. Ω_i includes the indices of the observed entries.

Proof

From Assumptions 1 and 2,

$$ \begin{aligned} &\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{F} \stackrel{(\mathrm{a})} \le \sqrt{2gd}\frac{\sqrt{\xi mn}}{\sigma_{1}(G_{i})\sqrt{p}}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} \\&+ 2\sqrt{2gd}\alpha_{1}\frac{\sqrt{g\xi dmn}\sigma_{2}(G_{i})}{\sigma_{1}(G_{i})\sqrt{p}}\\ &\stackrel{(\mathrm{b})} \le \sqrt{2gd}\frac{\sqrt{\xi mn}}{h\sqrt{p}}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} \\& + 2\sqrt{2gd}\alpha_{1}\frac{\sqrt{g\xi dmn}\mathcal{C}}{\sqrt{hp}} \end{aligned} $$

(42)

where (a) holds from Lemma 8 and Lemma 9 in [28], and the assumption n_i≤ξn/p. (b) holds because of σ₁(G_i)≥h and $\sigma _{2}(G_{i}) \le \mathcal {C} \sqrt {h}$.

Then

$$ \begin{aligned} & \left\|\hat{L}-L^{*}\right\|_{F}^{2} = \sum_{i=1}^{p}\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{F}^{2} \\ &\stackrel{(\mathrm{c})}\le \sum_{i=1}^{p}\left(2gda^{2}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F}^{2}\right.\\&\left.{\vphantom{\sum_{i=1}^{p}}}+8gd\alpha_{1} ab\|(\hat{L}_{i}-L^{*}_{i})_{\Omega_{i}}\|_{F} + 8gd{\alpha_{1}}^{2}b^{2}\right) \\ &\stackrel{(\mathrm{d})} \le 2gda^{2}\left\|\left(\hat{L}-{L}^{*}\right)_{\Omega}\right\|_{F}^{2}+8gd\alpha_{1} ab \sqrt{p}\left\|\left(\hat{L}-{L}^{*}\right)_{\Omega}\right\|_{F} \\&+ 8gd{\alpha_{1}}^{2}b^{2}p\\ & = \left(\sqrt{2gd}a\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F} + 2\sqrt{2gdp}\alpha_{1} b\right)^{2}. \end{aligned} $$

(43)

where (c) follows from (40) (or (42)). (d) holds from $\sum _{i=1}^{p}\left \|\left (\hat {L}_{i}-L^{*}_{i}\right)_{\Omega _{i}}\right \|_{F} \le \sqrt {p}\left \|\left (\hat {L}-{L}^{*}\right)_{\Omega }\right \|_{F}$. Then, we have the desired result. □
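For completeness, the inequality used in step (d) follows from the Cauchy-Schwarz inequality, assuming (as the grouping suggests) that the index sets Ω_i of the p groups partition Ω:

$$ \begin{aligned} \sum_{i=1}^{p}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} \le \sqrt{p}\left(\sum_{i=1}^{p}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F}^{2}\right)^{1/2} = \sqrt{p}\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F}. \end{aligned} $$

The last equality in (43) is a completed square. Writing $A=\sqrt{2gd}\,a\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F}$ and $B=2\sqrt{2gdp}\,\alpha_{1}b$ (shorthand introduced here only for this check), one has

$$ \begin{aligned} A^{2} = 2gda^{2}\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F}^{2}, \quad 2AB = 8gd\alpha_{1}ab\sqrt{p}\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F}, \quad B^{2} = 8gd\alpha_{1}^{2}b^{2}p, \end{aligned} $$

which are exactly the three terms on the right-hand side of step (d).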

Lemma 2

Let $\hat {\theta }=\text {vec}(\hat {X})$, θ^∗=vec(X^∗), $\mathcal {F}(\theta ^{*}) = F(\theta ^{*})$, and $\hat {X}$, $X^{*} \in \mathcal {S}_{fX}$. Follow the same assumptions as those of Theorem 1. Then, with probability at least $1-pC_{1}e^{-C_{2}\xi n/p}$,

$$ \begin{aligned} &\left|\left \langle \nabla_{\theta}\mathcal{F}(\theta^{*}),\hat{\theta}-\theta^{*} \right \rangle\right| \\&\le 4.02L_{\alpha}gda\sqrt{\xi n}\left\|\left(\hat{X}-X^{*}\right)_{\Omega}\right\|_{F}\\&+ 8.04L_{\alpha}gda\sqrt{\xi n}\alpha_{2} \sqrt{s} + 8.04L_{\alpha} gd\sqrt{\xi np}\alpha_{1} b \\& + 2\alpha_{2} s L_{\alpha}, \end{aligned} $$

(44)

holds for the positive constants C₁ and C₂. 〈.,.〉 denotes the inner product of two matrices, i.e., the sum of entry-wise products.

Proof

The proof generalizes that of Lemma 2 in [27], which does not consider missing data. Here we extend the analysis to handle missing data. According to the definition, there exists a permutation matrix Γ^∗ such that L^∗ can be written as $L^{*}=\left [L^{*}_{1},L^{*}_{2},...,L^{*}_{p}\right ]\Gamma ^{*}$. By Assumption 2, $\hat {L}$ can be written as $\hat {L}=\left [\hat {L}_{1},\hat {L}_{2},...,\hat {L}_{p}\right ]\Gamma ^{*}$, where the dimension of $\hat {L}_{i}$ is smaller than or equal to (g−1)d.

Note that $\sum_{l=1}^{K}\varphi_{l}\left(\left(X^{*}_{i}\right)_{k,j}\right)=1$ and $\left[L^{-1}_{\alpha}\nabla_{X}F\left(X^{*}_{i}\right)\right]_{k,j} = -L^{-1}_{\alpha}\sum_{l=1}^{K}\frac{\dot{\varphi}_{l}\left(\left(X^{*}_{i}\right)_{k,j}\right)}{\varphi_{l}\left(\left(X^{*}_{i}\right)_{k,j}\right)}\cdot\boldsymbol{1}_{[(Y_{i})_{k,j}=l]}$. Combining them with (7), one can conclude that the elements of $L^{-1}_{\alpha}\nabla_{X}F\left(X^{*}_{i}\right)$ have zero mean, and the variances are bounded by one. Using the result of Lemma 1 in [27], we have

$$ \left\|L^{-1}_{\alpha}\nabla_{X}F\left(X^{*}_{i}\right)\right\|_{2} \le 2.01\sqrt{\xi n/p} $$

(45)

holds with probability at least $1-C_{1}e^{-C_{2}\xi n/p}$. Here $X^{*}_{i}$ is the ith group of $X^{*}$, i.e., the same group as $L^{*}_{i}$ under the permutation Γ^∗. Then

$$\begin{aligned} & \left|\left \langle \nabla_{\theta}\mathcal{F}(\theta^{*}),\hat{\theta}-\theta^{*} \right \rangle\right| = \left|\left \langle \nabla_{X}F(X^{*}),\hat{X}-X^{*} \right \rangle\right| \\ &\le \left|\left \langle \nabla_{X}F(X^{*}),\hat{L}-L^{*} \right \rangle\right|+\left|\left \langle \nabla_{X}F(X^{*}),\hat{E}-E^{*} \right \rangle\right| \\ &\stackrel{(\mathrm{a})}= \left|\sum_{i=1}^{p}\left \langle \nabla_{X}F(X^{*}_{i}),\hat{L}_{i}-L^{*}_{i} \right \rangle\right| \\&~~~~+\left|\left \langle \nabla_{X}F(X^{*}),\hat{E}-E^{*} \right \rangle\right| \\ &\stackrel{(\mathrm{b})} \le \sum_{i=1}^{p} \left\|\nabla_{X}F\left(X^{*}_{i}\right)\right\|_{2}\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{*} + 2\alpha_{2} s L_{\alpha}\\ &\stackrel{(\mathrm{c})} \le 2.01L_{\alpha}\sqrt{\xi n/p} \sum_{i=1}^{p}\sqrt{2gd}\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{F} + 2\alpha_{2} s L_{\alpha}\\ &\stackrel{(\mathrm{d})} \le 2.01L_{\alpha}\sqrt{\xi n/p} \sum_{i=1}^{p}(\sqrt{2gd}(\sqrt{2gd}a)\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} \\&~~~~+ \sqrt{2gd}(2\sqrt{2gd}\alpha_{1}b)) + 2\alpha_{2} s L_{\alpha}\\ &\stackrel{(\mathrm{e})} \le 2.01L_{\alpha}(\sqrt{2gd}a)\sqrt{2\xi gdn}\left\|\left(\hat{L}-L^{*}\right)_{\Omega}\right\|_{F} \\&~~~~+ 8.04L_{\alpha} gd\sqrt{\xi np}\alpha_{1} b +2\alpha_{2} s L_{\alpha}\\ & \le 4.02L_{\alpha}(\sqrt{gd}a)\sqrt{\xi gdn}\left\|\left(\hat{X}-X^{*}\right)_{\Omega}\right\|_{F}\\&~~~~+4.02L_{\alpha}(\sqrt{gd}a)\sqrt{\xi gdn}\left\|\left(\hat{E}-E^{*}\right)_{\Omega}\right\|_{F}\\&~~~~+ 8.04L_{\alpha} gd\sqrt{\xi np}\alpha_{1} b + 2\alpha_{2} s L_{\alpha}\\ &\stackrel{(\mathrm{f})} \le 4.02L_{\alpha}gda\sqrt{\xi n}\left\|\left(\hat{X}-X^{*}\right)_{\Omega}\right\|_{F}\\&~~~~+ 8.04L_{\alpha}gda\sqrt{\xi n}\alpha_{2} \sqrt{s}\\&~~~~+ 8.04L_{\alpha} gd\sqrt{\xi np}\alpha_{1} b + 2\alpha_{2} s L_{\alpha} \end{aligned} $$

holds with probability at least $1-pC_{1}e^{-C_{2}\xi n/p}$.

(a) holds from the linearity of the inner product. The first term of (b) holds from |〈A,B〉|≤∥A∥₂∥B∥_∗. The second term of (b) holds from the fact that both $\hat{E}$ and $E^{*}$ have at most s nonzero entries and |∇_XF(X^∗)_i,j|≤1. (c) holds from (45) and the fact that $\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{*} \le \sqrt{2gd}\left\|\hat{L}_{i}-L^{*}_{i}\right\|_{F}$. (d) holds from (40). (e) holds from $\sum_{i=1}^{p}\left\|\left(\hat{L}_{i}-L^{*}_{i}\right)_{\Omega_{i}}\right\|_{F} \le \sqrt{p}\|(\hat{L}-L^{*})_{\Omega}\|_{F}$. (f) holds because $\|(\hat{E}-E^{*})_{\Omega}\|_{F} \le 2\alpha_{2}\sqrt{s}$, which results from the fact that $|\hat{E}_{i,j}-E^{*}_{i,j}|$ is bounded by 2α₂. The probability $1-pC_{1}e^{-C_{2}\xi n/p}$ comes from the union bound for $P(\max_{i \in [p]}\|\nabla_{X}F(X^{*}_{i})\|_{2} \le 2.01L_{\alpha}\sqrt{\xi n/p})$. □
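The two matrix-norm facts used in steps (b) and (c) can be checked numerically. The sketch below is only an illustration under assumed sizes: `G` is a hypothetical stand-in for a gradient block and `B` for a difference $\hat{L}_{i}-L^{*}_{i}$ of rank at most `r` (the proof uses the same bound with 2gd in place of `r`).

```python
import numpy as np

rng = np.random.default_rng(1)
m, q, r = 8, 10, 3                       # hypothetical block size and rank
G = rng.standard_normal((m, q))          # stands in for a gradient block
B = rng.standard_normal((m, r)) @ rng.standard_normal((r, q))  # rank at most r

inner = abs(np.sum(G * B))               # |<G, B>|, the sum of entry-wise products
spec = np.linalg.norm(G, 2)              # spectral norm ||G||_2
nuc = np.linalg.norm(B, 'nuc')           # nuclear norm ||B||_*
fro = np.linalg.norm(B, 'fro')           # Frobenius norm ||B||_F

print(inner <= spec * nuc, nuc <= np.sqrt(r) * fro)
```

The second inequality is Cauchy-Schwarz applied to the at most `r` nonzero singular values of `B`.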

8 Appendix 2

8.1 Proof of Theorem 1

Proof

The proof follows and extends the proofs of Theorem 1 in [28] and Theorem 5 in [37]. We extend from the low-rank matrices in [28,37] to matrices with columns in p low-dimensional subspaces. Moreover, ref. [28] does not consider corruptions, and ref. [37] does not consider missing data. Here we consider both missing data and corruptions.

The first bound $2\alpha_{1}+2\alpha_{2}\sqrt{\frac{s}{mn}}$ in (8) follows from the fact that $\hat{L}$, $L^{*}$, $\hat{E}$, $E^{*} \in \mathcal{S}_{f}$. We discuss the second bound in (8) as follows. We denote the objective in (4) by F(X) when X is treated as the variable. Note that $\mathcal{S}_{fX}$ is a compact set and the objective function is continuous in X, so F(X) achieves a minimum in $\mathcal{S}_{fX}$. Suppose that $\hat{X} \in \mathcal{S}_{fX}$ minimizes F(X).

Let $\theta =\text {vec} (X) \in \mathbb {R}^{mn}$ and $\mathcal {F}_{\Omega,Y}(\theta)=F(X)$. By the second-order Taylor’s theorem, we have

$$ \begin{aligned} & \mathcal{F}_{\Omega,Y}(\theta) = \mathcal{F}_{\Omega,Y}(\theta^{*}) + \left \langle \nabla_{\theta}\mathcal{F}_{\Omega,Y}(\theta^{*}),\theta-\theta^{*} \right \rangle\\ & + \frac{1}{2}\left \langle \theta-\theta^{*}, (\nabla^{2}_{\theta\theta}\mathcal{F}_{\Omega,Y}(\tilde{\theta}))(\theta-\theta^{*}) \right \rangle, \end{aligned} $$

(46)

where $\tilde {\theta }=\theta ^{*}+\bar {\eta }(\theta -\theta ^{*})$ for some $\bar {\eta }\in [0,1]$, with corresponding matrices $\tilde {X}=X^{*}+\bar {\eta }(X-X^{*})$.

From (46), Lemma 2, and Lemma A.3 in [38], we have

$$ \begin{aligned} & 0 \geq F(\hat{X}) - F(X^{*}) \\ & \geq - c_{f}\|(\hat{X}-X^{*})_{\Omega}\|_{F}+ \frac{\gamma_{\alpha}}{2}\|(\hat{X}-X^{*})_{\Omega}\|_{F}^{2}- \eta. \end{aligned} $$

(47)

holds with probability at least $1-pC_{1}e^{-C_{2}\xi n/p}$, where $c_{f} = 4.02L_{\alpha}gda\sqrt{\xi n}$ and $\eta = 8.04L_{\alpha}gda\sqrt{\xi n}\alpha_{2}\sqrt{s}+ 8.04L_{\alpha}gd\sqrt{\xi np}\alpha_{1}b + 2\alpha_{2}sL_{\alpha}$.

By solving (47), we then have

$$ \begin{aligned} \|(\hat{X}-X^{*})_{\Omega}\|_{F} \le (c_{f}+\sqrt{c_{f}^{2}+2\gamma_{\alpha}\eta})/\gamma_{\alpha}. \end{aligned} $$

(48)
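The step from (47) to (48) can be spelled out as follows, assuming $\gamma_{\alpha}>0$ and writing $x=\|(\hat{X}-X^{*})_{\Omega}\|_{F}\ge 0$ (the symbol x is introduced here only for illustration). Inequality (47) reads

$$ \frac{\gamma_{\alpha}}{2}x^{2} - c_{f}x - \eta \le 0. $$

The left-hand side is an upward-opening parabola in x with roots

$$ x_{\pm} = \frac{c_{f} \pm \sqrt{c_{f}^{2}+2\gamma_{\alpha}\eta}}{\gamma_{\alpha}}, $$

so the inequality can hold only for $x \le x_{+}$, which is the bound stated in (48).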

Thus,

$$ \begin{aligned} & \|\hat{L}-L^{*}\|_{F}/\sqrt{mn} \\& \stackrel{(\mathrm{a})}{\leq} (\sqrt{2gd}a\|(\hat{L}-L^{*})_{\Omega}\|_{F} + 2\sqrt{2gdp}\alpha_{1} b)/\sqrt{mn}\\ & \le (\sqrt{2gd}a)(\|(\hat{X}-X^{*})_{\Omega}\|_{F}+\|(\hat{E}-E^{*})_{\Omega}\|_{F})/\sqrt{mn} \\&~~~~ +2\sqrt{2gdp}\alpha_{1} b/\sqrt{mn}\\ & \stackrel{(\mathrm{b})}{\leq} (\sqrt{2gd}a)((c_{f}+\sqrt{c_{f}^{2}+2\gamma_{\alpha}\eta})/\gamma_{\alpha} +2 \alpha_{2} \sqrt{s})/\sqrt{mn} \\&~~~~+2\sqrt{2gdp}\alpha_{1} b/\sqrt{mn}\\ & \stackrel{(\mathrm{c})}{\leq} \frac{M_{1} d^{\frac{3}{2}}nm^{\frac{1}{2}}}{h^{2}p} + \frac{M_{2} d^{\frac{5}{4}}n^{\frac{1}{2}}m^{\frac{1}{4}}}{ h^{\frac{5}{4}}p^{\frac{1}{2}}} + \frac{M_{3} d n^{\frac{1}{2}} m^{\frac{1}{4}} s^{\frac{1}{4}}}{ h^{\frac{3}{2}}p^{\frac{3}{4}}} \\ &~~~~ +\frac{M_{4} d^{\frac{1}{2}} s^{\frac{1}{2}}}{hp^{\frac{1}{2}}} + \frac{M_{5} d }{h^{\frac{1}{2}}}\\ & \stackrel{(\mathrm{d})}{\leq} C_{1}' \frac{\kappa d\sqrt{d}}{f^{2} \sqrt{m}} + C_{2}' \frac{d\kappa^{3/4}}{f^{3/2}m^{1/4}}\left(\frac{s}{mn}\right)^{1/4}\\&~~~~ + C_{3}' \frac{\sqrt{\kappa d}}{f}\left(\frac{s}{mn}\right)^{1/2}, \end{aligned} $$

(49)

where M₁–M₅ are constants. (a) holds because of (41). (b) holds according to (48). (c) holds because of the Cauchy-Schwarz inequality. (d) holds because f=h/m, $\frac{M_{2}d^{\frac{5}{4}}n^{\frac{1}{2}}m^{\frac{1}{4}}}{h^{\frac{5}{4}}p^{\frac{1}{2}}} =\frac{M_{2}\kappa d^{\frac{5}{4}}}{f^{\frac{5}{4}}m^{\frac{1}{2}}}$, and $\frac{M_{5}d}{h^{\frac{1}{2}}}=\frac{M_{5}d}{f^{\frac{1}{2}}m^{\frac{1}{2}}}$. The orders of both of these terms are smaller than $O\left(\frac{\kappa d\sqrt{d}}{f^{2}\sqrt{m}}\right)$. □

9 Appendix 3

9.1 Supporting lemmas for Theorem 2

Lemma 3

There exists a set $\mathcal {X}\subset \mathcal {S}_{fX}$ with

$$ |\mathcal{X}| \ge \exp(\frac{dn - d \lfloor \frac{s}{m} \rfloor}{16}) $$

(50)

such that the following properties hold for any γ∈(0,1]:

1. For all $X\in \mathcal {X}$, X_i,j=±αγ or 0, ∀(i,j), where α= min(α₁,α₂).

2. For all X⁽ⁱ⁾, $X^{(j)}\in \mathcal {X}$, i≠j,

$$ \|X^{(i)}-X^{(j)}\|_{F}^{2} > \alpha^{2}\gamma^{2}\left(\frac{mn}{2}-s\right). $$

(51)

Proof

We independently generate a set $\mathcal{X}$ of $\left\lceil \exp\left(\frac{dn - d\lfloor \frac{s}{m}\rfloor}{16}\right)\right\rceil$ random matrices from the following distribution. According to the column indices, X is first divided into X₁,X₂,⋯,X_p, which correspond to the index sets $\{1,...,\lfloor \frac{n}{p}\rfloor\}, \{\lfloor \frac{n}{p}\rfloor+1,...,2\lfloor \frac{n}{p}\rfloor\}, \{2\lfloor \frac{n}{p}\rfloor+1,...,3\lfloor \frac{n}{p}\rfloor\},\cdots, \{(p-1)\lfloor \frac{n}{p}\rfloor+1,...,n\}$, respectively. For the first d rows of X₁, fix the locations of $\lfloor \frac{s}{pm}\rfloor$ entries in each row and set their values to zero. The remaining $d\lfloor \frac{n}{p}\rfloor - d\lfloor \frac{s}{pm}\rfloor$ entries take values ±αγ with equal probability. For all i∈{d+1,...,m}, $j \in [\lfloor \frac{n}{p}\rfloor]$,

$$ X_{i,j} := X_{k,j}, \text{ where } k = i \ (\mathrm{mod}\ d) + 1. $$

(52)

The same process is applied to X₂,X₃,⋯,X_p. Then, one can see that X can be written as X=L+E, where L can span subspaces with dimension smaller than or equal to d, and E is a sparse matrix. We further have

$$ \|L\|_{\infty} = \alpha\gamma \le \alpha_{1}, ~~ \|E\|_{\infty} = \alpha\gamma \le \alpha_{2}, \text{ and } \|E\|_{0}\le s. $$

(53)

Each column of L can be represented by at most d other columns. Thus, $\mathcal{X}\subset \mathcal{S}_{fX}$.

Note that the locations of the zero entries are the same for all matrices drawn from the above distribution. For two different matrices X and $\hat{X}$ drawn as above, we have

$$ \begin{aligned} &\|X -\hat{X} \|_{F}^{2} = \sum_{i,j}(X_{ij}-\hat{X}_{ij})^{2}\\ & \geq \lfloor\frac{m}{d}\rfloor \sum_{i=1}^{d} \left(\sum_{j=1}^{\lfloor \frac{n}{p} \rfloor}(X_{ij}-\hat{X}_{ij})^{2}+ \sum_{j=\lfloor \frac{n}{p} \rfloor+1}^{2\lfloor\frac{n}{p}\rfloor}(X_{ij}-\hat{X}_{ij})^{2} \right.\\&+\left. \cdots + \sum_{j=(p-1)\lfloor\frac{n}{p}\rfloor+1}^{n}(X_{ij}-\hat{X}_{ij})^{2} \right)\\& \geq 4\alpha^{2}\gamma^{2} \lfloor\frac{m}{d}\rfloor \sum_{i=1}^{dn-d\lfloor \frac{s}{m} \rfloor}\delta_{i}, \end{aligned} $$

(54)

where the δ_i’s are independent 0/1 Bernoulli random variables, each with mean $\frac{1}{2}$. Following the same proof technique as in Lemma 4 in [37], one can show that $\mathcal{X}$ satisfies property 2. □
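The random construction in the proof above can be sketched in a few lines. This is an illustrative sketch only: the function name `sample_packing_matrix`, the choice to place the fixed zeros in the leading columns of each block, and all numerical values are assumptions made here, not details given in the proof.

```python
import numpy as np

def sample_packing_matrix(m, n, p, d, s, alpha, gamma, rng):
    """Draw one matrix following the block construction of Lemma 3 (sketch)."""
    width = n // p                       # floor(n/p) columns per block
    zeros_per_row = s // (p * m)         # floor(s/(pm)) fixed zeros per row and block
    X = np.empty((m, n))
    for g in range(p):
        start = g * width
        stop = n if g == p - 1 else start + width   # last block runs to column n
        base = rng.choice([-1.0, 1.0], size=(d, stop - start)) * alpha * gamma
        base[:, :zeros_per_row] = 0.0    # fixed zero locations (assumed: leading columns)
        for i in range(1, m + 1):        # 1-based row index as in (52)
            k = i if i <= d else (i % d) + 1
            X[i - 1, start:stop] = base[k - 1]
    return X

rng = np.random.default_rng(0)
X = sample_packing_matrix(m=12, n=20, p=2, d=3, s=48, alpha=0.5, gamma=0.8, rng=rng)
print(np.linalg.matrix_rank(X[:, :10]), np.count_nonzero(X == 0))
```

Each block has at most d distinct rows, so its rank is at most d, and the number of zero entries is m·p·⌊s/(pm)⌋, which does not exceed s.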

Let Y=X+N, where the entries in matrix N are i.i.d. and generated from Gaussian distribution $\mathcal {N}(0,\sigma ^{2})$. Suppose that $X \in \mathcal {X}$ is chosen uniformly at random. Lemma 4 bounds the mutual information I(X_Ω,Y_Ω).

Lemma 4

$$ I(X_{\Omega}, Y_{\Omega}) \le \frac{|\Omega|-s_{\Omega}}{2}\log\left(1+\left(\frac{\alpha\gamma}{\sigma}\right)^{2}\right) $$

(55)

Proof

The proof is similar to the proof of Lemma 5 in [31], but [31] does not consider corruptions. We modify the proof to handle corruptions. From Lemma 5 in [31], one can obtain

$$ I(X_{\Omega},Y_{\Omega}) \le H(\tilde{X}_{\Omega}+N_{\Omega}) - H(N_{\Omega}). $$

(56)

Here ℵ denotes a matrix whose entries are generated i.i.d. from {+1,−1}, and $\tilde{X}=X\cdot\aleph$ denotes the entry-wise product of X and ℵ.

The vectorization of $\tilde {X}_{\Omega }+N_{\Omega }$ is denoted by $\text {vec}(\tilde {X}_{\Omega }+N_{\Omega }) \in \mathbb {R}^{|\Omega |}$. We compute the covariance matrix as

$$ \Sigma := \mathbb{E}[\text{vec}(\tilde{X}_{\Omega}+N_{\Omega})\text{vec}(\tilde{X}_{\Omega}+N_{\Omega})^{T}]. $$

(57)

Then, by Theorem 8.6.5 in [59], we have

$$ \begin{aligned} & H(\tilde{X}_{\Omega}+N_{\Omega}) \le \frac{1}{2}\log((2\pi e)^{|\Omega|}\det(\Sigma))\\ & = \frac{1}{2}\log((2\pi e)^{|\Omega|}(\alpha^{2}\gamma^{2}+\sigma^{2})^{|\Omega|-s_{\Omega}}\sigma^{2s_{\Omega}}). \end{aligned} $$

(58)

The equality holds since $\tilde {X}$ has s_Ω zero entries.

We have $H(N_{\Omega }) = \frac {1}{2}\log ((2\pi e)^{|\Omega |}\sigma ^{2|\Omega |})$ and thus

$$ I(X_{\Omega},Y_{\Omega}) \le \frac{1}{2}\log\left(\frac{(\alpha^{2}\gamma^{2}+\sigma^{2})^{|\Omega|-s_{\Omega}}\sigma^{2s_{\Omega}}}{\sigma^{2|\Omega|}}\right), $$

(59)

which establishes the lemma. □
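For completeness, the simplification from (59) to (55) is the following one-line computation, using $\sigma^{2s_{\Omega}}/\sigma^{2|\Omega|}=\sigma^{-2(|\Omega|-s_{\Omega})}$:

$$ \begin{aligned} \frac{1}{2}\log\left(\frac{(\alpha^{2}\gamma^{2}+\sigma^{2})^{|\Omega|-s_{\Omega}}\sigma^{2s_{\Omega}}}{\sigma^{2|\Omega|}}\right) &= \frac{1}{2}\log\left(\left(\frac{\alpha^{2}\gamma^{2}+\sigma^{2}}{\sigma^{2}}\right)^{|\Omega|-s_{\Omega}}\right) \\ &= \frac{|\Omega|-s_{\Omega}}{2}\log\left(1+\left(\frac{\alpha\gamma}{\sigma}\right)^{2}\right). \end{aligned} $$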

10 Appendix 4

10.1 Proof of Theorem 2

Proof

The proof follows Theorem 4 in [31] which does not consider the corruptions. Our proof is more involved due to the corruptions. Choose ε so that

$$ \epsilon^{2} = \min\{\frac{(1-2C_{0})\alpha^{2}}{8}, C_{4}^{2}\sigma^{2}\frac{dn-d\lfloor \frac{s}{m} \rfloor-64}{|\Omega|-s_{\Omega}}\} $$

(60)

where C₄ is a constant to be determined later. The set $\mathcal {X}$ is defined in Lemma 3. γ is set to be

$$ \frac{2\epsilon}{\alpha}\sqrt{\frac{2mn}{mn-2s}} \le \gamma \le \frac{2\epsilon}{\alpha}\sqrt{\frac{2}{1-2C_{0}}} \le 1. $$

(61)

Suppose for the sake of a contradiction that there exists an efficient algorithm such that for any $X \in \mathcal {S}_{fX}$, given the measurements Y, returns an $\hat {X}$, and

$$ \|X-\hat{X}\|^{2}_{F}/mn \le \epsilon^{2} $$

(62)

holds with probability at least 1/4. Let

$$ X^{*} = \arg\min_{X'\in\mathcal{X}}\|X'-\hat{X}\|^{2}_{F}. $$

(63)

Following the proof of Theorem 4 in [31], one can find that if (62) holds, then X^∗=X. By the assumption of (62),

$$ P(X \neq X^{*}) \le 3/4. $$

(64)

Let X be a matrix chosen uniformly at random from $\mathcal{X}$. Consider running the algorithm on X. Then, by Fano’s inequality, the probability that X≠X^∗ is at least

$$ \begin{aligned} & P(X \neq X^{*}) \ge \frac{H(X|Y_{\Omega})-1}{\log|\mathcal{X}|}\\ & = \frac{H(X)-I(X,Y_{\Omega})-1}{\log|\mathcal{X}|} \ge 1 - \frac{I(X_{\Omega}, Y_{\Omega})+1}{\log|\mathcal{X}|}. \end{aligned} $$

(65)

We have obtained $|\mathcal {X}|$ from Lemma 3 and I(X_Ω,Y_Ω) from Lemma 4. Then, using the inequality log(1+z)≤z, we obtain

$$ P(X \neq X^{*}) \ge 1- \frac{16}{dn- d\lfloor \frac{s}{m} \rfloor} \left(\frac{|\Omega|-s_{\Omega}}{2}\left(\frac{\alpha\gamma}{\sigma}\right)^{2}+1\right). $$

(66)

Combining (66) with (61) and (64), we obtain

$$ \frac{16}{dn-d\lfloor \frac{s}{m} \rfloor} \left((|\Omega|-s_{\Omega})\frac{4}{1-2C_{0}}\left(\frac{\epsilon}{\sigma}\right)^{2}+1\right) \ge \frac{1}{4}, $$

(67)

which implies that

$$ \epsilon^{2} \ge \frac{(1-2C_{0})\sigma^{2}}{256} \frac{dn-d\lfloor \frac{s}{m} \rfloor-64}{|\Omega|-s_{\Omega}}. $$

(68)

Setting $C_{4}^{2} < \frac {1-2C_{0}}{256}$ leads to a contradiction, hence (62) must fail to hold with probability at least 3/4. Using the definition $f= \frac {|\Omega |}{mn}$, we obtain the desired result. □
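The algebra behind (67) and (68) can be made explicit as follows, assuming $1-2C_{0}>0$ and $|\Omega|>s_{\Omega}$ as the statements implicitly require. The upper bound on γ in (61) gives $\gamma^{2}\le \frac{8\epsilon^{2}}{\alpha^{2}(1-2C_{0})}$, hence

$$ \frac{|\Omega|-s_{\Omega}}{2}\left(\frac{\alpha\gamma}{\sigma}\right)^{2} \le (|\Omega|-s_{\Omega})\frac{4}{1-2C_{0}}\left(\frac{\epsilon}{\sigma}\right)^{2}, $$

and combining the lower bound (66) with the upper bound 3/4 in (64) yields (67). Multiplying (67) by $\frac{dn-d\lfloor \frac{s}{m}\rfloor}{16}$ and rearranging gives

$$ \epsilon^{2} \ge \frac{(1-2C_{0})\sigma^{2}}{4}\cdot\frac{\frac{dn-d\lfloor \frac{s}{m}\rfloor}{64}-1}{|\Omega|-s_{\Omega}} = \frac{(1-2C_{0})\sigma^{2}}{256}\cdot\frac{dn-d\lfloor \frac{s}{m}\rfloor-64}{|\Omega|-s_{\Omega}}, $$

which is (68). Since (60) forces $\epsilon^{2}\le C_{4}^{2}\sigma^{2}\frac{dn-d\lfloor \frac{s}{m}\rfloor-64}{|\Omega|-s_{\Omega}}$, any $C_{4}^{2}<\frac{1-2C_{0}}{256}$ is incompatible with (68).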

11 Appendix 5

11.1 Proof of Proposition 1

Proof

Given any i, from (5), we know that $\hat {L}_{\star i}=\hat {L}\hat {C}_{\star i}$. Without loss of generality, we assume $\hat {L}_{\star i} \in \hat {S}_{1}$, where the $\hat {p}$ subspaces are denoted by $\hat {S}_{i}$ ($i\in [\hat {p}]$). Then, from the constraint $\hat {C}_{i,i}=0, \forall i \in [n]$, we have $\hat {L}_{\star i}=[\hat {L}_{1\backslash \star i} ~~\hat {L}_{-1}]\left [\begin {array}{c} \hat {C}_{\star i}^{(1\backslash \star i)} \\\hat {C}_{\star i}^{(-1)}\end {array}\right ]$, where $\hat {L}_{1\backslash \star i}$ denotes all data points belonging to $\hat {S}_{1}$ except $\hat {L}_{\star i}$. $\hat {L}_{-1}$ denotes all data points belonging to $\{\hat {S}_{j}\}_{j=2}^{\hat {p}}$. $\hat {C}_{\star i}^{(1\backslash \star i)}$ and $\hat {C}_{\star i}^{(-1)}$ are sparse coefficients corresponding to $\hat {L}_{1\backslash \star i}$ and $\hat {L}_{-1}$, respectively. Now we only need to prove that $\hat {C}_{\star i}^{(-1)}=0$.

If $\hat{C}_{\star i}^{(-1)}\neq 0$, then $\hat{L}_{\star i}$ belongs to a subspace $\hat{S}_{1}'$ that is different from $\hat{S}_{1}$ and is spanned by the data points corresponding to the nonzero entries of $\left[\begin{array}{c} \hat{C}_{\star i}^{(1\backslash \star i)} \\ \hat{C}_{\star i}^{(-1)}\end{array}\right]$. Moreover, the dimension of $\hat{S}_{1}'$ must be smaller than or equal to d since $\left\|\left[\begin{array}{c} \hat{C}_{\star i}^{(1\backslash \star i)} \\ \hat{C}_{\star i}^{(-1)}\end{array}\right]\right\|_{0} \le d$. Therefore, $\hat{L}_{\star i} \in \hat{S}_{1}''=\hat{S}_{1}' \bigcap \hat{S}_{1}$, where $\bigcap$ denotes the intersection of two subspaces. We first consider the case when the dimension of $\hat{S}_{1}''$ is smaller than d. Since the data points of $\hat{L}_{\star}$ are sampled from a continuous distribution on the $\hat{p}$ subspaces, the probability that the data point $\hat{L}_{\star i}$ lies in a data-point-spanned hyperplane in $\hat{S}_{1}$ that has dimension smaller than d is 0 (to see this, consider the probability of a data point lying on a prefixed line within a plane). Next we show that the number of such hyperplanes is finite. Because the data points are fixed beforehand, there is only a finite number of combinations of data points that can span $\hat{S}_{1}'$ and further intersect with $\hat{S}_{1}$ to form $\hat{S}_{1}''$. Then, the probability of the union over a finite number of combinations is still zero. Therefore, the dimension of $\hat{S}_{1}''$ equals d, which indicates that the dimensions of $\hat{S}_{1}'$ and $\hat{S}_{1}$ are both d. This leads to $\hat{S}_{1}''=\hat{S}_{1}'=\hat{S}_{1}$, which is a contradiction, since the data points corresponding to $\hat{C}_{\star i}^{(-1)}\neq 0$ do not belong to $\hat{S}_{1}$. Thus, $\hat{C}_{\star i}^{(-1)}=0$, and the claim holds. □

12 Appendix 6

12.1 DSAPA: proof of the Lipschitz differential property and calculation of Lipschitz constants

A function is Lipschitz differentiable if and only if all its partial gradients are Lipschitz continuous. The definition is shown in Definition 3.

Definition 3

[60] For any fixed matrices z₁,z₂,...,z_n, matrix variable y, and a function y→Υ(y,z₁,z₂,...,z_n), the partial gradient ∇_yΥ(y,z₁,z₂,...,z_n) is said to be Lipschitz continuous with Lipschitz constant L_p(z₁,z₂,...,z_n) if the following holds:

$$\begin{aligned} &\| \nabla_{y} \Upsilon(y,z_{1},z_{2},...,z_{n}) - \nabla_{y} \Upsilon(y',z_{1},z_{2},...,z_{n}) \|_{F} \\& \le L_{p}(z_{1},z_{2},...,z_{n}) \| y - y' \|_{F},~~ \forall y,y'. \end{aligned} $$

We provide the Lipschitz differential property of H and compute the corresponding Lipschitz constants of its partial gradients with respect to $C_{\Phi _{i}},V_{\Phi _{i} \star },L_{\Phi _{i}},E_{\Phi _{i}}$, ∀i∈[W]. Let $L^{t+1}_{p1}$, $L^{t+1}_{p2}$, $L^{t+1}_{p3}$, $L^{t+1}_{p4}$, and $L^{t+1}_{p5}$ denote the smallest Lipschitz constants of $\nabla _{C_{\Phi _{i}}} H$, $\nabla _{V_{\Phi _{i} \star }} H$, $\nabla _{L_{\Phi _{i}}} H$, $\nabla _{E_{\Phi _{i}}} H$, and ∇_UH in the (t+1)th iteration. We have

$$ \begin{aligned} &\| \nabla_{C_{\Phi_{i}}} H(C_{\Phi_{i}}) - \nabla_{C_{\Phi_{i}}} H(C_{\Phi_{i}}') \|_{F} \\ & = \| \lambda_{1} V^{t}(V^{t})^{T}(C_{\Phi_{i}} - C_{\Phi_{i}}')\|_{F} \\& \le \| \lambda_{1} V^{t}(V^{t})^{T}\|_{F}\|C_{\Phi_{i}} - C_{\Phi_{i}}' \|_{F} \\& = \| \lambda_{1} \sum_{i=1}^{W} \iota_{i}^{t}\|_{F}\|C_{\Phi_{i}} - C_{\Phi_{i}}' \|_{F} \\& \stackrel{(\mathrm{a})} = \frac{1}{\tau_{C}(V^{t})} \| C_{\Phi_{i}} - C_{\Phi_{i}}' \|_{F}, \end{aligned} $$

(69)

where (a) follows from (30). Equation (69) implies that

$$ \begin{aligned} L^{t+1}_{p1} \leq \| \lambda_{1} \sum_{i=1}^{W} \iota_{i}^{t}\|_{F}, \textrm{ and } \tau_{C}(V^{t}) \le 1/L^{t+1}_{p1}. \end{aligned} $$

(70)

$$ \begin{aligned} &\| \nabla_{V_{\Phi_{i} \star}} H(V_{\Phi_{i} \star}) - \nabla_{V_{\Phi_{i} \star}} H(V_{\Phi_{i} \star}') \|_{F} \\ & = \|\lambda_{2} (V_{\Phi_{i} \star}-V_{\Phi_{i} \star}')(U^{t})^{T}U^{t} + \lambda_{1}(V_{\Phi_{i} \star}-V_{\Phi_{i} \star}')\cdot \\& (I_{q \times q}-(C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1}-((C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1})^{T}+(C_{\Phi_{i} \star}^{t+1})(C_{\Phi_{i} \star}^{t+1})^{T})\|_{F} \\&\stackrel{(\mathrm{b})} \le \|V_{\Phi_{i} \star}-V_{\Phi_{i} \star}'\|_{F}\cdot(\|\lambda_{2} (U^{t})^{T}U^{t}\|_{F}+\lambda_{1}\cdot\\&\| I_{q \times q}+(C_{\Phi_{i} \star}^{t+1})(C_{\Phi_{i} \star}^{t+1})^{T}-(C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1}-((C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1})^{T}\|_{F}) \\& \stackrel{(\mathrm{c})} \le \frac{1}{\tau_{V}(U^{t},C^{t+1})} \| V_{\Phi_{i} \star}-V_{\Phi_{i} \star}'\|_{F}, \end{aligned} $$

(71)

where (b) follows from the triangle inequality, and (c) follows from (31). Equation (71) implies that

$$ \begin{aligned} &L^{t+1}_{p2} \leq \max_{i\in[W]} \lambda_{1}\| I_{q \times q}+(C_{\Phi_{i} \star}^{t+1})(C_{\Phi_{i} \star}^{t+1})^{T}-(C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1}-\\& ((C_{\Phi_{i}})_{\Phi_{i} \star}^{t+1})^{T}\|_{F} + e_{U}^{t}, \textrm{ and } \tau_{V}(U^{t},C^{t+1}) \le 1/L^{t+1}_{p2}. \end{aligned} $$

(72)

$$ \begin{aligned} &\| \nabla_{L_{\Phi_{i}}} H(L_{\Phi_{i}}) - \nabla_{L_{\Phi_{i}}} H(L_{\Phi_{i}}') \|_{F} = \\ & \| \nabla F(L_{\Phi_{i}},E_{\Phi_{i}}^{t}) - \nabla F(L_{\Phi_{i}}',E_{\Phi_{i}}^{t}) + \lambda_{2} (L_{\Phi_{i}}-L_{\Phi_{i}}') \|_{F} \\& \stackrel{(\mathrm{d})} = \| \text{diag}(\nabla^{2} F(\bar{L}_{\Phi_{i}})) \text{vec}(L_{\Phi_{i}}-L_{\Phi_{i}}')\|_{2} \\&~~~~+ \lambda_{2} \|L_{\Phi_{i}}-L_{\Phi_{i}}'\|_{F}\\& \le (\| \text{diag}(\nabla^{2} F(\bar{L}_{\Phi_{i}}))\|_{2} + \lambda_{2})\|L_{\Phi_{i}}-L_{\Phi_{i}}'\|_{F} \\& \stackrel{(\mathrm{e})} = (\|\nabla^{2} F(\bar{L}_{\Phi_{i}})\|_{\infty} +\lambda_{2})\|L_{\Phi_{i}}-L_{\Phi_{i}}'\|_{F} \\& \stackrel{(\mathrm{f})} \le (\frac{1}{\sigma^{2} \beta^{2}}+\lambda_{2})\|L_{\Phi_{i}}-L_{\Phi_{i}}'\|_{F} \\& \stackrel{(\mathrm{g})}= \frac{1}{\tau_{L}(E_{\Phi_{i}}^{t})} \| L_{\Phi_{i}}-L_{\Phi_{i}}' \|_{F}, \end{aligned} $$

(73)

where (d) comes from the mean value theorem. $\nabla^{2} F(\bar{L}_{\Phi_{i}}) \in \mathbb{R}^{m\times q}$ has its (k,j)th entry equal to $\left.\frac{\partial^{2} F}{\partial (L_{\Phi_{i}})_{k,j}^{2}}\right|_{(\bar{L}_{\Phi_{i}})_{k,j}}$, and $\text{diag}(\nabla^{2} F(\bar{L}_{\Phi_{i}})) \in \mathbb{R}^{mq\times mq}$ is a diagonal matrix whose diagonal equals $\text{vec}(\nabla^{2} F(\bar{L}_{\Phi_{i}}))$. (e) follows from the fact that the l₂ norm of a diagonal matrix is equal to its entry-wise infinity norm. Note that (1) is lower bounded by β, and the probability density function of the normal distribution and the absolute value of its derivative are upper bounded by $\frac{1}{\sqrt{2\pi}\sigma}$ and $\frac{e^{-1/2}}{\sqrt{2\pi}\sigma^{2}}$, respectively. Then, one can check that $\|\nabla^{2} F(\bar{L}_{\Phi_{i}})\|_{\infty}$ is bounded by $\frac{1}{\sigma^{2}\beta^{2}}$. (f) is thus obtained by upper bounding $\|\nabla^{2} F(\bar{L}_{\Phi_{i}})\|_{\infty}$. (g) follows from (32). Thus, $\tau_{L}(E_{\Phi_{i}}^{t}) \le \frac{1}{L^{t+1}_{p3}}$.
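The two bounds on the normal density quoted above can be verified numerically. The sketch below is an illustration with a hypothetical noise level `sigma`; it evaluates the zero-mean density and its derivative on a fine grid and compares the maxima with the stated constants.

```python
import numpy as np

sigma = 0.7                                           # hypothetical noise level
x = np.linspace(-10 * sigma, 10 * sigma, 200001)      # fine symmetric grid
pdf = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
dpdf = -x / sigma**2 * pdf                            # derivative of the density

pdf_bound = 1 / (np.sqrt(2 * np.pi) * sigma)          # attained at x = 0
dpdf_bound = np.exp(-0.5) / (np.sqrt(2 * np.pi) * sigma**2)  # attained at |x| = sigma

print(pdf.max() / pdf_bound, np.abs(dpdf).max() / dpdf_bound)
```

Both ratios come out at essentially one, because the density peaks at the mean and the magnitude of its derivative peaks one standard deviation away from it.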

$$ \begin{aligned} &\| \nabla_{E_{\Phi_{i}}} H\left(E_{\Phi_{i}}\right) - \nabla_{E_{\Phi_{i}}} H(E_{\Phi_{i}}') \|_{F} \\ & = \| \nabla F(L_{\Phi_{i}}^{t+1},E_{\Phi_{i}}) - \nabla F(L_{\Phi_{i}}^{t+1},E_{\Phi_{i}}')\|_{F} \\& \stackrel{(\mathrm{h})}= \|\text{diag}(\nabla^{2} F(\bar{E}_{\Phi_{i}})) \text{vec}(E_{\Phi_{i}}-E_{\Phi_{i}}')\|_{F}\\&\le \|\nabla^{2} F(\bar{E}_{\Phi_{i}})\|_{\infty} \|E_{\Phi_{i}}-E_{\Phi_{i}}'\|_{F} \\& \stackrel{(\mathrm{i})} \le \frac{1}{\sigma^{2} \beta^{2}} \|E_{\Phi_{i}}-E_{\Phi_{i}}'\|_{F} \\& \stackrel{(\mathrm{j})} = \frac{1}{\tau_{E}(L_{\Phi_{i}}^{t+1})} \| E_{\Phi_{i}}-E_{\Phi_{i}}' \|_{F}, \end{aligned} $$

(74)

where (h) follows from the differential mean value theorem. (i) is obtained by upper bounding $\|\nabla ^{2} F(\bar {E}_{\Phi _{i}})\|_{\infty }$ by $\frac {1}{\sigma ^{2} \beta ^{2}}$. (j) follows from (33). (74) implies that $\tau _{E}(L_{\Phi _{i}}^{t+1})=\sigma ^{2} \beta ^{2} \le \frac {1}{L^{t+1}_{p4}}$.

$$ \begin{aligned} &\| \nabla_{U} H(U) - \nabla_{U} H(U') \|_{F} \\ & = \| \lambda_{2} (U-U')(V^{t+1})^{T}V^{t+1}\|_{F} \\& \le \|\lambda_{2} (V^{t+1})^{T}V^{t+1}\|_{2}\|U-U'\|_{F} \\& \stackrel{(\mathrm{k})} \le \|\lambda_{2} (V^{t+1})^{T}V^{t+1}\|_{F}\|U-U'\|_{F} \\& \stackrel{(\mathrm{l})} = \|\lambda_{2} \sum_{i=1}^{W} \iota_{i}^{t+1}\|_{F}\|U-U'\|_{F} \\& \stackrel{(\mathrm{m})} = \frac{1}{\tau_{U}(V^{t+1})} \| U-U' \|_{F}, \end{aligned} $$

(75)

where (k) follows from the inequality ∥·∥₂≤∥·∥_F. (l) follows from $(V^{t+1})^{T}V^{t+1}=\sum_{i=1}^{W} \iota_{\Phi_{i}}^{t+1}$. Since $\|\lambda_{2} \sum_{i=1}^{W} \iota_{\Phi_{i}}^{t+1}\|_{F} \geq L^{t+1}_{p5}$, (m) follows from (34). (75) implies that $L^{t+1}_{p5} \leq \|\lambda_{2} \sum_{i=1}^{W} \iota_{i}^{t+1}\|_{F}$ and $\tau_{U}(V^{t+1}) \le 1/L^{t+1}_{p5}$.

Based on Definition 3, (69)–(75) guarantee the Lipschitz differentiability of H and provide the Lipschitz constants and the step sizes of the DSAPA.
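As a numerical illustration of Definition 3 in the setting of (69), the sketch below checks the Lipschitz bound for a partial gradient of the form C ↦ λVVᵀC. The dimensions, the value of `lam` (standing in for λ₁), and the random `V` are hypothetical; the code is not the DSAPA update itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, q = 9, 4, 3                        # hypothetical dimensions
lam = 0.5                                # stands in for lambda_1
V = rng.standard_normal((n, r))
M = lam * V @ V.T                        # the partial gradient acts as C -> M @ C

C1 = rng.standard_normal((n, q))
C2 = rng.standard_normal((n, q))

diff = np.linalg.norm(M @ C1 - M @ C2)   # Frobenius norm of the gradient difference
dist = np.linalg.norm(C1 - C2)
L_spec = np.linalg.norm(M, 2)            # spectral norm: smallest valid constant here
L_fro = np.linalg.norm(M)                # Frobenius norm: the upper bound used in (69)
tau = 1.0 / L_fro                        # a step size no larger than 1 / L_spec

print(diff <= L_spec * dist, L_spec <= L_fro)
```

Using the Frobenius norm gives a more conservative (smaller) step size than the spectral norm would, which is consistent with the inequalities τ ≤ 1/L stated above.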

13 Appendix 7

13.1 Proof of Theorem 3

Proof

The constraints in (22) can be transferred to the following indicator functions.

$$ K_{1}(C_{\Phi_{i}})= \left\{ \begin{array}{ll} \infty & \text{if } (C_{\Phi_{i}})_{iq-q+j,j}\neq 0 \text{ for some } j \in [q], \\ 0 & \text{otherwise} \end{array} \right. $$

(76)

$$ K_{2}(C_{\Phi_{i}})= \left\{ \begin{array}{ll} \infty & \text{if } \|(C_{\Phi_{i}})_{\star j}\|_{0} > d \text{ for some } j \in [q], \\ 0 & \text{otherwise} \end{array} \right. $$

(77)

$$ B(L_{\Phi_{i}})= \left \{ \begin{array}{rcl} \infty & \text{if}~ \|L_{\Phi_{i}}\|_{\infty} > \alpha_{1} \\ 0 & \text{otherwise} \end{array} \right. $$

(78)

$$ J_{1}\left(E_{\Phi_{i}}\right)= \left \{ \begin{array}{rcl} \infty & \text{if}~ \|E_{\Phi_{i}}\|_{\infty} > \alpha_{2} \\ 0 & \text{otherwise} \end{array} \right. $$

(79)

$$ J_{2}\left(E_{\Phi_{i}}\right)= \left \{ \begin{array}{rcl} \infty & \text{if}~ \|E_{\Phi_{i}}\|_{0} > s/W \\ 0 & \text{otherwise} \end{array} \right. $$

(80)

(76)–(80) correspond to the operations of projection in DSAPA.
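The projections associated with (76)–(80) can be sketched as simple entry-wise operations. The function names below are hypothetical and the code is a minimal illustration of projecting onto each feasible set separately, not the authors' implementation of the DSAPA steps.

```python
import numpy as np

def project_linf(M, bound):
    """Clip entries to [-bound, bound] (feasible sets of B and J_1)."""
    return np.clip(M, -bound, bound)

def project_l0(M, k):
    """Keep the k largest-magnitude entries and zero the rest (feasible set of J_2)."""
    out = np.zeros_like(M)
    if k <= 0:
        return out
    mags = np.abs(M).ravel()
    if k >= mags.size:
        return M.copy()
    keep = np.argpartition(mags, -k)[-k:]     # flat indices of the k largest magnitudes
    out.flat[keep] = M.flat[keep]
    return out

def project_columns_l0(C, d):
    """Keep at most d largest-magnitude entries in each column (feasible set of K_2)."""
    out = np.zeros_like(C)
    for j in range(C.shape[1]):
        out[:, j:j + 1] = project_l0(C[:, j:j + 1], d)
    return out

def zero_self_entries(C_block, i, q):
    """Zero entries (iq - q + j, j), j = 1..q, in 1-based indexing (feasible set of K_1)."""
    out = C_block.copy()
    rows = (i - 1) * q + np.arange(q)         # 0-based row indices
    out[rows, np.arange(q)] = 0.0
    return out

E = np.array([[3.0, -0.5], [0.2, -4.0]])
E_proj = project_linf(project_l0(E, 2), 1.0)  # sparsify first, then clip
print(E_proj)
```

Applying the sparsity projection before clipping keeps the entries with the largest magnitudes, which is the natural order when both constraints on E are enforced together.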

Similar to the proof of Theorem 3 in [27], DSAPA globally converges to a critical point of (16) from any initial point, provided that H is Lipschitz differentiable, and

$$ \begin{aligned} & H + \sum_{i=1}^{W} (K_{1}(C_{\Phi_{i}})+K_{2}(C_{\Phi_{i}})+\\&~~~~~~~~~~~~~B(L_{\Phi_{i}})+J_{1}\left(E_{\Phi_{i}}\right)+J_{2}\left(E_{\Phi_{i}}\right)) \end{aligned} $$

(81)

satisfies the Kurdyka-Lojasiewicz (KL) property.

The proof of the Lipschitz differentiable property of H is shown in Appendix 6. $B(L_{\Phi_{i}})$, $J_{1}\left(E_{\Phi_{i}}\right)$, $J_{2}\left(E_{\Phi_{i}}\right)$, $K_{1}(C_{\Phi_{i}})$, and $K_{2}(C_{\Phi_{i}})$ are indicator functions of semi-algebraic sets. Therefore, they are KL functions according to [60]. Since H is differentiable everywhere, or equivalently, real analytic, H also has the KL property according to the examples in Section 2.2 of [61]. Thus, (81) satisfies the KL property. □

Availability of data and materials

The Irish smart meter datasets that support the findings of this study are available from the Irish Social Science Data Archive (ISSDA), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of the ISSDA. The UMass Smart* microgrid dataset analyzed during the current study is available at http://traces.cs.umass.edu/index.php/Smart/Smart.

Notes

Throughout this paper, we refer to each household as one user.
The S_i’s (i∈[p]) are distinct provided that, for any i≠j, there always exists some β that belongs to S_i but not S_j.
We use the notations $u(n)\in \mathcal{O}(v(n))$, $u(n)\in \Omega(v(n))$, or $u(n)=\Theta(v(n))$ if, as n goes to infinity, u(n)≤c·v(n), u(n)≥c·v(n), or c₁·v(n)≤u(n)≤c₂·v(n) eventually holds for some positive constants c, c₁, and c₂, respectively.
We assume for simplicity that the corruptions are distributed evenly such that the number of nonzero entries in $E_{\Phi_{i}}$ is at most $\frac{s}{W}$. The algorithm can be easily extended to cases in which the numbers of corruptions differ, as long as a reasonably accurate upper bound on the number of corruptions is available.

Abbreviations

UoS: Union of Subspaces
DSAPA: Distributed Sparse Alternative Proximal Algorithm
c.d.f: Cumulative distribution function
SSC: Sparse Subspace Clustering
PMU: Phasor measurement unit
KL: Kullback-Leibler
NI: Normalized mutual information
APGM: Approximate projected gradient method
QRPCA: Quantized Robust Principal Component Analysis
NILM: Non-intrusive load monitoring

References

G. W. Hart, Nonintrusive appliance load monitoring. Proc. IEEE. 80(12), 1870–1891 (1992).
Article Google Scholar
E. J. Aladesanmi, K. A. Folly, Overview of non-intrusive load monitoring and identification techniques. IFAC-PapersOnLine. 48(30), 415–420 (2015).
Article Google Scholar
Z. Erkin, G. Tsudik, in International Conference on Applied Cryptography and Network Security. Private computation of spatial and temporal power consumption with smart meters (SpringerSingapore, 2012), pp. 561–577.
Chapter Google Scholar
P. Barbosa, A. Brito, H. Almeida, S. Clauß, in Proceedings of the 29th Annual ACM Symposium on Applied Computing, SAC ’14. Lightweight privacy for smart metering data by adding noise (ACMGyeongju, 2014), pp. 531–538.
Google Scholar
J. M. Bohli, C. Sorge, O. Ugus, in 2010 IEEE International Conference on Communications Workshops. A privacy model for smart metering (IEEECape Town, 2010), pp. 1–5.
Google Scholar
M. Backes, S. Meiser, in Data Privacy Management and Autonomous Spontaneous Security. Differentially private smart metering with battery recharging (SpringerBerlin, 2014), pp. 194–212.
Chapter Google Scholar
D. Varodayan, A. Khisti, in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Smart meter privacy using a rechargeable battery: minimizing the rate of information leakage (IEEEPrague, 2011), pp. 1932–1935.
Chapter Google Scholar
D. Egarter, C. Prokop, W. Elmenreich, in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm). Load hiding of household’s power demand (IEEEVenice, 2014), pp. 854–859.
Chapter Google Scholar
S. McLaughlin, P. McDaniel, W. Aiello, in Proceedings of the 18th ACM Conference on Computer and Communications Security. Protecting consumer privacy from electric load monitoring (ACMChicago, 2011), pp. 87–98.
Google Scholar
X. He, X. Zhang, C. C. J. Kuo, A distortion-based approach to privacy-preserving metering in smart grids. IEEE Access. 1:, 67–78 (2013).
Article Google Scholar
M. Savi, C. Rottondi, G. Verticale, Evaluation of the precision-privacy tradeoff of data perturbation for smart metering. IEEE Trans. Smart Grid. 6(5), 2409–2416 (2015).
Article Google Scholar
O. Tan, D. Gunduz, H. V. Poor, Increasing smart meter privacy through energy harvesting and storage devices. IEEE J. Sel. Areas Commun.31(7), 1331–1341 (2013).
Article Google Scholar
F. L. Quilumba, W. -J. Lee, H. Huang, D. Y. Wang, R. L. Szabados, Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities. IEEE Trans. Smart Grid. 6(2), 911–918 (2015).
Article Google Scholar
A. Albert, R. Ram, Smart meter driven segmentation: what your consumption says about you. IEEE Trans Power Syst.28(4), 4019–4030 (2013).
Article Google Scholar
N. Mahmoudi-Kohan, M. P. Moghaddam, M. K. Sheikh-El-Eslami, E. Shayesteh, A three-stage strategy for optimal price offering by a retailer based on clustering techniques. Int. J. Electr. Power Energy Syst.32(10), 1135–1142 (2010).
Article Google Scholar
C. Dwork, in International Conference on Theory and Applications of Models of Computation, Differential privacy: A survey of results. Differential privacy (SpringerXi’an, 2008), pp. 1–19.
MATH Google Scholar
L. Sankar, S. Kar, R. Tandon, H. V. Poor, in Proc. IEEE International Conference on Smart Grid Communications (SmartGridComm). Competitive privacy in the smart grid: an information-theoretic approach (IEEE, Brussels, 2011), pp. 220–225.
C. Y. Ma, D. K. Yau, in Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security. On information-theoretic measures for quantifying privacy protection of time-series data (ACM, Singapore, 2015), pp. 427–438.
S. Li, A. Khisti, A. Mahajan, Information-theoretic privacy for smart metering systems with a rechargeable battery. IEEE Trans. Inf. Theory. 64(5), 3679–3695 (2018).
A. Reinhardt, F. Englert, D. Christin, Averting the privacy risks of smart metering by local data preprocessing. Pervasive Mob. Comput. 16, 171–183 (2015).
E. Elhamifar, R. Vidal, Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2765–2781 (2013).
B. Eriksson, L. Balzano, R. Nowak, in Proc. Int. Conf. Artif. Intell. Stat. High-rank matrix completion (JMLR, La Palma, 2012), pp. 373–381.
G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 171–184 (2013).
V. M. Patel, H. Van Nguyen, R. Vidal, Latent space sparse and low-rank subspace clustering. IEEE J. Sel. Topics Signal Process. 9(4), 691–701 (2015).
M. Soltanolkotabi, E. J. Candès, A geometric analysis of subspace clustering with outliers. Ann. Stat. 40(4), 2195–2238 (2012).
M. Soltanolkotabi, E. Elhamifar, E. J. Candès, Robust subspace clustering. Ann. Stat. 42(2), 669–699 (2014).
R. Wang, M. Wang, J. Xiong, Data recovery and subspace clustering from quantized and corrupted measurements. IEEE J. Sel. Topics Signal Process., Spec. Issue Robust Subspace Learn. Tracking Theory Algorithm Appl. 12(6), 1547–1560 (2018).
S. A. Bhaskar, Probabilistic low-rank matrix completion from quantized measurements. J. Mach. Learn. Res. 17(60), 1–34 (2016).
Y. Cao, Y. Xie, in Proc. IEEE Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process. Categorical matrix completion (IEEE, Cancun, 2015).
T. Cai, W.-X. Zhou, A max-norm constrained minimization approach to 1-bit matrix completion. J. Mach. Learn. Res. 14(1), 3619–3647 (2013).
M. A. Davenport, Y. Plan, E. van den Berg, M. Wootters, 1-bit matrix completion. Inf. Infer. 3(3), 189–223 (2014).
P. Gao, M. Wang, J. H. Chow, M. Berger, L. M. Seversky, Missing data recovery for high-dimensional signals with nonlinear low-dimensional structures. IEEE Trans. Signal Process. 65(20), 5421–5436 (2017).
O. Klopp, J. Lafond, É. Moulines, J. Salmon, Adaptive multinomial matrix completion. Electron. J. Stat. 9(2), 2950–2975 (2015).
J. Lafond, O. Klopp, É. Moulines, J. Salmon, in Adv. Neural Inf. Process. Syst. Probabilistic low-rank matrix completion on finite alphabets (Curran Associates, Montreal, 2014), pp. 1727–1735.
A. S. Lan, C. Studer, R. G. Baraniuk, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. Matrix recovery from quantized and corrupted measurements (IEEE, Florence, 2014), pp. 4973–4977.
A. S. Lan, A. E. Waters, C. Studer, R. G. Baraniuk, Sparse factor analysis for learning and content analytics. J. Mach. Learn. Res. 15(1), 1959–2008 (2014).
P. Gao, R. Wang, M. Wang, J. H. Chow, Low-rank matrix recovery from noisy, quantized and erroneous measurements. IEEE Trans. Signal Process. 66(11), 2918–2932 (2018).
S. A. Bhaskar, in Proc. Asilomar Conf. Signals Syst. Comput. Probabilistic low-rank matrix recovery from quantized measurements: application to image denoising (2015), pp. 541–545.
S. A. Bhaskar, Localization from connectivity: a 1-bit maximum likelihood approach. IEEE/ACM Trans. Netw. 24(5), 2939–2953 (2016).
Y. Yang, J. Feng, N. Jojic, J. Yang, T. S. Huang, in European Conference on Computer Vision. l0-sparse subspace clustering (Springer, Amsterdam, 2016), pp. 731–747.
A. Y. Ng, M. I. Jordan, Y. Weiss, in Adv. Neural Inf. Process. Syst. On spectral clustering: analysis and an algorithm (Morgan Kaufmann Publishers, Vancouver, 2002), pp. 849–856.
J. Lin, E. Keogh, L. Wei, S. Lonardi, Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2), 107–144 (2007).
E. Keogh, K. Chakrabarti, M. Pazzani, S. Mehrotra, in Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. Locally adaptive dimensionality reduction for indexing large time series databases (ACM, Santa Barbara, 2001), pp. 151–162.
R. Basri, D. W. Jacobs, Lambertian reflectance and linear subspaces. IEEE Trans. Pattern Anal. Mach. Intell. 25(2), 218–233 (2003).
P. Gao, M. Wang, S. G. Ghiocel, J. H. Chow, B. Fardanesh, G. Stefopoulos, Missing data recovery by exploiting low-dimensionality in power system synchrophasor measurements. IEEE Trans. Power Syst. 31(2), 1006–1013 (2016).
M. B. Hossain, I. Natgunanathan, Y. Xiang, L.-X. Yang, G. Huang, Enhanced smart meter privacy protection using rechargeable batteries. IEEE Internet Things J. 6(4), 7079–7092 (2019).
A. Reinhardt, D. Egarter, G. Konstantinou, D. Christin, in 2015 IEEE International Conference on Smart Grid Communications (SmartGridComm). Worried about privacy? Let your PV converter cover your electricity consumption fingerprints (IEEE, Miami, 2015), pp. 25–30.
L. Sweeney, k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl-Based Syst. 10(05), 557–570 (2002).
R. L. Lagendijk, Z. Erkin, M. Barni, Encrypted signal processing for privacy protection: conveying the utility of homomorphic encryption and multiparty computation. IEEE Signal Proc. Mag. 30(1), 82–105 (2012).
T. Baumeister, in 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). Adapting PKI for the smart grid (IEEE, Brussels, 2011), pp. 249–254.
G. Giaconi, D. Gündüz, H. V. Poor, in 2015 IEEE International Conference on Communications (ICC). Smart meter privacy with an energy harvesting device and instantaneous power constraints (IEEE, Miami, 2015), pp. 7216–7221.
G. Kalogridis, C. Efthymiou, S. Z. Denic, T. A. Lewis, R. Cepeda, in 2010 First IEEE International Conference on Smart Grid Communications. Privacy for smart meters: towards undetectable appliance load signatures (IEEE, Gaithersburg, 2010), pp. 232–237.
J. Gomez-Vilardebo, D. Gündüz, Smart meter privacy for multiple users in the presence of an alternative energy source. IEEE Trans. Inf. Forensic Secur. 10(1), 132–141 (2014).
Y. Hong, W. M. Liu, L. Wang, Privacy preserving smart meter streaming against information leakage of appliance status. IEEE Trans. Inf. Forensic Secur. 12(9), 2227–2241 (2017).
J. A. Snyman, N. Stander, W. J. Roux, A dynamic penalty function method for the solution of structural optimization problems. Appl. Math. Model. 18(8), 453–460 (1994).
Commission for Energy Regulation Smart Metering Project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer. Accessed 5 July 2018.
S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, J. Albrecht, et al, Smart*: an open data set and tools for enabling research in sustainable homes. SustKDD, August. 111(112), 108 (2012).
S. P. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011).
T. M. Cover, J. A. Thomas, Elements of Information Theory (Wiley, Hoboken, 2012).
J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1-2), 459–494 (2014).
Y. Xu, W. Yin, A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imag. Sci. 6(3), 1758–1789 (2013).

Acknowledgements

This research is supported in part by ARO W911NF-17-1-0407 and the Rensselaer-IBM AI Research Collaboration (http://airc.rpi.edu), part of the IBM AI Horizons Network (http://ibm.biz/AIHorizons).

Author information

Authors and Affiliations

Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
Ren Wang & Meng Wang
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Jinjun Xiong

Contributions

Ren and Meng conceived and designed the method and the experiments. Ren performed the experiments and drafted the manuscript. Meng revised the manuscript. Jinjun provided many helpful suggestions. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Meng Wang.

Ethics declarations

Consent for publication

Informed consent was obtained from all authors included in the study.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, R., Wang, M. & Xiong, J. Achieve data privacy and clustering accuracy simultaneously through quantized data recovery. EURASIP J. Adv. Signal Process. 2020, 22 (2020). https://doi.org/10.1186/s13634-020-00682-7

Received: 15 October 2019
Accepted: 17 April 2020
Published: 07 May 2020
DOI: https://doi.org/10.1186/s13634-020-00682-7
