We consider the truncated trajectory matrices M formed by concatenating the individual trajectory matrices according to the MSSA approach. The objective of this work is to consider a generative model that produces the time series Hankel matrices M according to the factorization \(\mathbf{M} = \mathbf{DL}\), where M may correspond to a single source or to multiple sources. In both cases, our key assumption is that, given a full-rank dictionary matrix D obtained from training data, the coefficient matrix L is approximately low rank, i.e., the number of significant singular vectors is much smaller than the ambient dimensions of the matrix.
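Although the construction of the trajectory matrices follows the MSSA approach referenced above, the short Python sketch below illustrates one common way to form a single Hankel trajectory matrix from a time series, to fix notation; the window length K and the example series are illustrative assumptions, not parameters prescribed by this work.

```python
import numpy as np

def trajectory_matrix(x, K):
    """Form a K x L Hankel (trajectory) matrix from a 1-D series x.

    Column j holds the lagged window x[j], ..., x[j+K-1], so consecutive
    columns overlap by K-1 samples (standard SSA embedding).
    """
    x = np.asarray(x, dtype=float)
    L = len(x) - K + 1                      # number of lagged windows
    return np.column_stack([x[j:j + K] for j in range(L)])

# Example: 100-sample sinusoid embedded with an assumed window length K = 20
x = np.sin(2 * np.pi * 0.05 * np.arange(100))
M_single = trajectory_matrix(x, K=20)       # shape (20, 81)
```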
To apply the low-rank representation scheme on matrices with missing data, the introduction of the random sub-sampling operator is necessary. Our proposed sampling scheme is a combination of MC and reduced-rank multivariate linear regression, and it seeks a low-rank representation coefficient matrix L from a small number of measurements \(\mathcal {P}_{\Omega }(\mathbf {M})\). Based on this generative model, our proposed Singular Spectrum Matrix Completion (SS-MC) formulation is given by:
$$\begin{array}{*{20}l} & \underset{\mathbf{L}}{\text{minimize}} ~~ rank(\mathbf{L}) \notag \\ & \text{subject to} ~~ \mathcal{P}_{\Omega}(\mathbf{M}) = \mathcal{P}_{\Omega}({\mathbf{DL}}) \end{array} $$
(8)
where D is a dictionary of elementary atoms that span a low-rank data-induced subspace. Figure 2 presents an example of a real trajectory matrix (left), the representation coefficients L (center), and the singular value distribution of the coefficients (right).
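To make the role of the sub-sampling operator concrete, the following minimal sketch models \(\mathcal{P}_{\Omega}\) as an elementwise binary mask that retains the observed entries of M and zeroes out the rest; the matrix sizes and the 30% sampling ratio are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, L_cols = 20, 81
M = rng.standard_normal((K, L_cols))        # stand-in trajectory matrix

# Omega: boolean mask of observed entries (assumed 30% sampling ratio)
Omega = rng.random((K, L_cols)) < 0.3

def P_Omega(X, Omega):
    """Sampling operator: keep the entries indexed by Omega, zero elsewhere."""
    return np.where(Omega, X, 0.0)

# The constraint in Eqs. (8)-(9) ties DL to M only on the observed entries:
# P_Omega(M) == P_Omega(D @ L_coeff)
M_obs = P_Omega(M, Omega)
```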
5.1 Efficient optimization
Similarly to MC optimization, the problem in Eq. (8) is NP-hard due to the rank in the objective function and thus cannot be solved efficiently for reasonably sized data. A remedy to this problem is to replace the rank objective with the nuclear norm, thus solving:
$$\begin{array}{*{20}l} & \underset{\mathbf{L}}{\text{minimize}} ~~\|\mathbf{L}\|_{*} \notag \\ & \text{subject to} ~~ \mathcal{P}_{\Omega}(\mathbf{M}) = \mathcal{P}_{\Omega}(\mathbf{DL}) \end{array} $$
(9)
A key novelty of our work is that, in addition to the low rank of the matrix, during recovery we employ a dictionary for modeling the generative process that produces the sensed data, as can be seen in Eq. (9).
The problem in Eq. (9) can be transformed to a semidefinite programming problem and solved using interior point methods [47, 48]. However, utilizing such off-the-shelf solvers incurs very high algorithmic complexity, which renders them impractical even for moderately sized scenarios. Motivated by the requirements for a data collection mechanism that is both accurate and efficient, we reformulate the SS-MC problem in an Augmented Lagrangian form. By utilizing the ALM formulation for SS-MC, we can achieve efficient recovery, tailored to the specific properties of the problem. Introducing the intermediate dummy variables Z and E, Eq. (9) can be written as:
$$\begin{array}{*{20}l} & \underset{\mathbf{L,Z,E}}{\text{minimize}} ~~\|\mathbf{L}\|_{*} \notag \\ & \text{subject to}~~ \mathbf{M}=\mathbf{DZ+E} \notag \\ & ~~~~~~~~~~~~~~~ \mathbf{Z}=\mathbf{L} \notag \\ & ~~~~~~~~~~~~~~~ \mathcal{P}_{\Omega}(\mathbf{E})=0 \end{array} $$
(10)
where L, Z, and E are the minimization variables. The extra variable Z is introduced in order to decouple the minimization variables by separating the variable L in the objective function from the variable Z in the first constraint. Similar to the ALM formulation for MC in Eq. (7), E is introduced in order to account for the missing entries in M. More specifically, the constraint on the error matrix E is applied only to the available data via the sampling operator \(\mathcal {P}_{\Omega}\). The ALM form of Eq. (10) is an unconstrained minimization given by:
$$ {\begin{aligned}{} \mathcal{L} (\mathbf{L,Z,E,Y_{1},Y_{2}},\mu)&= {\|\mathbf{L}\|}_{*} + tr\left(\mathbf{Y}_{1}^{T} (\mathcal{P}_{\Omega}({\mathbf M} - \mathbf{DZ}))\right) \\ &\quad+ tr \left(\mathbf{Y}_{2}^{T} (\mathbf{Z-L})\right) \\&\quad+ \frac{\mu}{2} \left(\|\mathcal{P}_{\Omega}(\mathbf{M-DZ})\|_{F}^{2} + \|\mathbf{Z-L}\|_{F}^{2}\right) \end{aligned}} $$
(11)
where \(\mathbf{Y}_{1}\) and \(\mathbf{Y}_{2}\) are Lagrange multiplier matrices. The solution can be found by iteratively minimizing Eq. (11) with respect to each of the variables via an ADMM approach. Formally, the minimization problem with respect to L is given by:
$$\begin{array}{*{20}l} \mathbf{L}^{(k+1)}&= \min_{\mathbf{L}} \mathcal L\left(\mathbf{L}^{(k)},\mathbf{Z}^{(k)},\mathbf{E}^{(k)},\mathbf{Y}^{(k)}_{1},\mathbf{Y}^{(k)}_{2}, \mu^{(k)}\right) \notag \\ &=\min \|\mathbf{L}\|_{*} + tr \left(\mathbf{Y_{2}^{T}} (\mathbf{Z-L})\right) + \frac{\mu}{2} \left(\|\mathbf{Z-L}\|_{F}^{2}\right) \notag \\ &=\min \frac{1}{\mu}\|\mathbf{L}\|_{*} + \frac{1}{2}\| \mathbf{L-(Z+Y}_{2}/\mu) \|_{F}^{2} ~. \end{array} $$
(12)
The sub-problem in Eq. (12) is a nuclear norm minimization problem and can be solved very efficiently by the Singular Value Thresholding operator [43]. The minimization with respect to Z is given by:
$$\begin{array}{*{20}l}{} \mathbf{Z}^{(k+1)}&= \min_{\mathbf{Z}} \mathcal L\left(\mathbf{L}^{(k+1)},\mathbf{Z}^{(k)},\mathbf{E}^{(k)}, \mathbf{Y}^{(k)}_{1},\mathbf{Y}^{(k)}_{2},\mu^{(k)}\right) \notag \\ & =\min tr \left(\mathbf{Y}_{1}^{T}(\mathcal{P}_{\Omega}(\mathbf{M})- \mathcal{P}_{\Omega}(\mathbf{DZ}))\right)+ tr \left(\mathbf{Y}_{2}^{T} (\mathbf{Z-L})\right) \notag \\ & ~~~+\frac{\mu}{2} \left(\|\mathcal{P}_{\Omega}(\mathbf{M-DZ})\|_{F}^{2} + \|\mathbf{Z-L}\|_{F}^{2}\right) ~. \end{array} $$
(13)
Calculating the gradient of the expression in Eq. (13), we obtain:
$$\begin{array}{*{20}l} \frac{\partial \mathcal L}{\partial \mathbf{Z}} = -\mathbf{D}^{T} \mathbf{Y}_{1} + \mathbf{Y}_{2} - {\mu\left(\mathbf{D}^{T}\left(\mathbf{M-E-DZ}\right)-\mathbf{Z+L}\right)} \end{array} $$
(14)
which, after setting it equal to zero, provides the update equation for Z given by:
$$ \begin{aligned} \mathbf{Z}^{(k+1)}&= \left(\mathbf{I}+\mathbf{D}^{T}\mathbf{D}\right)^{-1} \left(\mathbf{D}^{T}\left(\mathbf{M}-\mathbf{E}^{(k)}\right)+\mathbf{L}^{(k+1)}\right.\\ &\quad+ \left.\left(\mathbf{D}^{T}\mathbf{Y}_{1}^{(k)}-\mathbf{Y}_{2}^{(k)}\right) / \mu^{(k)}\right) ~. \end{aligned} $$
(15)
Furthermore, the augmented Lagrangian in Eq. (11) has to be minimized with respect to E, subject to the constraint \(\mathcal{P}_{\Omega}(\mathbf{E})=0\), i.e.,
$$\begin{array}{*{20}l}{} \mathbf{E}^{(k+1)}&= \min_{\mathcal{P}_{\Omega}(\mathbf{E})=0} \mathcal L\left(\mathbf{L}^{(k+1)},\mathbf{Z}^{(k+1)},\mathbf{E}^{(k)},\mathbf{Y}^{(k)}_{1},\mathbf{Y}^{(k)}_{2},\mu^{(k)}\right), \notag \\ & \quad \text{with}~~ \frac{\partial \mathcal{L}}{\partial \mathbf{E}} = -\mathbf{Y}_{1}+\mu (\mathbf{E-M+DZ}) \end{array} $$
(16)
which, after setting the gradient equal to zero on the complement of the sampling set, provides the update equation for E:
$$\begin{array}{*{20}l} \mathbf{E}^{(k+1)}= \mathcal{P}_{\not{\Omega}}\left(\mathbf{M}-\mathbf{D} \mathbf{Z}^{(k+1)}+\frac{1}{\mu^{(k)}} \mathbf{Y}_{1}^{(k)}\right) \end{array} $$
(17)
where the notation \(\mathcal {P}_{\not {\Omega }}\) is used to restrict the error estimation to the entries that do not belong to the sampling set. Last, we perform updates on the two Lagrange multipliers \(\mathbf{Y}_{1}\) and \(\mathbf{Y}_{2}\). The steps at each iteration of the optimization are shown in Algorithm 1.
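Algorithm 1 itself is not reproduced here, but the following Python sketch shows how the updates in Eqs. (12), (15), and (17), together with the dual ascent steps, may be composed into one possible implementation of an SS-MC iteration. The penalty parameter \(\mu\), its growth factor, and the iteration count are illustrative choices, not values prescribed by this work.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def ss_mc(M_obs, Omega, D, n_iter=200, mu=1e-2, rho=1.2):
    """Sketch of the SS-MC ADMM iterations (illustrative parameters).

    M_obs : K x L trajectory matrix, zero-filled outside the sampling set
    Omega : boolean mask of observed entries
    D     : K x K orthonormal dictionary
    """
    K, Lc = M_obs.shape
    L = np.zeros((K, Lc))          # low-rank representation matrix
    Z = np.zeros((K, Lc))          # auxiliary splitting variable
    E = np.zeros((K, Lc))          # error term, supported off Omega
    Y1 = np.zeros((K, Lc))         # multiplier for M = D Z + E
    Y2 = np.zeros((K, Lc))         # multiplier for Z = L
    I = np.eye(K)

    for _ in range(n_iter):
        # L-update, Eq. (12): singular value thresholding with threshold 1/mu
        L = svt(Z + Y2 / mu, 1.0 / mu)

        # Z-update, Eq. (15): closed-form least-squares step
        rhs = D.T @ (M_obs - E) + L + (D.T @ Y1 - Y2) / mu
        Z = np.linalg.solve(I + D.T @ D, rhs)

        # E-update, Eq. (17): E is zero on Omega and absorbs the residual
        # on the unobserved entries (M is unknown there, zero-filled here)
        E = np.where(Omega, 0.0, M_obs - D @ Z + Y1 / mu)

        # Standard ALM dual ascent steps on the Lagrange multipliers
        Y1 = Y1 + mu * (M_obs - D @ Z - E)
        Y2 = Y2 + mu * (Z - L)
        mu = mu * rho              # illustrative penalty growth schedule

    return D @ Z                   # estimate of the completed trajectory matrix
```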
Due to its numerous applications, the ADMM method has been extensively studied in the literature for the case of two variables [45, 46], where it has been shown that, under mild conditions regarding the convexity of the cost functions, the two-variable ADMM converges at a rate \(\mathcal {O}(1/r)\) [49]. Although the extension of these convergence guarantees to a larger number of variables has not been established in general, the convergence of ADMM for a sum of two or more non-smooth convex separable functions subject to linear constraints was recently examined [50].
The proposed minimization scheme in Eq. (11) satisfies many of the conditions suggested in [50], such as the convexity of each sub-problem, the strict convexity and continuous differentiability of the nuclear norm, the full rank of the dictionary, and the size of the step α for the dual update, while empirical evidence suggests that the closed-form solution of each sub-problem allows the SS-MC algorithm to converge to an accurate solution within a small number of iterations.
5.2 Singular spectrum dictionary
In this work, we investigate the utilization of prior knowledge for the efficient reconstruction of severely under-sampled time series data. To model the data, we follow a generative scheme where the full collection of acquired measurements is encoded in the trajectory matrix \(\mathbf {M} \in \mathbb {R}^{K \times L}\). M is assumed to be generated from a combination of a dictionary \(\mathbf {D} \in \mathbb {R}^{K \times K}\) and a coefficient matrix \(\mathbf {L} \in \mathbb {R}^{K \times L}\) according to \(\mathbf{M} = \mathbf{DL}\), where we assume that \(K \leq L\). This particular factorization is related to the SVD by \(\mathbf{M} = \mathbf{DL} = \mathbf{U}(\mathbf{S}\mathbf{V}^{T})\), where the orthonormal matrix \(\mathbf{D} = \mathbf{U}\) is a basis for the subspace associated with the column space of M, while \(\mathbf{L} = \mathbf{S}\mathbf{V}^{T}\) is a low-rank representation matrix encoding the projection of the trajectory matrix onto this subspace.
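Under these assumptions, a dictionary of this form can be obtained directly from the SVD of a fully sampled training trajectory matrix; the sketch below takes \(\mathbf{D} = \mathbf{U}\), so that \(\mathbf{L} = \mathbf{S}\mathbf{V}^{T} = \mathbf{D}^{T}\mathbf{M}\). The function name and the synthetic training matrix are hypothetical placeholders.

```python
import numpy as np

def learn_dictionary(M_train):
    """Learn an orthonormal dictionary from a fully sampled trajectory matrix.

    With M_train = U S V^T and K <= L, we take D = U, so that the
    representation L = S V^T = D^T M_train is low rank whenever the
    training data are well approximated by a few singular components.
    """
    U, s, Vt = np.linalg.svd(M_train, full_matrices=False)
    D = U                          # K x K orthonormal dictionary
    L_repr = np.diag(s) @ Vt       # K x L representation matrix, L = S V^T
    return D, L_repr

# Hypothetical usage with a synthetic, rank-3 training trajectory matrix
rng = np.random.default_rng(0)
M_train = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 81))
D, L_repr = learn_dictionary(M_train)
assert np.allclose(D @ L_repr, M_train)   # M = D L holds by construction
```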
This particular choice of dictionary D implies a specific relationship between the spectral characteristics of the trajectory matrix M and the low-rank representation matrix L. To understand this relationship, we consider the spectral decomposition of each individual matrix in the form \(\mathbf{D} = \mathbf{U}\mathbf{G}_{1}\mathbf{R}^{-1}\) and \(\mathbf{L} = \mathbf{R}\mathbf{G}_{2}\mathbf{V}^{*}\). The matrices U, R, and V are unitary, while \(\mathbf{G}_{1}\) and \(\mathbf{G}_{2}\) are diagonal matrices containing the singular values of D and L, respectively. This particular factorization permits us to utilize the product SVD [51, 52], which expresses the singular value decomposition of the product according to \(\mathbf{D}\mathbf{L} = \mathbf{U}(\mathbf{G}_{1}\mathbf{G}_{2})\mathbf{V}^{*}\), where the singular values of the matrix product are given by the product of the singular values of the corresponding matrices.
In this work, we consider orthogonal dictionaries, as opposed to overcomplete ones. Orthogonality of the dictionary guarantees that the vectors encoded in the dictionary span the low-dimensional subspace and, therefore, that a representation of the measurements is possible. Furthermore, an orthonormal dictionary, such as the one considered in this work, is characterized by \(\mathbf{G}_{1} = \mathbf{I}\), leaving \(\mathbf{G}_{2}\) responsible for the representation. We target exactly \(\mathbf{G}_{2}\) in our problem formulation by seeking a low-rank representation matrix L.
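A quick numeric check of this property, with a random orthonormal dictionary and a synthetic rank-3 matrix standing in for a real trajectory matrix: since \(\mathbf{G}_{1} = \mathbf{I}\), the singular values of \(\mathbf{L} = \mathbf{D}^{T}\mathbf{M}\) coincide with those of M, so seeking a low-rank L is equivalent to exploiting the low rank of M.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random orthonormal dictionary standing in for the learned one
D, _ = np.linalg.qr(rng.standard_normal((20, 20)))

# Synthetic rank-3 trajectory matrix and its representation L = D^T M
M = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 81))
L_repr = D.T @ M

# With G1 = I, the spectra of L and M = D L coincide (same rank, same values)
sv_M = np.linalg.svd(M, compute_uv=False)
sv_L = np.linalg.svd(L_repr, compute_uv=False)
assert np.allclose(sv_M, sv_L)
```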
In our experimental results, we consider sets of training data associated with fully sampled time series from the first days of each experiment for generating the dictionaries. The subspace identified from the fully sampled data is used for the subsequent recovery of past measurements and prediction of future ones. Alternatively, the dictionary could be updated during the course of the SS-MC application via an incremental subspace learning method [53, 54]. We opted against incremental subspace learning because, although it can potentially lead to better estimation, it is also associated with increased computational load and a higher risk of estimation drift and degraded performance.
5.3 Networking aspects of SS-MC
In the context of IoT applications utilizing WSN infrastructures, communication can take place among nodes, but most typically between the nodes and the base station where data analytics are extracted. This communication can be supported (a) by a direct wireless link between the nodes and the sink/base station; (b) via appropriate paths that allow multi-hop communications; or (c) via more powerful cluster heads that forward the measurements to the base station.
For the multi-hop scheme, equal weighting of each sample (democratic sampling) implies that no complicated processing needs to take place at the resource-limited forwarding nodes. Furthermore, for high-performance WSNs, where point-to-point communication between nodes is available and processing capabilities are sufficient, nodes could perform reconstruction over a local neighborhood, thus offering advantages similar to other distributed estimation schemes [55].
From a practical point of view, we argue that recovery and prediction of measurements from low sampling rates offer numerous advantages. First, they save energy by reducing the number of samples that have to be acquired, processed, and communicated, thus increasing the lifetime of the network. The proposed sampling scheme also reduces the frequency of sensor re-calibrations for sensors that perform complex signal acquisition, including chemical and biological sampling. As a result, higher quality measurements and, therefore, more reliable estimation of the field samples can be achieved. Furthermore, the method increases robustness to communication errors by estimating measurements contained in lost or dropped packets, without the need for retransmission. Last, our scheme does not require explicit knowledge of node locations for the estimation of the missing measurements, since the incomplete measurement matrices and the corresponding trajectory matrices are indexed by the sensor id, thus allowing greater flexibility during deployment.