Local low-rank approach to nonlinear matrix completion
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 11 (2021)
Abstract
This paper deals with a matrix completion problem in which each column vector of the matrix belongs to a low-dimensional differentiable manifold (LDDM), so that the target matrix is of high or full rank. To solve this problem, algorithms based on polynomial mapping and matrix-rank minimization (MRM) have been proposed; such methods assume that each column vector of the target matrix is generated as a vector in a low-dimensional linear subspace (LDLS) mapped by a pth-order polynomial, and that the matrix whose column vectors are the dth-order monomial features of the target column vectors is rank deficient. However, when p is large, this approach needs a large number of columns and observed values to solve the MRM problem exactly; therefore, this paper proposes a new method that obtains the solution by minimizing the rank of local submatrices without transforming the target matrix, so as to achieve high estimation accuracy even when the number of columns is small. The method rests on the assumption that an LDDM can be approximated locally by an LDLS. Numerical examples show that the proposed method achieves higher accuracy than other low-rank approaches.
1 Introduction
This paper deals with the following completion problem for a matrix \( \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} \) on a low-dimensional differentiable manifold (LDDM) \(\mathcal {M}_{r}\):
where the (m,n)th element of a matrix is denoted by \((\cdot)_{m,n}\), \(\mathcal {I}\) is an index set defined as \( \mathcal {I}=\{1,2,\cdots,N\} \), and \( \mathcal {M}_{r} \subset {\mathbb {R}^{M}}\), \(\Omega \), and X(0) denote an unknown r-dimensional differentiable manifold, a given index set, and a given observed matrix, respectively. In this paper, the LDDM \(\mathcal {M}_{r}\) satisfies the following condition: on open sets \(\mathcal {U}_{\lambda }\) satisfying \(\bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r}\), there exist differentiable homeomorphisms \( \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } \), where \(\mathcal {U}^{\prime }_{\lambda }\) denotes an open set of \({\mathbb {R}^{r}}\). If \(\mathcal {M}_{r}\) is an unknown low-dimensional linear subspace (LDLS), then this is a low-rank matrix completion problem, and many algorithms have been proposed [1–6] to solve it with high estimation accuracy. The low-rank matrix completion problem has various applications in signal processing, including collaborative filtering [7], low-order model fitting and system identification [8], image inpainting [9], and human-motion recovery [10], all of which are formulated as signal recovery or estimation problems. In most practical applications, however, the column vectors of the matrix do not belong to an LDLS, i.e., \(\mathcal {M}_{r}\) is not an LDLS, and these algorithms do not achieve high performance. For example, a matrix is of high rank when its column vectors lie on a union of linear subspaces (UoLS), because the column space of the matrix is high dimensional even when the dimension of each subspace is low. For this case, several methods have been proposed to solve the high-rank matrix completion problem [11–16], all of which are based on subspace clustering [17]. In particular, [15] proposed an algebraic variety approach known as variety-based matrix completion (VMC), which exploits the fact that the monomial features of the column vectors belong to an LDLS when the column vectors belong to a UoLS. This approach minimizes the rank of the Gram matrix of the monomial features by relaxing the problem into rank minimization of a polynomial kernel matrix. Unfortunately, these algorithms recover a matrix only when \({\mathcal {M}}_{r}\) can be approximately divided into a few LDLSs and do not work well otherwise.
To solve the matrix completion problem on a general LDDM, some nonlinear methods have been proposed [18–22]. In particular, Fan et al. [19–21] have proposed a method based on a kind of kernel principal component analysis [23], which assumes that the dimension of the subspace spanned by the nonlinearly mapped column vectors is low. They formulate the matrix completion problem as a low-rank approximation problem for the kernel matrix, as in [15]; however, this approach requires a large number of observed entries in the matrix, and the completion accuracy declines when the number of observed entries is small.
In the present paper, a new method is proposed that uses neither the monomial features nor the kernel method to achieve high completion accuracy. Based on an idea similar to that of locally linear embedding [24, 25], this paper assumes that an LDDM can be approximated locally by an LDLS, because there are tangent hyperplanes whose dimension is equal to that of the manifold. The matrix completion problem is then formulated as one of minimizing the rank of each local submatrix of X whose columns are local nearest neighbors of each other.
This paper is organized as follows. In Section 2, related works are introduced. Section 3 proposes a local low-rank approach (LLRA) to solve a matrix completion problem on an LDDM, and the convergence properties of the proposed algorithm are shown in Section 4. Finally, numerical examples are presented in Section 5 to illustrate that the proposed algorithm has a higher accuracy than other low-rank approaches.
2 Related works
Here, we focus on some matrix completion algorithms based on matrix rank minimization (MRM) on an unknown manifold, \(\mathcal {M}_{r}\). First, this paper introduces the algorithms for the case where \( \mathcal {M}_{r} \) is an r-dimensional linear subspace in Section 2.1; then, Section 2.2 shows the algorithms using the polynomial kernel for a UoLS and an LDDM.
2.1 Matrix rank minimization for linear subspace
Most algorithms for matrix completion deal with the case where the manifold \( \mathcal {M}_{r} \) is an LDLS [1–3, 5]. In this case, since the dimension r is unknown, they formulate a matrix completion problem as the following MRM problem to estimate r and restore X simultaneously.
Since this problem is generally NP-hard, several surrogate functions, such as the nuclear norm [1], the truncated nuclear norm [5], and the Schatten norm [3], have been proposed. These algorithms recover X well if X can be approximated by a low-rank matrix.
2.2 High-rank matrix completion with the kernel method
To recover a high-rank matrix whose columns belong to a UoLS or an LDDM, some algorithms have been proposed that minimize the rank of its kernel matrix [15, 18–21].
In [15], the authors focused on a matrix completion problem on a union of d linear subspaces \(\bigcup _{k=1}^{d} \mathcal {S}_{k} \), where \( \mathcal {S}_{k} \) denotes an LDLS of dimension r or lower. Since the matrix rank is high or full in this problem, the MRM approach does not achieve high performance. To solve this matrix completion problem, an algebraic variety model approach was proposed based on the fact that the monomial features of each column vector (\( \boldsymbol {x}_{i} \in \bigcup _{k=1}^{d} \mathcal {S}_{k} \)) belong to an LDLS.
Here, the monomial features of x are defined as:
where α=[α1 ⋯ αM] denotes a multi-index of non-negative integers, xα is defined as \( \boldsymbol {x}^{\boldsymbol {\alpha }}=x_{1}^{\alpha _{1}}\cdots x_{M}^{\alpha _{M}} \), and \( |\boldsymbol {\alpha }| = \alpha _{1}+\cdots +\alpha _{M} \).
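As a concrete illustration of these features, the following Python sketch enumerates all monomials of total degree at most d for a vector x; the function name monomial_features and the ordering of the monomials are our own choices, not part of the paper.

```python
import itertools
import numpy as np

def monomial_features(x, d):
    """All monomials x^alpha with |alpha| <= d; there are C(M+d, d) of them."""
    M = len(x)
    feats = []
    for deg in range(d + 1):
        # each multiset of 'deg' coordinate indices corresponds to one monomial
        for idx in itertools.combinations_with_replacement(range(M), deg):
            feats.append(float(np.prod(x[list(idx)])) if idx else 1.0)
    return np.array(feats)

x = np.array([2.0, 3.0])
psi = monomial_features(x, 2)   # [1, x1, x2, x1^2, x1*x2, x2^2]
assert psi.shape[0] == 6        # C(2+2, 2) = 6
```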
Since \( \boldsymbol {x}\in \bigcup _{k=1}^{d} \mathcal {S}_{k} \) if and only if \( \prod _{k=1}^{d}(\boldsymbol {x}^{T} \boldsymbol {a}_{k}) =0 \) (where ak denotes a vector in the orthogonal complement of \( \mathcal {S}_{k} \)), there exists a vector \( \boldsymbol {c}\in {\mathbb {R}^{\binom {M+d}{d}}}\) that satisfies cTψd(x)=0. Hence, the matrix ψd(X)=[ψd(x1) ψd(x2) ⋯ ψd(xN)] is rank deficient, and the high-rank matrix completion problem is formulated as follows:
This problem can be solved efficiently by replacing ψd(X) with a polynomial kernel-gram matrix and by using the Schatten norm-minimization algorithm [3]. The details are presented in [15].
Another approach to the high-rank matrix completion problem was proposed in [19–21]. The matrix ψd(X) is rank deficient when each column vector xi is given by a polynomial mapping of latent features \( \boldsymbol {y}_{i} \in {\mathbb {R}^{r}} (r \ll M < N)\) denoted by:
with polynomial coefficients \( \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} \) and order p≪M, because R=rank(ψd(X)) satisfies:
and \( R < \binom {M+d}{d}\) if r,p≪M<N. Therefore, the matrix ψd(X) can be approximated by a low-rank matrix. [19–21] proposed a high-rank matrix-completion algorithm using matrix factorization in the same way as [15]; however, this algorithm requires a large number of observed entries and does not recover the matrix when only a small number are present. The algorithm restores [ψd(x1)⋯ψd(xN)] uniquely if the sample number N and the sampling rate \( q=\frac {|\Omega |}{MN} \) satisfy the inequality:
For example, when \( p=3, r=5, M=100, d=2 \), we have \( \binom {r+pd}{pd} = 462 \) and \( \binom {M+d}{d} = 5151 \); although the ratio \( \binom {r+pd}{pd}/\binom {M+d}{d} \ll \binom {r+p}{p}/M = 0.56 \), we need N≥5982 for q=0.4 and N≥1362465 for q=0.3. Hence, we expect the matrix-completion accuracy to worsen when p and r are large and N is small.
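The binomial coefficients quoted in this example can be verified directly; the snippet below is only an arithmetic sanity check of these quantities (it does not implement the sampling bound itself).

```python
from math import comb

p, r, M, d = 3, 5, 100, 2
print(comb(r + p * d, p * d))   # 462   = C(11, 6)
print(comb(M + d, d))           # 5151  = C(102, 2)
print(comb(r + p, p) / M)       # 0.56  = C(8, 3) / 100
```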
Therefore, this paper proposes a new approach that uses neither monomial features nor the kernel method, but is instead based on the assumption that an LDDM can be approximated locally by an LDLS, so as to achieve high completion accuracy with a small sampling rate q and a small number of samples N.
3 Methods
3.1 Local low-dimensional model
First, in order to consider how the LDDM is structured when the matrix X is given, this paper assumes that some columns xj are approximated by a vector within a set of tangent vectors x+U(x) at x in the LDDM \( \mathcal {M}_{r} \). Here, U(x) is defined as:
ε>0 denotes the radius of an r-dimensional hyperball, and J(x) denotes a Jacobian matrix defined as:
\( \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } \) and \( \boldsymbol {\phi }_{\lambda }^{-1} = \left [\phi ^{-1}_{\lambda,1} \ \cdots \ \phi ^{-1}_{\lambda,M}\right ]^{T} : \mathcal {U}^{\prime }_{\lambda } \mapsto {\mathcal {U}}_{\lambda } \) denote a chart and its inverse for an index λ, with an open set \( \mathcal {U}_{\lambda } \) that includes x satisfying \( \bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r} \) and \( \mathcal {U}^{\prime }_{\lambda } \subset {\mathbb {R}^{r}}\). Then, we consider that each xj in a set {x1,x2,⋯,xN} can be approximated by a vector belonging to \(\bigcup _{i\neq j}(\boldsymbol {x}_{i}+U(\boldsymbol {x}_{i}))\) for all \(j\in \mathcal {I}\). In other words, we assume that we have the following non-empty-index set \( \mathcal {I}_{i} \) for \( i\in \mathcal {I} \) defined as:
where η>0 denotes the upper bound of the Euclidean distance between xj−xi and a vector zi,j∈U(xi). In this case, the rank of a matrix \( \boldsymbol {Z}_{i} =\left [\boldsymbol {z}_{i,j_{1}}\ \boldsymbol {z}_{i,j_{2}} \ \cdots \boldsymbol {z}_{i,j_{|\mathcal {I}_{i}|}}\right ] \) (where \( \left \{j_{1},j_{2},\cdots,j_{|\mathcal {I}_{i}|}\right \}= \mathcal {I}_{i}\)) is less than or equal to r because rank(J(xi))=r.
Figure 1 illustrates the construction of each zi,j. From the figure, it is apparent that zi,j∈U(xi) can be obtained for suitable parameters ε and η. Therefore, the matrix-completion problem on an arbitrary LDDM (1) can be replaced by the problem of finding the sets \( \mathcal {I}_{i}\) and matrices \( \boldsymbol {Z}_{i} \) that satisfy (6), together with the missing entries of the matrix X, using the sets of tangent vectors xi+U(xi) and given parameters ε and η.
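To make the local model concrete, the sketch below (our own illustration, not the paper's algorithm) collects the columns within squared distance ε of a given column, centers them at xi, and reports the numerical rank of the resulting local submatrix, which should be close to r when the local linear approximation holds.

```python
import numpy as np

def local_rank(X, i, eps, tol=1e-8):
    """Numerical rank of Z_i = [x_j - x_i : ||x_j - x_i||^2 <= eps, j != i]."""
    xi = X[:, [i]]
    d2 = np.sum((X - xi) ** 2, axis=0)
    nbrs = [j for j in range(X.shape[1]) if j != i and d2[j] <= eps]
    if not nbrs:
        return 0
    s = np.linalg.svd(X[:, nbrs] - xi, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```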
Next, we consider how to find zi,j and \(\mathcal {I}_{i}\). To simplify the below explanation, we redefine a variable zi,j as follows:
where ei,j denotes an error vector satisfying \( \|\boldsymbol {e}_{i,j}\|_{2}^{2}\leq \eta \) and di,j∈{0,1} denotes a variable for which finding di,j is equivalent to finding \(\mathcal {I}_{i}\). In order to find a suitable solution for di,j, this paper formulates the following maximization problem:
where \( \boldsymbol {Z}_{i} \in {\mathbb {R}^{{M}\times {N}}}\) is a matrix whose jth column vector is zi,j. Since the problem (8) cannot be solved when the LDDM \( \mathcal {M}_{r} \) is unknown (as is often the case in actual problems), because U(xi) is then unavailable, this paper reformulates the constraint condition zi,j∈U(xi) as two constraint conditions: (1) rank(Zi)≤r, because the span of U(xi) is an r-dimensional linear subspace, and (2) \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) if di,j=1. Because each J(xi) is arbitrary and unknown, it is difficult to estimate the radius of the corresponding ellipsoid; therefore, this paper uses the Euclidean distance of xj−xi and assigns a hyperball radius εi to each xi. Thus, this paper reformulates the problem (8) with the given parameters r and \( \{\epsilon _{i}\}_{i\in \mathcal {I}} \) as:
where the second constraint condition is equivalent to \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) if di,j=1. Thus, we obtain a formulation for finding zi,j on each U(xi) and the set \( \mathcal {I}_{i} \) without knowledge of J(x). However, it is difficult to solve the problem (9) for \( \boldsymbol {X},\{\boldsymbol {Z}_{i}\}_{i\in \mathcal {I}}, \{d_{i,j}\}_{(i,j)\in \mathcal {I}^{2}} \) at the same time because of the condition rank(Zi)≤r, and a suitable dimension r may not be available in actual applications. In order to resolve these issues, Section 3.2 explains how to obtain the solution using an MRM technique.
3.2 Local low-rank approximation algorithm
First, we consider how to estimate the dimension of the LDDM, r, from an arbitrary matrix X. We could estimate r simply using principal-component analysis if di,j were known; however, the lower the rank of the matrix Zi, the smaller the number of indices with di,j=1 becomes in the solution of (9). There is thus a trade-off between the dimension r and the number of indices with di,j=1. Therefore, this paper formulates the following problem:
where 0≤α≤1 denotes a given trade-off parameter, which weighs the decrease in the rank of Zi against the sum of di,j. Because solving the problem (10) is NP-hard due to rank(Zi), this paper reformulates it as the following relaxed problem:
where f is defined as:
Here, β,γ≥0 denote the given parameters, function ∥·∥∗ denotes the nuclear norm, \(\boldsymbol {1}_{N}\in {\mathbb {R}^{N}}\) denotes the vector whose elements are all 1, Di denotes a diagonal matrix whose diagonal elements (Di)j,j each equal di,j, and trace(Y) denotes the sum of all diagonal elements of Y.
Next, this paper presents a technique to solve the problem (11) using alternating optimization. Firstly, we consider how to solve the problem (11) for Zi and di,j with a fixed X. We repeat the following schemes until a termination condition is satisfied with respect to Zi and di,j:
where \( h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}) \) is defined as:
Here, 〈a,b〉 denotes the inner product of a and b, sat(c)= max(0, min(1,c)), and \(\boldsymbol {\mathcal {T}}_{\tau }\) denotes the matrix-shrinkage operator for the nuclear norm-minimization problem [1]. Each step of (13) minimizes the objective function (12) for Zi and di,j. Then, we consider minimizing the objective function (12) for X with fixed Zi and di,j. Since the objective function (12) is quadratic in vec(X)=x, we obtain the following solution to the quadratically constrained quadratic program for given Zi and di,j:
where \( \boldsymbol {I}_{M,M} \in {\mathbb {R}^{{M}\times {M}}} \) is the identity matrix, ⊗ is the Kronecker product of two matrices, and \( \boldsymbol {L} \in {\mathbb {R}^{{N}\times {N}}}\) is a graph Laplacian defined as:
\( \boldsymbol {c}=\left [\boldsymbol {c}_{1}^{T} \ \boldsymbol {c}_{2}^{T} \ \cdots \ \boldsymbol {c}_{M}^{T}\right ]^{T}\in {\mathbb {R}^{MN}} \) is defined as:
for l=1,⋯,M, where ⊙ denotes the Hadamard product. Thus, we can alternately optimize each of Zi, di,j, and X in the problem (11).
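For reference, the matrix-shrinkage operator \(\boldsymbol {\mathcal {T}}_{\tau }\) used in (13) soft-thresholds the singular values, as in [1]; the following is a minimal Python sketch (the function name is ours).

```python
import numpy as np

def svt_shrink(Y, tau):
    """T_tau{Y}: soft-threshold all singular values of Y by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```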
3.3 Truncated nuclear norm-minimization approach
The solution to the problem (11) is obtained by minimizing the function (12). However, the norm of the solution X might fall below the true value, since nuclear norm minimization decreases not only the singular values beyond the rth largest, but also the 1st to the rth largest singular values. Therefore, this paper reformulates the problem and the evaluation function as follows:
where r∈{0,1,⋯,M} is a given parameter and the function ∥Z∥∗,r represents the truncated nuclear norm, which is defined in terms of the kth largest singular value σk of Z. The details of the truncated nuclear norm and the optimization technique are given in the Appendix. Note that the truncated nuclear norm with r=0 is equal to the nuclear norm; in this case, the problem (11) is the same as the problem (16). When the variables X and Di are fixed, the optimal solution for each Zi is obtained by \( \boldsymbol {Z}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}\right \}. \) In the same way, to solve the problem (11), this paper describes Algorithm 1 using iterative partial matrix shrinkage (IPMS) [5] for the problem (16), which contains the algorithm for (11). Here, \( \boldsymbol {0}_{M,N} \in {\mathbb {R}^{{M}\times {N}}}\) denotes a zero matrix, η1 and η2 denote thresholds for the termination conditions \( \|\boldsymbol {D}^{\text {old}}_{i}-\boldsymbol {D}_{i}\|_{F}/\|\boldsymbol {D}_{i}\|_{F}\leq \eta _{1} \) and ∥Xold−X∥F/∥X∥F≤η2, and β(k),γ(k),r(k) denote given parameters that satisfy 0<β(0)≤β(1)≤⋯≤βmax, γ(0)≥γ(1)≥⋯≥γmin≥0, and 0≤r(0)≤r(1)≤⋯≤rmax≤M.
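As an illustration, the Zi update stated above can be sketched as follows, with Di represented by its diagonal d_i and the truncated shrinkage operator of the Appendix applied inline; the helper name update_Zi is ours, not the paper's.

```python
import numpy as np

def update_Zi(X, i, d_i, r, gamma):
    """Z_i = T_{r,gamma}{ (X - x_i 1_N^T) D_i } with D_i = diag(d_i)."""
    Y = (X - X[:, [i]]) * d_i[None, :]        # (X - x_i 1^T) D_i
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = np.maximum(s[r:] - gamma, 0.0)    # shrink only the trailing singular values
    return U @ np.diag(s) @ Vt
```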
4 Convergence analysis
This section presents the convergence property of Algorithm 1.
First, let us define the following schemes with regard to the tth iteration of the second iteration statements in Algorithm 1:
u(t1,t2) behaves as:
for t,t1,t2≥0 and a given \( \boldsymbol {X} \in {\mathbb {R}^{{M}\times {N}}}\).
Lemma 1
For t≥0 and a given \(\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}\), the \( d_{i,j}^{(t)} \) generated by the update schemes (18) satisfies:
Proof
u(t1,t2) satisfies u(t,t)≥u(t+1,t)≥u(t+1,t+1)≥⋯≥−βN2 for t≥0, since \(d_{i,j}^{(t)}= h_{\beta,\epsilon _{i}}\left (\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}^{(t)}\right)\) is the closed-form optimal solution of the convex quadratic-minimization problem with linear constraints for fixed \(\boldsymbol {Z}_{i}^{(t)}\), and \(\boldsymbol {Z}_{i}^{(t+1)}\) is the optimal solution for fixed \(d_{i,j}^{(t)}\) (from Theorem 1 of [5]). Each \( d_{i,j}^{(t+1)} \) satisfies the following KKT condition of problem (16) with \(\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}\leq 0,{d_{i,j}^{(t+1)}}-1\leq 0,-{d_{i,j}^{(t+1)}}\leq 0\) for \( (i,j)\in \mathcal {I}^{2}\):
where \(\mu _{1,i,j}^{(t+1)},\mu _{2,i,j}^{(t+1)}\) and \(\mu _{3,i,j}^{(t+1)}\) denote KKT multipliers for \( d_{i,j}^{(t+1)} \). Therefore, u(t1,t2) satisfies:
Since \(\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}=0\) if \(\mu _{1,i,j}^{(t+1)}>0\), \( {d_{i,j}^{(t+1)}}-1 =0 \) if \( \mu _{2,i,j}^{(t+1)}>0 \) and \( -{d_{i,j}^{(t+1)}} = 0 \) if \( \mu _{3,i,j}^{(t+1)}>0 \), and each \( d_{i,j}^{(t)} \) satisfies the constraint condition:
Therefore, each sequence \( \{d_{i,j}^{(t)}\} \) converges to a limit point \(\bar {d}_{i,j}\) if u(0,0)<∞, because \({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}} \rightarrow 0\) as t→∞ if \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}>0 \), and \( {d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}} = 1-1 = 0 \) even if \(\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}=0\), for \(t\geq 0, (i,j)\in \mathcal {I}^{2}\). Then, each sequence \( \{\boldsymbol {Z}_{i}^{(t)}\} \) converges to a limit point \(\bar {\boldsymbol {Z}}_{i}\) because each \( \boldsymbol {Z}_{i}^{(t+1)} \) can be obtained by the soft-thresholding operator using fixed \( d_{i,j}^{(t)} \) for \(t\geq 0, i\in \mathcal {I}\). □
Lemma 2
If β≥εi, the optimal solution of (17) under the constraint conditions for di,j and Zi can be obtained by initializing \(\boldsymbol {Z}_{i}^{(0)}\) as \(\boldsymbol {Z}_{i}^{(0)}=\boldsymbol {0}_{M,N}\) and updating \(d_{i,j}^{(0)}\) and \(\boldsymbol {Z}_{i}^{(1)}\) using the update schemes (18) for a given \(\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}\).
Proof
From Theorem 1 of [5], any \( \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} \) and each optimal solution \(\bar {\boldsymbol {Z}}_{i} \) and \( \bar {\boldsymbol {D}}_{i} \) satisfies \(\bar {\boldsymbol {Z}}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\bar {\boldsymbol {D}}_{i}\right \} \). For a given di,j≥0, a matrix \(\boldsymbol {Z}_{i}=\mathcal {T}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right){\boldsymbol {D}}_{i}\right \}\) satisfies 0≤〈xj−xi,zi,j〉 because, when di,j>0,
Here, yi,j denotes the jth column of \(\boldsymbol {Y}_{i}=\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} =\boldsymbol {U}\text {diag}(\boldsymbol {\sigma })\boldsymbol {V}^{T}\) and σ,U,V denotes the singular values and vectors of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i}\); when di,j=0,〈yi,j,zi,j〉=0 because of yi,j=0M, where \( \boldsymbol {0}_{M}\in {\mathbb {R}^{M}} \) denotes the zero vector. Then, \(\bar {d}_{i,j}\) satisfies:
because \(\beta \geq \epsilon _{i}\geq \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\), which does not depend on \(\bar {\boldsymbol {Z}}_{i}\). Therefore, \(d_{i,j}^{(0)}=h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {0}_{M})\in \{0,1\}\) and \(\boldsymbol {Z}_{i}^{(1)}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}^{(0)}\right \}\) is the optimal solution for (16). □
Next, let us define the following schemes with regard to the kth iteration of the first-iteration statements in Algorithm 1 with β(k),γ(k),r(k) for k≥0,
where \( \bar {d}_{i,j} \) and \( \bar {\boldsymbol {Z}}_{i} \) are the tth elements of the sequences obtained by the schemes (18) with β(k),γ(k),r(k),X(k), and vector c(k) as:
Here, \( \tilde {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}\) and \(\tilde {\boldsymbol {Z}}_{l}^{(k)} \in {\mathbb {R}^{{N}\times {N}}}\) denote matrices defined as \((\tilde {\boldsymbol {D}})_{i,j}^{(k)}= d_{i,j}^{(k)}\) and \((\tilde {\boldsymbol {Z}}_{l}^{(k)})_{i,j}= (\boldsymbol {Z}_{i}^{(k)})_{l,j}\) for \( (i,j)\in \mathcal {I}^{2}\), and the graph Laplacian L(k) is:
where \(\hat {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}\) denotes a matrix whose every element is given by \(\left (\hat {\boldsymbol {D}}^{(k)}\right)_{i,j}={d_{i,j}^{(k)}}^{2}+{d_{j,i}^{(k)}}^{2}\).
Lemma 3
For k≥0,L(k) satisfies kernel(L(k))⊇kernel(L(k+1)).
Proof
Since a vector a∈kernel(L(k)) satisfies:
kernel(L(k)) is written as:
Since \(d_{i,j}^{(k+1)}\) and \(d_{i,j}^{(k)}\) generated by the schemes (18) and (19) satisfy \( d_{i,j}^{(k+1)}>0 \) when \( d_{i,j}^{(k)}>0 \), L(k) satisfies kernel(L(k))⊇kernel(L(k+1)). □
Now, let us describe the properties of the sequences generated by Algorithm 1 \(\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}, \left \{d_{i,j}^{(k)}\right \}\). We define the evaluation function:
and replace the linear-constraint condition (X(k))m,n=(X(0))m,n for (m,n)∈Ω with Ax(k)=b, where \( \boldsymbol {b}\in {\mathbb {R}^{|\Omega |}} \) denotes a vector whose elements are observed values {(X(0))m,n}(m,n)∈Ω and A∈{0,1}|Ω|×MN denotes a selector matrix.
Theorem 1
The sequences \(\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}\), and \(\left \{d_{i,j}^{(k)}\right \}\) converge to the limit points \( \bar {\boldsymbol {X}}, \bar {\boldsymbol {Z}}_{i}\), and \( \bar {d}_{i,j} \) under repetition of the iteration schemes of (19) when \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \), where \(\tilde {\boldsymbol {L}}^{(k)}=\boldsymbol {L}^{(k)}\otimes \boldsymbol {I}_{M,M}\).
Proof
The scheme (15) can be written as:
where \(\boldsymbol {Q}_{i,j}\in {\mathbb {R}^{{M}\times {MN}}}\) denotes a matrix defined as \(\boldsymbol {Q}_{i,j}=\boldsymbol {q}_{i,j}^{T}\otimes \boldsymbol {I}_{M,M}\) and \(\boldsymbol {q}_{i,j}\in {\mathbb {R}^{N}}\) is defined such that the ith element is 1, the jth element is −1, and the others are 0 (Qi,j satisfies \( \|\boldsymbol {Q}_{i,j}\boldsymbol {x}\|_{2}^{2}=\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2} \) for \( \boldsymbol {x}\in {\mathbb {R}^{MN}} \)). Since x(k+1) satisfies the following KKT condition for v(k,k+1,k,k):
where λ(k+1) and \(\mu _{i,j}^{(k+1)}\) denote the KKT multipliers, v(k1,k2,k3,k4) satisfies:
where the second equality uses the fact that Ax(k)=b. Since \(\|\boldsymbol {Q}_{i,j}\boldsymbol {x}^{(k+1)}\|_{2}^{2}=\epsilon _{i}\) when \(\mu _{i,j}^{(k+1)}>0\),
The second inequality uses:
Obviously, v(k,k+1,k,k)≥v(k+1,k+1,k,k) because the parameters {β(k),γ(k),r(k)} decrease the objective function (17), and v(k+1,k+1,k,k)≥v(k+1,k+1,k+1,k+1) from Lemma 1. Since the sequence {v(k,k,k,k)} generated by (19) converges to a limit point because of:
x(k)−x(k+1)→0MN as k→∞ and v(0,0,0,0)<∞, because each L(k) satisfies \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(k)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \) for k≥0 if \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \), from Lemma 3. Hence, X(k) reaches a limit point \(\bar {\boldsymbol {X}}\); then, the sequences \(\left \{\boldsymbol {Z}_{i}^{(k)}\right \}\) and \(\left \{d_{i,j}^{(k)}\right \}\) converge to limit points \(\bar {\boldsymbol {Z}}_{i}\) and \(\bar {d}_{i,j}\) with the fixed \(\bar {\boldsymbol {X}}\), from Lemma 1. □
Finally, some improvements to Algorithm 1 are offered in this section. First, the dimension of the LDDM is unknown in actual applications, although Algorithm 1 requires a suitable r. In order to resolve this issue, we adopt a method that estimates the dimension r based on the ratio of singular values σr/σ1, just as [5] does, for each column \( i\in \mathcal {I} \). Second, we consider ways to reduce the computational complexity. Two key possibilities are considered: one is to ignore the quadratic-constraint condition \( \left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right)d_{i,j}\leq 0 \) when updating X, and the other is to update X only for the columns in the ith neighborhood, for example, by minimizing only the ith Frobenius-norm term of (17), \( \left \|(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}-\boldsymbol {Z}_{i}\right \|_{F}^{2} \), with respect to the column xi, which is expected to work like a stochastic gradient-descent algorithm. Furthermore, this paper uses the parameter \(\beta = \max _{i} \epsilon _{i}\), because the update schemes (18) then yield the limit points for Zi and di,j in a single pass for each \( i\in \mathcal {I} \), from Lemma 2. Thus, this paper proposes a heuristic algorithm for reducing the calculation time, as shown in Algorithm 2. There, the parameters satisfy 1>α(0)≥α(1)≥⋯≥αmin>0 for k=0,1,⋯,kmax and δ>0, just as in [5].
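One plausible reading of this rank-estimation rule is sketched below, under the assumption that r is taken as the number of singular values whose ratio to σ1 exceeds a threshold α, as in [5]; the function name is ours.

```python
import numpy as np

def estimate_rank(Z, alpha):
    """Number of singular values with sigma_k / sigma_1 >= alpha (0 for a zero matrix)."""
    s = np.linalg.svd(Z, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s / s[0] >= alpha))
```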
We consider here the time and space complexities of Algorithm 2. The major computational cost of Algorithm 2 comes from computing the singular value decomposition of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} \) for all i=1,2,⋯,N at each iteration. For simplicity, this paper assumes that the number of non-zero columns of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} \) is M for each iteration and each i. Then, since the algorithm requires the singular value decomposition of an M×M matrix, the time and space complexities of Algorithm 2 are O(M3N) and O(M2) per iteration. As noted in [20], the VMC method [15] requires O(N3+MN2) time and O(N2) space; therefore, the time and space complexities of Algorithm 2 are lower than those of VMC when the numbers of rows M and columns N satisfy M3<N2. Hence, Algorithm 2 is effective for datasets such as those used in Section 5.2.
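This crossover condition can be checked with simple arithmetic for the motion-capture matrix of Section 5.2 (M=62, N=3392); the snippet below ignores the constants hidden in the O(·) notation.

```python
M, N = 62, 3392
print(M ** 3 * N)           # ~8.1e8  : Algorithm 2 per-iteration time, O(M^3 N)
print(N ** 3 + M * N ** 2)  # ~4.0e10 : VMC per-iteration time, O(N^3 + M N^2)
print(M ** 3 < N ** 2)      # True -> Algorithm 2 is the cheaper of the two here
```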
5 Results and discussion
5.1 Synthetic data
This section presents several numerical examples for the matrix completion problem (1). In this section, each ith column of X(0) is generated by \( \boldsymbol {\mathcal {F}}_{p}: {\mathbb {R}^{r}} \mapsto {\mathbb {R}^{M}} \) with mapping function (3) as:
Using \( \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} \) and \( \boldsymbol {Y}^{(0)}=\left [\boldsymbol {y}_{1}^{(0)} \ \boldsymbol {y}_{2}^{(0)}\ \cdots \ \boldsymbol {y}_{N}^{(0)}\right ] \in {\mathbb {R}^{{r}\times {N}}}\) generated from i.i.d. continuous uniform distributions with supports [−0.5,0.5] and [−1,1], respectively, the elements of Y(0) are normalized so that max|(Y(0))i,j|=1. The index set Ω is generated using a Bernoulli distribution such that each index (i,j) belongs to Ω with the given probability q. This paper uses the relative recovery error:
to evaluate each algorithm. All numerical experiments were run in MATLAB 2017b on a PC with an Intel Core i7 3.1 GHz CPU, 8 GB of RAM, and no swap memory.
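For concreteness, the data generation and error measure described above can be sketched as follows. This is our Python reconstruction rather than the authors' MATLAB code: it assumes the mapping x_i = U_p ψ_p(y_i) from Section 2.2, reuses the monomial_features helper sketched there, and uses the standard Frobenius-norm relative error.

```python
import numpy as np
from math import comb

def generate_data(M=100, N=4000, r=5, p=3, q=0.3, seed=0):
    """Synthetic columns x_i = U_p psi_p(y_i) and a Bernoulli(q) observation mask."""
    rng = np.random.default_rng(seed)
    Up = rng.uniform(-0.5, 0.5, size=(M, comb(r + p, p)))   # polynomial coefficients
    Y = rng.uniform(-1.0, 1.0, size=(r, N))                 # latent features
    Y /= np.max(np.abs(Y))                                  # max |(Y)_{i,j}| = 1
    X = np.column_stack([Up @ monomial_features(Y[:, n], p) for n in range(N)])
    mask = rng.random((M, N)) < q                           # (m, n) observed w.p. q
    return X, mask

def relative_error(X_hat, X_true):
    """Relative recovery error ||X_hat - X_true||_F / ||X_true||_F."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```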
This paper applies several low-rank matrix completion algorithms, including singular value thresholding (SVT) [1], the fixed-point continuation algorithm (FPCA) [2], the short IRLS-0 (sIRLS-0) method [3], IPMS [5], the nonlinear matrix completion method VMC [15], and the proposed LLRASGD method, to matrix completion problems with M=100, N=4000, and d=3,5 for (20). A maximum iteration number of kmax=1000 is used for LLRA, IPMS, sIRLS-0, and SVT, and the termination condition is ∥X(k)−X(k+1)∥F/∥X(k+1)∥F≤10−5 for all algorithms. The parameters for LLRASGD and IPMS are given as \( \alpha ^{(k)}=10^{-\frac {4k}{k_{\text {max}}}} \) and δ=10−2; those for SVT are \( \tau ^{(k)}=10^{-2}\sigma _{1}^{(k)} \); those for sIRLS-0 and VMC are \( \gamma ^{(k)}=10^{2-\frac {6k}{k_{\text {max}}}} \); those for VMC are p=0.5 and d=3; and those for FPCA are τ=1 and \( \mu ^{(k)}=(0.25)^{k} \geq \bar {\mu }=10^{-8} \). The condition \( \sigma _{l}^{(k)}\geq 10^{-2}\sigma _{1}^{(k)} \) is used to choose r for FPCA. We set the initial values of {(X)m,n}(m,n)∉Ω to 0 for SVT, FPCA, sIRLS-0, and IPMS. For VMC and LLRASGD, the initial X is estimated using IPMS, and for LLRASGD each εi is chosen such that the total number of columns satisfying the condition \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) equals 50 for the X estimated by IPMS.
The results are shown in Tables 1, 2, and 3 for q∈{0.2,0.3,0.4} and r∈{2,3,4,5,6}. As can be seen, the estimation accuracy of LLRASGD is better than that of the other methods, especially for r=5,6 with q=0.2,0.3,0.4 and d=3,5,7, and for r=3,4,5,6 with q=0.2. Figures 2, 3, and 4 compare all algorithms with q=0.3. In Figs. 2 and 3, the recovery errors of LLRASGD tend to deteriorate less than those of the other algorithms. These results indicate that the proposed method is more effective when the missing rate or the latent dimension is high.
5.2 CMU motion capture data
This paper considers matrix completion on motion capture data, which consist of time-series trajectories of human motions such as running and jumping. Similar to [15], this paper uses trial #6 of subject #56 of the CMU motion capture dataset. The data contain measurements from M=62 sensors at 6784 time instants, and the resulting data matrix is known to be of high rank. In this experiment, the sequence is downsampled by a factor of 2, so that the data matrix has M=62 rows and N=3392 columns. The elements of the data matrix were then randomly observed at ratios q∈{0.1,0.2,0.3,0.4}, and the matrix completion algorithms were applied with the same parameters as in Section 5.1.
The average recovery errors over 10 trials are shown in Fig. 5. As with the synthetic data, the estimation accuracy of LLRASGD is better than that of the other methods. In particular, the recovery errors of LLRASGD are much lower than those of the other methods when the missing ratio is very high (e.g., q=0.1,0.2). These results show that the proposed method is effective not only for synthetic data but also for a real-world dataset.
The average computational times for all observed ratios q∈{0.1,0.2,0.3,0.4} are shown in Table 4. These results indicate that the computation time of LLRASGD is about 200 to 500 times longer than that of the conventional MRM methods, and the computation time of VMC is about 2.4 times longer than that of LLRASGD for the same number of iterations. However, VMC and LLRASGD achieve sufficiently high estimation accuracy even with a small number of iterations. Figure 6 shows the results of VMC and LLRASGD for maximum iteration numbers kmax∈{10,20,40,100,1000} at the observed ratio q=0.2. As can be seen in Fig. 6, the recovery error of LLRASGD converges sufficiently by kmax=40. With this maximum iteration number, the computational time of LLRASGD is about 16 times longer than that of IPMS, but its recovery error is less than half that of IPMS. We can also see that the recovery error of LLRASGD is lower than that of VMC for all kmax∈{10,20,40,100,1000}.
6 Conclusion
This paper proposed a local low-rank approach (LLRA) for a matrix-completion problem in which the columns of the matrix belong to an LDDM, and presented the convergence properties of this approach. The proposed method is based on the idea of tangent hyperplanes, of dimension equal to that of the LDDM, at each column of the matrix. It is assumed that each hyperplane is of low dimension, and the sum of the ranks of the local submatrices, each formed from the nearest neighbors of a column, is minimized. Numerical examples show that the proposed algorithm offers higher accuracy for matrix completion than other algorithms in the case where each column vector is given by a pth order polynomial mapping of a latent feature. In particular, the proposed method is suitable when the order p and the dimension of the latent space are high.
7 Appendix
In this section, this paper introduces the truncated nuclear norm and the minimization technique by IPMS [5].
The truncated nuclear norm ∥Z∥∗,r is defined with the kth biggest singular value σk of Z as:
The truncated nuclear norm is used as a surrogate for the matrix rank, and the minimization of \(\frac {1}{2}\|\boldsymbol {Y}-\boldsymbol {Z}\|_{F}^{2} +\gamma \|\boldsymbol {Z}\|_{*,r} \) with respect to Z, for a given matrix Y and a given parameter γ>0, has the closed-form solution:
where \(\boldsymbol {\mathcal {T}}_{r,\gamma }\) denotes the matrix-shrinkage operator defined as:
and (c)+= max(0,c) with regard to the singular value decomposition Y=Udiag(σ)VT.
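A minimal Python sketch of this operator, assuming the convention stated above that the r largest singular values are kept unchanged (the function name is ours):

```python
import numpy as np

def truncated_shrink(Y, r, gamma):
    """T_{r,gamma}{Y}: keep the r largest singular values, soft-threshold the rest by gamma."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = np.maximum(s[r:] - gamma, 0.0)
    return U @ np.diag(s) @ Vt
```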
In the matrix completion problem (2), the IPMS algorithm solves the relaxation problem by iterating the following update schemes:
Since the truncated nuclear norm requires the value of r corresponding to the matrix rank, IPMS estimates the rank r during the iterations using the scheme:
where 0≤α<1 is a given constant. The details of the IPMS algorithm are written in [5].
Availability of data and materials
Please contact the author for data requests.
Abbreviations
- LDDM: Low-dimensional differentiable manifold
- MRM: Matrix rank minimization
- LDLS: Low-dimensional linear subspace
- UoLS: Union of linear subspaces
- VMC: Variety-based matrix completion
- LLRA: Local low-rank approach
- SVT: Singular value thresholding
- FPCA: Fixed-point continuation algorithm
- IRLS: Iterative reweighted least squares
- IPMS: Iterative partial matrix shrinkage
References
1. J. F. Cai, E. J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010).
2. D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011).
3. K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(1), 3441–3473 (2012).
4. D. Zhang, Y. Hu, J. Ye, X. Li, X. He, Matrix completion by truncated nuclear norm regularization, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (2012), pp. 2192–2199.
5. K. Konishi, K. Uruma, T. Takahashi, T. Furukawa, Iterative partial matrix shrinkage algorithm for matrix rank minimization. Signal Process. 100, 124–131 (2014).
6. J. Gotoh, A. Takeda, K. Tono, DC formulations and algorithms for sparse optimization problems. Math. Program. 169(1), 141–176 (2018).
7. X. Guan, C. T. Li, Y. Guan, Matrix factorization with rating completion: an enhanced SVD model for collaborative filtering recommender systems. IEEE Access 5, 27668–27678 (2017).
8. M. Verhaegen, A. Hansson, N2SID: nuclear norm subspace identification of innovation models. Automatica 72, 57–63 (2016).
9. K. H. Jin, J. C. Ye, Annihilating filter-based low-rank Hankel matrix approach for image inpainting. IEEE Trans. Image Process. 24(11), 3498–3511 (2015).
10. Q. Zhao, D. Meng, Z. Xu, W. Zuo, Y. Yan, L1-norm low-rank matrix factorization by variational Bayesian method. IEEE Trans. Neural Netw. Learn. Syst. 26(4), 825–839 (2015).
11. B. Eriksson, L. Balzano, R. Nowak, High rank matrix completion. Int. Conf. Artif. Intell. Stat. 22, 373–381 (2012).
12. C. Yang, D. Robinson, R. Vidal, Sparse subspace clustering with missing entries, in Proc. of the 32nd Int. Conf. on Machine Learning (PMLR), vol. 37 (PMLR, Lille, 2015), pp. 2463–2472.
13. C.-G. Li, R. Vidal, A structured sparse plus structured low-rank framework for subspace clustering and completion. IEEE Trans. Signal Process. 64(24), 6557–6570 (2016).
14. E. Elhamifar, High-rank matrix completion and clustering under self-expressive models, in Proc. 28th Adv. Neural Inf. Process. Syst. (Curran Associates Inc., Barcelona, 2016), pp. 73–81.
15. G. Ongie, R. Willett, R. D. Nowak, L. Balzano, Algebraic variety models for high-rank matrix completion, in Proc. of the 34th Int. Conf. on Machine Learning (PMLR), vol. 70 (PMLR, Sydney, 2017), pp. 2691–2700.
16. G. Ongie, L. Balzano, D. Pimentel-Alarcón, R. Willett, R. D. Nowak, Tensor methods for nonlinear matrix completion. arXiv preprint arXiv:1804.10266 (2018). https://arxiv.org/abs/1804.10266.
17. R. Vidal, Y. Ma, S. Sastry, Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27(12), 1945–1959 (2005).
18. X. Alameda-Pineda, E. Ricci, Y. Yan, N. Sebe, Recognizing emotions from abstract paintings using non-linear matrix completion, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (IEEE, Las Vegas, 2016), pp. 5240–5248.
19. J. Fan, T. W. S. Chow, Non-linear matrix completion. Pattern Recogn. 77, 378–394 (2018).
20. J. Fan, M. Udell, Online high rank matrix completion, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (IEEE, Long Beach, 2019), pp. 8682–8690.
21. J. Fan, Y. Zhang, M. Udell, Polynomial matrix completion for missing data imputation and transductive learning, in Proc. AAAI, vol. 34 (AAAI Press, New York, 2020), pp. 3842–3849.
22. J. Fan, J. Cheng, Matrix completion by deep matrix factorization. Neural Netw. 98, 34–41 (2018).
23. B. Schölkopf, A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002). https://books.google.co.jp/books?id=y8ORL3DWt4sC&hl=ja&source=gbs_book_other_versions.
24. S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000).
25. M. Winlaw, D. L. Samimi, A. Ghodsi, Robust locally linear embedding using penalty functions. Int. Joint Conf. Neural Netw., 2305–2312 (2011).
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that helped improve the quality of this manuscript.
Funding
This work was supported by the JSPS KAKENHI Grant Number JP19H02163.
Author information
Contributions
All authors have contributed equally. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Cite this article
Sasaki, R., Konishi, K., Takahashi, T. et al. Local low-rank approach to nonlinear matrix completion. EURASIP J. Adv. Signal Process. 2021, 11 (2021). https://doi.org/10.1186/s13634-021-00717-7