Local low-rank approach to nonlinear matrix completion
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 11 (2021)
Abstract
This paper deals with a matrix completion problem in which each column vector of the matrix belongs to a low-dimensional differentiable manifold (LDDM), with the target matrix being high or full rank. To solve this problem, algorithms based on polynomial mapping and matrix-rank minimization (MRM) have been proposed; such methods assume that each column vector of the target matrix is generated by applying a p-th order polynomial mapping to a vector in a low-dimensional linear subspace (LDLS), and that the matrix whose column vectors are the d-th order monomial features of the target column vectors is rank deficient. However, a large number of columns and observed values are needed to solve the MRM problem exactly with this approach when p is large; therefore, this paper proposes a new method that obtains the solution by minimizing the rank of submatrices without transforming the target matrix, so as to achieve high estimation accuracy even when the number of columns is small. The method is based on the assumption that an LDDM can be approximated locally by an LDLS, which allows high completion accuracy without transforming the target matrix. Numerical examples show that the proposed method achieves higher accuracy than other low-rank approaches.
Introduction
This paper deals with the following completion problem for a matrix \( \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} \) on a low-dimensional differentiable manifold (LDDM) \(\mathcal {M}_{r}\):
where the (m,n)-th element of a matrix is denoted by \((\cdot)_{m,n}\), \(\mathcal {I} \) is an index set defined as \( \mathcal {I}=\{1,2,\cdots,N\} \), and \( \mathcal {M}_{r} \subset {\mathbb {R}^{M}} \), Ω, and \(\boldsymbol{X}^{(0)}\) denote an unknown r-dimensional differentiable manifold, a given index set, and a given observed matrix, respectively. In this paper, the LDDM \(\mathcal {M}_{r}\) satisfies the following condition: on each open set \(\mathcal {U}_{\lambda }\) satisfying \(\bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r}\), there exists a differentiable homeomorphism \( \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } \), where \(\mathcal {U}^{\prime }_{\lambda }\) denotes an open set of \({\mathbb {R}^{r}}\). If \(\mathcal {M}_{r}\) is an unknown low-dimensional linear subspace (LDLS), then this is a low-rank matrix completion problem, and many algorithms have been proposed [1–6] to solve it with high estimation accuracy. The low-rank matrix completion problem has various applications in the field of signal processing, including collaborative filtering [7], low-order model fitting and system identification [8], image inpainting [9], and human-motion recovery [10], all of which are formulated as signal recovery or estimation problems. However, in most practical applications, the column vectors of a matrix do not belong to an LDLS, i.e., \(\mathcal {M}_{r}\) is not an LDLS, and these algorithms therefore do not achieve high performance. For example, a matrix is of high rank when its column vectors lie on a union of linear subspaces (UoLS), in which case the column space of the matrix is high-dimensional even when the dimension of each linear subspace is low. For this high-rank matrix completion problem, several methods have been proposed [11–16], all of which are based on subspace clustering [17].
In particular, [15] proposed an algebraic variety approach known as variety-based matrix completion (VMC), which is based on the fact that the monomial features of the column vectors belong to an LDLS when the column vectors belong to a UoLS. This approach poses a rank minimization problem for the Gram matrix of the monomial features and relaxes it into rank minimization of a polynomial kernel matrix. Unfortunately, these algorithms recover a matrix only when \({\mathcal {M}}_{r}\) can be approximately divided into several LDLSs, and they do not work well otherwise.
To solve the matrix completion problem on a general LDDM, some nonlinear methods have been proposed [18–22]. In particular, Fan et al. [19–21] have proposed a method based on a kind of kernel principal component analysis [23] that assumes the dimension of the subspace spanned by the nonlinearly mapped column vectors is low. They formulate the matrix completion problem as a low-rank approximation problem for the kernel matrix, as in [15]; however, this approach requires a large number of observed entries in the matrix, and its completion accuracy declines when the number of observed entries is small.
In the present paper, a new method is proposed that uses neither the monomial features nor the kernel method to achieve high completion accuracy. Based on an idea similar to that of locally linear embedding [24, 25], this paper assumes that an LDDM can be approximated locally by an LDLS, because there are tangent hyperplanes whose dimension is equal to that of the manifold. The matrix completion problem is then formulated as one of minimizing the rank of the local submatrices of X whose columns are local nearest neighbors of each other.
This paper is organized as follows. Section 2 introduces related works. Section 3 proposes a local low-rank approach (LLRA) to solve a matrix completion problem on an LDDM, and the convergence properties of the proposed algorithm are shown in Section 4. Finally, numerical examples are presented in Section 5 to illustrate that the proposed algorithm has a higher accuracy than other low-rank approaches.
Related works
Here, we focus on matrix completion algorithms based on matrix-rank minimization (MRM) on an unknown manifold \(\mathcal {M}_{r}\). Section 2.1 first introduces algorithms for the case where \( \mathcal {M}_{r} \) is an r-dimensional linear subspace; Section 2.2 then presents algorithms that use the polynomial kernel for a UoLS and an LDDM.
Matrix rank minimization for linear subspace
Most algorithms for matrix completion deal with the case where the manifold \( \mathcal {M}_{r} \) is an LDLS [1–3, 5]. In this case, since the dimension r is unknown, they formulate matrix completion as the following MRM problem, which simultaneously estimates r and restores X.
Since this problem is generally NP-hard, several surrogate functions, such as the nuclear norm [1], the truncated nuclear norm [5], and the Schatten norm [3], have been proposed. These algorithms recover X well if X can be approximated by a low-rank matrix.
High-rank matrix completion with the kernel method
To recover a high-rank matrix whose columns belong to a UoLS or an LDDM, several algorithms have been proposed that minimize the rank of its kernel matrix [15, 18–21].
In [15], the authors focused on a matrix completion problem on a union of d linear subspaces \(\bigcup _{k=1}^{d} \mathcal {S}_{k} \), where \( \mathcal {S}_{k} \) denotes an LDLS of dimension r or lower. Since the matrix rank is high or full in this problem, the MRM approach does not achieve high performance. To solve this matrix completion problem, an algebraic variety model was proposed, based on the fact that the monomial features of each column vector (\( \boldsymbol {x}_{i} \in \bigcup _{k=1}^{d} \mathcal {S}_{k} \)) belong to an LDLS.
Here, the monomial features of x are defined as:
Here, \( \boldsymbol {\alpha }=[\alpha _{1} \ \cdots \ \alpha _{M}] \) denotes a multi-index of nonnegative integers, \( \boldsymbol {x}^{\boldsymbol {\alpha }} \) is defined as \( \boldsymbol {x}^{\boldsymbol {\alpha }}=x_{1}^{\alpha _{1}}\cdots x_{M}^{\alpha _{M}} \), and \( |\boldsymbol {\alpha }| = \alpha _{1}+\cdots +\alpha _{M} \).
Since \( \boldsymbol {x}\in \bigcup _{k=1}^{d} \mathcal {S}_{k} \) if and only if \( \prod _{k=1}^{d}(\boldsymbol {x}^{T} \boldsymbol {a}_{k}) =0 \) (where a_{k} denotes a vector in the orthogonal complement of \( \mathcal {S}_{k} \)), there exists a vector \( \boldsymbol {c}\in {\mathbb {R}^{\binom {M+d}{d}}}\) that satisfies c^{T}ψ_{d}(x)=0. Hence, the matrix ψ_{d}(X)=[ψ_{d}(x_{1}) ψ_{d}(x_{2}) ⋯ ψ_{d}(x_{N})] is rank deficient, and the highrank matrix completion problem is formulated as follows:
This problem can be solved efficiently by replacing ψ_{d}(X) with a polynomial kernel Gram matrix and using the Schatten-norm minimization algorithm [3]. The details are presented in [15].
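To make the rank-deficiency property concrete, the following sketch (an illustration of the algebraic fact, not the VMC algorithm itself; the sizes, normals, and sample count are arbitrary choices for the example) samples columns from a union of two planes in \(\mathbb{R}^{3}\) and checks that the matrix of degree-2 monomial features is rank deficient even though the data matrix itself has full row rank:

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_features(x, d):
    # All monomials of total degree <= d in the entries of x, constant included;
    # the feature dimension is binom(M + d, d).
    feats = [1.0]
    for deg in range(1, d + 1):
        for idx in combinations_with_replacement(range(len(x)), deg):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

rng = np.random.default_rng(0)
M, d = 3, 2
a1, a2 = rng.standard_normal(M), rng.standard_normal(M)  # normals of two planes

def sample_on_union(n):
    cols = []
    for _ in range(n):
        a = a1 if rng.random() < 0.5 else a2
        v = rng.standard_normal(M)
        cols.append(v - (v @ a) / (a @ a) * a)   # project onto the plane a.x = 0
    return np.column_stack(cols)

X = sample_on_union(50)                          # 3 x 50, full row rank
Psi = np.column_stack([monomial_features(x, d) for x in X.T])
print(np.linalg.matrix_rank(X))                  # 3: X itself is full rank
print(Psi.shape[0], np.linalg.matrix_rank(Psi))  # feature dim 10, but rank < 10
```

The null direction of ψ_d(X) corresponds to the quadratic (a_1·x)(a_2·x), which vanishes on the whole union, exactly as in the argument above.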
Another approach to the high-rank matrix completion problem was proposed in [19–21]. The matrix ψ_{d}(X) is rank deficient when each column vector x_{i} is given by a polynomial mapping of latent features \( \boldsymbol {y}_{i} \in {\mathbb {R}^{r}}\ (r \ll M < N)\), denoted by:
with polynomial coefficients \( \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} \) and order p≪M, because R=rank(ψ_{d}(X)) satisfies:
and \( R < \binom {M+d}{d}\) if r,p≪M<N. Therefore, the matrix ψ_{d}(X) can be approximated by a low-rank matrix. The works [19–21] proposed a high-rank matrix-completion algorithm using matrix factorization in the same way as [15]; however, this algorithm requires a large number of observed entries and does not recover the matrix when only a small number are available. The algorithm restores [ψ_{d}(x_{1})⋯ψ_{d}(x_{N})] uniquely if the sample number N and the sampling rate \( q=\frac {|\Omega| }{MN} \) satisfy the inequality:
For example, when \( p=3, r=5, M=100, d=2 \), we have \( \binom {r+pd}{pd} = 462 \) and \( \binom {M+d}{d} = 5151 \); although the ratio \( \binom {r+pd}{pd}/\binom {M+d}{d} \ll \binom {r+p}{p}/M = 0.56 \), we need N≥5982 for q=0.4 and N≥1362465 for q=0.3. Hence, we expect the matrix-completion accuracy to worsen when p and r are high and N is small.
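The binomial coefficients quoted in this example can be checked directly; the following one-liner sketch reproduces the counts for p=3, r=5, M=100, d=2:

```python
from math import comb

# Worked check of the counts in the example above (p = 3, r = 5, M = 100, d = 2)
p, r, M, d = 3, 5, 100, 2
print(comb(r + p * d, p * d))  # 462  = C(11, 6), latent-side feature count
print(comb(M + d, d))          # 5151 = C(102, 2), ambient-side feature count
print(comb(r + p, p) / M)      # 0.56 = C(8, 3) / 100
```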
Therefore, this paper proposes a new approach that uses neither monomial features nor the kernel method, but is instead based on the assumption that an LDDM can be approximated locally by an LDLS, in order to achieve high completion accuracy even with a small sampling rate q and a small number of samples N.
Methods
Local low-dimensional model
First, in order to consider how the LDDM is structured when the matrix X is given, this paper assumes that some columns x_{j} can be approximated by vectors within the set of tangent vectors x+U(x) at x on the LDDM \( \mathcal {M}_{r} \). Here, U(x) is defined as:
where ε>0 denotes the radius of an r-dimensional hyperball and J(x) denotes the Jacobian matrix defined as:
Here, \( \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } \) and \( \boldsymbol {\phi }_{\lambda }^{-1} = \left [\phi ^{-1}_{\lambda,1} \ \cdots \ \phi ^{-1}_{\lambda,M}\right ]^{T} : \mathcal {U}^{\prime }_{\lambda } \mapsto {\mathcal {U}}_{\lambda } \) denote a chart and its inverse for an index λ, with an open set \( \mathcal {U}_{\lambda } \) that includes x satisfying \( \bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r} \) and \( \mathcal {U}^{\prime }_{\lambda } \subset {\mathbb {R}^{r}}\). Then, we consider that each x_{j} in a set {x_{1},x_{2},⋯,x_{N}} can be approximated by a vector belonging to \(\bigcup _{i\neq j}(\boldsymbol {x}_{i}+U(\boldsymbol {x}_{i}))\) for all \(j\in \mathcal {I}\). In other words, we assume that we have the following nonempty index set \( \mathcal {I}_{i} \) for \( i\in \mathcal {I} \) defined as:
where η>0 denotes an upper bound on the Euclidean distance between x_{j}−x_{i} and a vector z_{i,j}∈U(x_{i}). In this case, the rank of the matrix \( \boldsymbol {Z}_{i} =\left [\boldsymbol {z}_{i,j_{1}}\ \boldsymbol {z}_{i,j_{2}} \ \cdots \ \boldsymbol {z}_{i,j_{|\mathcal {I}_{i}|}}\right ] \) (where \( \left \{j_{1},j_{2},\cdots,j_{|\mathcal {I}_{i}|}\right \}= \mathcal {I}_{i}\)) is less than or equal to r because rank(J(x_{i}))=r.
Figure 1 illustrates the construction of each z_{i,j}. From the figure, it is apparent that z_{i,j}∈U(x_{i}) can be obtained for suitable parameters ε and η. Therefore, the matrix-completion problem for an arbitrary LDDM (1) can be replaced by the problem of finding the sets \( \mathcal {I}_{i} \) and matrices \( \boldsymbol {Z}_{i} \) that satisfy (6), together with the missing entries of the matrix X, using the sets of tangent vectors x_{i}+U(x_{i}) and given parameters ε and η.
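The local low-rank assumption can also be illustrated numerically. In this hedged sketch (the curve, radius, and sizes are arbitrary choices for illustration only), points are sampled on a one-dimensional curve in \(\mathbb{R}^{5}\) (so r=1, M=5), a neighborhood index set \(\mathcal{I}_{i}\) is collected with a squared-distance threshold, and the local difference matrix Z_{i} is seen to be numerically close to rank r:

```python
import numpy as np

# Points on a 1-dimensional manifold (a smooth curve) embedded in R^5: r = 1, M = 5
t = np.linspace(0.0, 1.0, 400)
X = np.stack([t, np.sin(3 * t), np.cos(2 * t), t**2, np.exp(-t)])  # 5 x 400

i = 200                                       # reference column x_i
eps = 1e-3                                    # squared-distance radius epsilon_i
d2 = np.sum((X - X[:, [i]]) ** 2, axis=0)     # ||x_j - x_i||_2^2 for all j
nbr = np.flatnonzero((d2 <= eps) & (d2 > 0))  # neighborhood index set I_i
Z = X[:, nbr] - X[:, [i]]                     # local difference matrix Z_i

s = np.linalg.svd(Z, compute_uv=False)
# One dominant singular value (the tangent direction); the remaining ones
# reflect only the second-order curvature error and are comparatively tiny.
print(len(nbr), s[0], s[1])
```

Shrinking ε drives the trailing singular values toward zero quadratically, which is exactly the tangent-hyperplane approximation used above.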
Next, we consider how to find z_{i,j} and \(\mathcal {I}_{i}\). To simplify the explanation below, we redefine the variable z_{i,j} as follows:
where e_{i,j} denotes an error vector satisfying \( \|\boldsymbol {e}_{i,j}\|_{2}^{2}\leq \eta \) and d_{i,j}∈{0,1} denotes a variable such that finding d_{i,j} is equivalent to finding \(\mathcal {I}_{i}\). In order to find a suitable solution for d_{i,j}, this paper formulates the following maximization problem:
where \( \boldsymbol {Z}_{i} \in {\mathbb {R}^{{M}\times {N}}}\) is a matrix whose j-th column vector is z_{i,j}. Since the problem (8) cannot be solved via U(x_{i}) when the LDDM \( \mathcal {M}_{r} \) is unknown (as is often the case in actual problems), this paper reformulates the constraint condition z_{i,j}∈U(x_{i}) as two constraint conditions: (1) rank(Z_{i})≤r, because the span of U(x_{i}) is an r-dimensional linear subspace, and (2) \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) if d_{i,j}=1. Because it is difficult to estimate the radii of an ellipsoid when each J(x_{i}) is arbitrary and unknown, this paper uses the Euclidean distance of x_{j}−x_{i} and assigns a hyperball radius ε_{i} to each x_{i}. Thus, this paper reformulates the problem (8) with the given parameters r and \( \{\epsilon _{i}\}_{i\in \mathcal {I}} \) as:
where the second constraint condition is equivalent to \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) if d_{i,j}=1. Thus, we obtain a formulation for finding z_{i,j} on each U(x_{i}) and the set \( \mathcal {I}_{i} \) without knowledge of J(x). However, it is difficult to solve the problem (9) for \( \boldsymbol {X},\{\boldsymbol {Z}_{i}\}_{i\in \mathcal {I}}, \{d_{i,j}\}_{(i,j)\in \mathcal {I}^{2}} \) simultaneously because of the condition rank(Z_{i})≤r; moreover, a suitable dimension r may not be available in actual applications. To address these issues, Section 3.2 explains how to obtain the solution using an MRM technique.
Local low-rank approximation algorithm
First, we consider how to estimate the dimension r of the LDDM for an arbitrary matrix X. We could estimate r simply using principal component analysis if d_{i,j} were known; however, the lower the rank of the matrix Z_{i}, the smaller the number of indices with d_{i,j}=1 becomes in the solution of (9). There is thus a trade-off between the dimension r and the number of indices with d_{i,j}=1. Therefore, this paper formulates the following problem:
where 0≤α≤1 denotes a given trade-off parameter, i.e., the ratio between decreasing the rank of Z_{i} and increasing the sum of d_{i,j}. Because solving the problem (10) is NP-hard due to rank(Z_{i}), this paper relaxes the problem as follows:
where f is defined as:
Here, β,γ≥0 denote given parameters, the function ∥·∥_{∗} denotes the nuclear norm, \(\boldsymbol {1}_{N}\in {\mathbb {R}^{N}}\) denotes the vector whose elements are all 1, D_{i} denotes a diagonal matrix whose diagonal elements (D_{i})_{j,j} each equal d_{i,j}, and trace(Y) denotes the sum of all diagonal elements of Y.
Next, this paper presents a technique to solve the problem (11) using alternating optimization. First, we consider how to solve the problem (11) for Z_{i} and d_{i,j} with a fixed X. The following updates are repeated until a termination condition on Z_{i} and d_{i,j} is satisfied:
where \( h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}) \) is defined as:
Here, 〈a,b〉 denotes the inner product of a and b, sat(c)=max(0, min(1,c)), and \(\boldsymbol {\mathcal {T}}_{\tau }\) denotes the matrix-shrinkage operator for the nuclear-norm minimization problem [1]. Each step of (13) minimizes the objective function (12) with respect to Z_{i} and d_{i,j}. We then consider minimizing the objective function (12) with respect to X for fixed Z_{i} and d_{i,j}. Since the objective function (12) is quadratic in vec(X)=x, we obtain the following solution to the quadratically constrained quadratic program for given Z_{i} and d_{i,j}:
where \( \boldsymbol {I}_{M,M} \in {\mathbb {R}^{{M}\times {M}}} \) is the identity matrix, ⊗ denotes the Kronecker product of two matrices, and \( \boldsymbol {L} \in {\mathbb {R}^{{N}\times {N}}}\) is a graph Laplacian defined as:
\( \boldsymbol {c}=\left [\boldsymbol {c}_{1}^{T} \ \boldsymbol {c}_{2}^{T} \ \cdots \ \boldsymbol {c}_{M}^{T}\right ]^{T}\in {\mathbb {R}^{MN}} \) is defined as:
for l=1,⋯,M, where ⊙ denotes the Hadamard product. Thus, we can alternately optimize each of Z_{i}, d_{i,j}, and X in the problem (11).
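The two scalar/matrix operators appearing in the updates (13) have simple implementations. The following sketch implements the clipping function sat and the singular-value shrinkage operator \(\boldsymbol{\mathcal{T}}_{\tau}\) of [1] (both standard constructions; the test matrix here is an arbitrary example), and verifies that every singular value is shrunk toward zero by at most τ:

```python
import numpy as np

def sat(c):
    # sat(c) = max(0, min(1, c)): clips the weight d_ij into [0, 1]
    return np.clip(c, 0.0, 1.0)

def svt(Y, tau):
    # Matrix-shrinkage operator T_tau of [1]:
    # soft-threshold every singular value of Y by tau.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 8))
Z = svt(Y, tau=1.0)

s_in = np.linalg.svd(Y, compute_uv=False)
s_out = np.linalg.svd(Z, compute_uv=False)
print(np.all(s_out <= s_in + 1e-12))        # True: every singular value shrinks
print(np.all(s_in - s_out <= 1.0 + 1e-12))  # True: each by at most tau
print(sat(np.array([-0.3, 0.4, 1.7])))      # clipped into [0, 1]
```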
Truncated nuclear-norm minimization approach
The solution to the problem (11) is obtained by minimizing the function (12). However, the norm of the solution X might fall below the true value, since nuclear-norm minimization decreases not only the (r+1)-th largest and smaller singular values but also the 1st through r-th largest singular values. Therefore, this paper reformulates the problem and the evaluation function as follows:
where r∈{0,1,⋯,M} is a given parameter and the function ∥Z∥_{∗,r} represents the truncated nuclear norm, which is defined in terms of the k-th largest singular values σ_{k} of Z. The details of the truncated nuclear norm and the optimization technique are given in the Appendix. Note that the truncated nuclear norm with r=0 is equal to the nuclear norm; in this case, the problem (16) is the same as the problem (11). When the variables X and D_{i} are held constant, the optimal solution for each Z_{i} is obtained by \( \boldsymbol {Z}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}\right \} \). In the same way as for the problem (11), this paper describes Algorithm 1, which uses iterative partial matrix shrinkage (IPMS) [5] for the problem (16) and contains the algorithm for (11) as a special case. Here, \( \boldsymbol {0}_{M,N} \in {\mathbb {R}^{{M}\times {N}}}\) denotes the zero matrix; η_{1},η_{2} denote lower limits for the termination conditions \( \|\boldsymbol {D}^{\text {old}}_{i}-\boldsymbol {D}_{i}\|_{F}/\|\boldsymbol {D}_{i}\|_{F}\leq \eta _{1} \) and \( \|\boldsymbol {X}^{\text {old}}-\boldsymbol {X}\|_{F}/\|\boldsymbol {X}\|_{F}\leq \eta _{2} \); and β^{(k)},γ^{(k)},r^{(k)} denote given parameters that satisfy 0<β^{(0)}≤β^{(1)}≤⋯≤β^{max}, γ^{(0)}≥γ^{(1)}≥⋯≥γ^{min}≥0, and 0≤r^{(0)}≤r^{(1)}≤⋯≤r^{max}≤M.
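The truncated shrinkage operator \(\boldsymbol{\mathcal{T}}_{r,\gamma}\) can be sketched from the description above: the r largest singular values are left untouched, and only the remaining ones are soft-thresholded by γ (a hedged sketch following the IPMS idea of [5]; the example matrix and parameters are arbitrary):

```python
import numpy as np

def truncated_svt(Y, r, gamma):
    # Truncated shrinkage T_{r,gamma}: keep the r largest singular values
    # as they are and soft-threshold only the remaining ones by gamma.
    # With r = 0 this reduces to the plain nuclear-norm shrinkage operator.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = s.copy()
    s[r:] = np.maximum(s[r:] - gamma, 0.0)
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 10))
# A gamma larger than every singular value removes all but the top r = 2,
# so the result has rank 2 while its two leading singular values are intact.
Z = truncated_svt(Y, r=2, gamma=np.linalg.norm(Y, 2) + 1.0)
print(np.linalg.matrix_rank(Z))  # 2
```

This is exactly the property motivating the truncated norm: the retained leading singular values are not biased downward, unlike under plain nuclear-norm shrinkage.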
Convergence analysis
This section presents the convergence property of Algorithm 1.
First, let us define the following schemes with regard to the tth iteration of the second iteration statements in Algorithm 1:
u(t_{1},t_{2}) behaves as:
for t,t_{1},t_{2}≥0 and a given \( \boldsymbol {X} \in {\mathbb {R}^{{M}\times {N}}}\).
Lemma 1
For t≥0 and a given \(\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}\), the \( d_{i,j}^{(t)} \) generated by the update schemes (18) satisfies:
Proof
The function u(t_{1},t_{2}) satisfies u(t,t)≥u(t+1,t)≥u(t+1,t+1)≥⋯≥−βN^{2} for t≥0, since \(d_{i,j}^{(t)}= h_{\beta,\epsilon _{i}}\left (\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}^{(t)}\right)\) is the closed-form optimal solution of the convex quadratic-minimization problem with linear constraints for fixed \(\boldsymbol {Z}_{i}^{(t)}\), and \(\boldsymbol {Z}_{i}^{(t+1)}\) is the optimal solution for fixed \(d_{i,j}^{(t)}\) (from Theorem 1 of [5]). Each \( d_{i,j}^{(t+1)} \) satisfies the following KKT conditions of the problem (16) with \(\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}\leq 0\), \({d_{i,j}^{(t+1)}}-1\leq 0\), and \(-{d_{i,j}^{(t+1)}}\leq 0\) for \( (i,j)\in \mathcal {I}^{2}\):
where \(\mu _{1,i,j}^{(t+1)},\mu _{2,i,j}^{(t+1)}\) and \(\mu _{3,i,j}^{(t+1)}\) denote KKT multipliers for \( d_{i,j}^{(t+1)} \). Therefore, u(t_{1},t_{2}) satisfies:
Since \(\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}=0\) if \(\mu _{1,i,j}^{(t+1)}>0\), \( {d_{i,j}^{(t+1)}}-1 =0 \) if \( \mu _{2,i,j}^{(t+1)}>0 \), and \( {d_{i,j}^{(t+1)}} = 0 \) if \( \mu _{3,i,j}^{(t+1)}>0 \), each \( d_{i,j}^{(t)} \) satisfies the constraint condition:
Therefore, each sequence \( \{d_{i,j}^{(t)}\} \) converges to a limit point \(\bar {d}_{i,j}\) if u(0,0)<∞, because \(d_{i,j}^{(t)}-d_{i,j}^{(t+1)} \rightarrow 0\) as t→∞ if \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}>0 \), and \( d_{i,j}^{(t)}-d_{i,j}^{(t+1)} = 1-1 = 0 \) even if \(\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}=0\), for \(t\geq 0, (i,j)\in \mathcal {I}^{2}\). Each sequence \( \{\boldsymbol {Z}_{i}^{(t)}\} \) then converges to a limit point \(\bar {\boldsymbol {Z}}_{i}\) because each \( \boldsymbol {Z}_{i}^{(t+1)} \) is obtained by the soft-thresholding operator using fixed \( d_{i,j}^{(t)} \) for \(t\geq 0, i\in \mathcal {I}\). □
Lemma 2
If β≥ε_{i}, the optimal solution of (17) under the constraint conditions for d_{i,j} and Z_{i} can be obtained by initializing \(\boldsymbol {Z}_{i}^{(0)}\) as \(\boldsymbol {Z}_{i}^{(0)}=\boldsymbol {0}_{M,N}\) and updating \(d_{i,j}^{(0)}\) and \(\boldsymbol {Z}_{i}^{(1)}\) using the update schemes (18) for a given \(\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}\).
Proof
From Theorem 1 of [5], for any \( \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} \), each optimal solution \(\bar {\boldsymbol {Z}}_{i} \) and \( \bar {\boldsymbol {D}}_{i} \) satisfies \(\bar {\boldsymbol {Z}}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\bar {\boldsymbol {D}}_{i}\right \} \). For a given d_{i,j}≥0, the matrix \(\boldsymbol {Z}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right){\boldsymbol {D}}_{i}\right \}\) satisfies 0≤〈x_{j}−x_{i},z_{i,j}〉 because, when d_{i,j}>0,
Here, y_{i,j} denotes the j-th column of \(\boldsymbol {Y}_{i}=\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} =\boldsymbol {U}\text {diag}(\boldsymbol {\sigma })\boldsymbol {V}^{T}\), and σ,U,V denote the singular values and singular vectors of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i}\); when d_{i,j}=0, 〈y_{i,j},z_{i,j}〉=0 because y_{i,j}=0_{M}, where \( \boldsymbol {0}_{M}\in {\mathbb {R}^{M}} \) denotes the zero vector. Then, \(\bar {d}_{i,j}\) satisfies:
because \(\beta \geq \epsilon _{i}\geq \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\), which does not depend on \(\bar {\boldsymbol {Z}}_{i}\). Therefore, \(d_{i,j}^{(0)}=h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {0}_{M})\in \{0,1\}\) and \(\boldsymbol {Z}_{i}^{(1)}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i}^{(0)}\right \}\) constitute the optimal solution for (16). □
Next, let us define the following schemes with regard to the k-th iteration of the first (outer) iteration statements in Algorithm 1 with β^{(k)},γ^{(k)},r^{(k)} for k≥0:
where \( \bar {d}_{i,j} \) and \( \bar {\boldsymbol {Z}}_{i} \) are the limit points of the sequences obtained by the schemes (18) with β^{(k)},γ^{(k)},r^{(k)},X^{(k)}, and the vector c^{(k)} is defined as:
Here, \( \tilde {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}\) and \(\tilde {\boldsymbol {Z}}_{l}^{(k)} \in {\mathbb {R}^{{N}\times {N}}}\) denote matrices defined as \((\tilde {\boldsymbol {D}}^{(k)})_{i,j}= d_{i,j}^{(k)}\) and \((\tilde {\boldsymbol {Z}}_{l}^{(k)})_{i,j}= (\boldsymbol {Z}_{i}^{(k)})_{l,j}\) for \( (i,j)\in \mathcal {I}^{2}\), and the graph Laplacian L^{(k)} is:
where \(\hat {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}\) denotes a matrix whose elements are given by \(\left (\hat {\boldsymbol {D}}^{(k)}\right)_{i,j}={d_{i,j}^{(k)}}^{2}+{d_{j,i}^{(k)}}^{2}\).
Lemma 3
For k≥0,L^{(k)} satisfies kernel(L^{(k)})⊇kernel(L^{(k+1)}).
Proof
Since a vector a∈kernel(L^{(k)}) satisfies:
kernel(L^{(k)}) is written as:
Since \(d_{i,j}^{(k+1)}\) and \(d_{i,j}^{(k)}\) generated by the schemes (18) and (19) satisfy \( d_{i,j}^{(k+1)}>0 \) when \( d_{i,j}^{(k)}>0 \), L^{(k)} satisfies kernel(L^{(k)})⊇kernel(L^{(k+1)}). □
Now, let us describe the properties of the sequences \(\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}, \left \{d_{i,j}^{(k)}\right \}\) generated by Algorithm 1. We define the evaluation function:
and replace the linear constraint condition (X^{(k)})_{m,n}=(X^{(0)})_{m,n} for (m,n)∈Ω with Ax^{(k)}=b, where \( \boldsymbol {b}\in {\mathbb {R}^{|\Omega| }} \) denotes the vector of observed values {(X^{(0)})_{m,n}}_{(m,n)∈Ω} and A∈{0,1}^{|Ω|×MN} denotes a selector matrix.
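The replacement of the element-wise constraint by Ax=b can be sketched directly. Here the sizes and the index set Ω are toy values, and vec(·) stacks columns (column-major, as in MATLAB), so entry (m,n) of X maps to position nM+m of x:

```python
import numpy as np

# Build the selector matrix A and vector b so that A @ vec(X) = b encodes
# (X)_{m,n} = (X0)_{m,n} for (m, n) in Omega (column-major vec).
M, N = 3, 4
X0 = np.arange(M * N, dtype=float).reshape(M, N)
Omega = [(0, 0), (2, 1), (1, 3)]           # observed (m, n) index pairs

A = np.zeros((len(Omega), M * N))
b = np.empty(len(Omega))
for row, (m, n) in enumerate(Omega):
    A[row, n * M + m] = 1.0                # vec stacks columns: index n*M + m
    b[row] = X0[m, n]

x = X0.reshape(-1, order="F")              # vec(X0), column-major
print(np.allclose(A @ x, b))               # True: the constraint is reproduced
```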
Theorem 1
The sequences \(\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}\) and \(\left \{d_{i,j}^{(k)}\right \}\) converge to the limit points \( \bar {\boldsymbol {X}}, \bar {\boldsymbol {Z}}_{i}\), and \( \bar {d}_{i,j} \) under repetition of the iteration schemes of (19) when \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \), where \(\tilde {\boldsymbol {L}}^{(k)}=\boldsymbol {L}^{(k)}\otimes \boldsymbol {I}_{M,M}\).
Proof
The scheme (15) can be written as:
where \(\boldsymbol {Q}_{i,j}\in {\mathbb {R}^{{M}\times {MN}}}\) denotes a matrix defined as \(\boldsymbol {Q}_{i,j}=\boldsymbol {q}_{i,j}^{T}\otimes \boldsymbol {I}_{M,M}\) and \(\boldsymbol {q}_{i,j}\in {\mathbb {R}^{N}}\) is defined such that the i-th element is 1, the j-th element is −1, and the others are 0 (Q_{i,j} satisfies \( \|\boldsymbol {Q}_{i,j}\boldsymbol {x}\|_{2}^{2}=\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2} \) for \( \boldsymbol {x}\in {\mathbb {R}^{MN}} \)). Since x^{(k+1)} satisfies the following KKT condition for v(k,k+1,k,k):
where λ^{(k+1)} and \(\mu _{i,j}^{(k+1)}\) denote the KKT multipliers, v(k_{1},k_{2},k_{3},k_{4}) satisfies:
where the second equality uses the fact that Ax^{(k)}=b. Since \(\|\boldsymbol {Q}_{i,j}\boldsymbol {x}^{(k+1)}\|_{2}^{2}=\epsilon _{i}\) when \(\mu _{i,j}^{(k+1)}>0\),
The second inequality uses:
Obviously, v(k,k+1,k,k)≥v(k+1,k+1,k,k) because the parameters {β^{(k)},γ^{(k)},r^{(k)}} decrease the objective function (17), and v(k+1,k+1,k,k)≥v(k+1,k+1,k+1,k+1) from Lemma 1. The sequence {v(k,k,k,k)} generated by (19) converges to a limit point because of:
Moreover, x^{(k)}−x^{(k+1)}→0_{MN} as k→∞ and v(0,0,0,0)<∞, because each L^{(k)} satisfies \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(k)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \) for k≥0 if \(\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} \), from Lemma 3. Hence, X^{(k)} reaches a limit point \(\bar {\boldsymbol {X}}\); then, the sequences \(\left \{\boldsymbol {Z}_{i}^{(k)}\right \}\) and \(\left \{d_{i,j}^{(k)}\right \}\) converge to limit points \(\bar {\boldsymbol {Z}}_{i}\) and \(\bar {d}_{i,j}\) with fixed \(\bar {\boldsymbol {X}}\), from Lemma 1. □
Finally, some improvements to Algorithm 1 are offered in this section. First, the dimension of the LDDM is unknown in actual applications, although Algorithm 1 requires a suitable r. To address this issue, we adopt a method that estimates the dimension r based on the ratio of singular values σ_{r}/σ_{1}, as in [5], for each column \( i\in \mathcal {I} \). Second, we consider ways to reduce the computational complexity. Two key possibilities are considered: one is to ignore the quadratic constraint condition \( \left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right)d_{i,j}\leq 0 \) when updating X, and the other is to update X only for the columns in the i-th neighborhoods, for example, by minimizing only the i-th Frobenius-norm term of (17), \( \left \|(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}-\boldsymbol {Z}_{i}\right \|_{F}^{2} \), with respect to the column x_{i}, which is expected to work like a stochastic gradient-descent algorithm. Furthermore, this paper uses the parameter \( \beta =\max _{i}\epsilon _{i} \), because the update schemes (18) then yield the limit points for Z_{i} and d_{i,j} in a single pass for each \( i\in \mathcal {I} \), by Lemma 2. Thus, this paper proposes a heuristic algorithm for reducing the calculation time, shown in Algorithm 2, whose parameters satisfy 1>α^{(0)}≥α^{(1)}≥⋯≥α^{min}>0 for k=0,1,⋯,k_{max} and δ>0, as in [5].
We now consider the time and space complexities of Algorithm 2. The major computational cost of Algorithm 2 derives from computing the singular value decomposition of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} \) for all i=1,2,⋯,N at each iteration. For simplicity, this paper assumes that the number of nonzero columns of \(\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} \) is M at each iteration for each i. Then, since the algorithm requires the singular value decomposition of an M×M matrix, the time and space complexities of Algorithm 2 are O(M^{3}N) and O(M^{2}) per iteration. As noted in [20], since VMC [15] has time complexity O(N^{3}+MN^{2}) and space complexity O(N^{2}), the time and space complexities of Algorithm 2 are lower than those of VMC when the numbers of rows M and columns N satisfy M^{3}<N^{2}. Hence, Algorithm 2 is effective for datasets such as those used in Section 5.2.
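The crossover condition M^{3}<N^{2} can be checked for the dataset of Section 5.2 with a one-line computation:

```python
# Per-iteration costs from the text: Algorithm 2 is O(M^3 N),
# VMC is O(N^3 + M N^2); Algorithm 2 wins roughly when M^3 < N^2.
M, N = 62, 3392                       # CMU motion-capture matrix in Section 5.2
print(M**3, N**2)                     # 238328 < 11505664
print(M**3 * N < N**3 + M * N**2)     # True: Algorithm 2 is cheaper here
```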
Results and discussion
Synthetic data
This section presents several numerical examples for the matrix completion problem (1). Here, each i-th column of X^{(0)} is generated by \( \boldsymbol {\mathcal {F}}_{p}: {\mathbb {R}^{r}} \mapsto {\mathbb {R}^{M}} \) with the mapping function (3) as:
Using \( \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} \) and \( \boldsymbol {Y}^{(0)}=\left [\boldsymbol {y}_{1}^{(0)} \ \boldsymbol {y}_{2}^{(0)}\ \cdots \ \boldsymbol {y}_{N}^{(0)}\right ] \in {\mathbb {R}^{{r}\times {N}}}\), generated by i.i.d. continuous uniform distributions with supports [−0.5,0.5] and [−1,1], respectively, the elements of Y^{(0)} are normalized so that \(\max _{i,j}(\boldsymbol {Y}^{(0)})_{i,j}=1\). The index set Ω is generated using a Bernoulli distribution with a given probability q that an index (i,j) belongs to Ω. This paper uses the relative recovery error:
to evaluate each algorithm. All numerical experiments were run in MATLAB 2017b on a PC with an Intel Core i7 3.1 GHz CPU, 8 GB of RAM, and no swap memory.
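The data-generation protocol above can be sketched as follows. This is a reduced-size illustration: M, N, r, p, and q are smaller than the paper's settings, the monomial basis is one plausible reading of the mapping (3), the normalization of Y^{(0)} is applied via the largest absolute entry, and zero-filling is used only as a trivial baseline for the error measure:

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)
M, N, r, p, q = 20, 200, 3, 2, 0.4        # reduced sizes for illustration

# Latent features Y0 with support [-1, 1], rescaled so the largest entry is 1
# (normalization by the largest absolute entry is an assumption here).
Y0 = rng.uniform(-1.0, 1.0, size=(r, N))
Y0 /= np.abs(Y0).max()

def monomials(y, p):
    # All monomials of total degree <= p of the latent vector y.
    feats = [1.0]
    for deg in range(1, p + 1):
        for idx in combinations_with_replacement(range(len(y)), deg):
            feats.append(np.prod(y[list(idx)]))
    return np.array(feats)

Phi = np.column_stack([monomials(y, p) for y in Y0.T])  # binom(r+p, p) x N
Up = rng.uniform(-0.5, 0.5, size=(M, Phi.shape[0]))     # polynomial coefficients U_p
X0 = Up @ Phi                                           # columns on an r-dim manifold

Omega = rng.random((M, N)) < q                          # Bernoulli observation mask

def relative_error(X_hat, X_true):
    # Relative recovery error ||X_hat - X_true||_F / ||X_true||_F
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)

print(X0.shape, round(Omega.mean(), 2))
print(relative_error(np.where(Omega, X0, 0.0), X0))     # error of zero-filling
```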
This paper applies several low-rank matrix completion algorithms, including singular value thresholding (SVT) [1], the fixed-point continuation algorithm (FPCA) [2], the short IRLS-0 (sIRLS-0) method [3], IPMS [5], the nonlinear matrix completion method VMC [15], and the proposed LLRA-SGD method, to several matrix completion problems with M=100, N=4000, and d=3,5 in (20). A maximum iteration number of k_{max}=1000 is used for LLRA, IPMS, sIRLS-0, and SVT, and the termination condition is ∥X^{(k)}−X^{(k+1)}∥_{F}/∥X^{(k+1)}∥_{F}≤10^{−5} for all algorithms. The parameters for LLRA-SGD and IPMS are \( \alpha ^{(k)}=10^{-\frac {4k}{k_{\text {max}}}} \) and δ=10^{−2}; those for SVT are \( \tau ^{(k)}=10^{-2}\sigma _{1}^{(k)} \); those for sIRLS-0 and VMC are \( \gamma ^{(k)}=10^{2-\frac {6k}{k_{\text {max}}}} \); those for VMC are p=0.5 and d=3; and those for FPCA are τ=1 and \( \mu ^{(k)}=(0.25)^{k} \geq \bar {\mu }=10^{-8} \). The condition \( \sigma _{l}^{(k)}\geq 10^{-2}\sigma _{1}^{(k)} \) is used to choose r for FPCA. The initial values of {(X)_{m,n}}_{(m,n)∉Ω} are set to 0 for SVT, FPCA, sIRLS-0, and IPMS. For VMC and LLRA-SGD, X is initialized using IPMS, and each ε_{i} is chosen so that the number of columns satisfying \( \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} \) equals 50 for the estimate of X obtained by IPMS.
The results are shown in Tables 1, 2, and 3 for q∈{0.2,0.3,0.4} and r∈{2,3,4,5,6}. As can be seen, the estimation accuracy of LLRA-SGD is better than that of the other methods for r=5,6 with q=0.2,0.3,0.4 and d=3,5,7, and especially for r=3,4,5,6 with q=0.2. Figures 2, 3, and 4 compare all algorithms with q=0.3. In Figs. 2 and 3, the recovery errors of LLRA-SGD tend to deteriorate less than those of the other algorithms. These results indicate that the proposed method is more effective when the missing rate or the latent dimension is high.
CMU motion capture data
This paper considers matrix completion on motion capture data, which consist of time-series trajectories of human motions such as running and jumping. Similar to [15], this paper uses trial #6 of subject #56 of the CMU motion capture dataset. The data consist of measurements from M=62 sensors at 6784 time instants, and the resulting data matrix is known to be of high rank. In this experiment, the sequence is downsampled by a factor of 2, so that the data matrix has M=62 rows and N=3392 columns. The elements of the data matrix are then observed at random with observation ratio q∈{0.1,0.2,0.3,0.4}, and the matrix completion algorithms are applied with the same parameters as in Section 5.1.
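The preprocessing described above (downsampling by a factor of 2, then observing each entry with probability q) can be sketched as follows. This is not code from the original paper; the function name and the use of a fixed seed are illustrative assumptions.

```python
import numpy as np

def downsample_and_observe(X, factor=2, q=0.3, seed=0):
    """Downsample columns (time instants) by `factor`, then observe each
    entry independently with probability q; unobserved entries are zeroed."""
    rng = np.random.default_rng(seed)
    Xd = X[:, ::factor]              # keep every `factor`-th time instant
    mask = rng.random(Xd.shape) < q  # True where the entry is observed
    return np.where(mask, Xd, 0.0), mask
```

Applied to the 62×6784 sensor matrix with `factor=2`, this yields the 62×3392 observed matrix and the index set Ω (here represented by the boolean mask).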
The average recovery errors over 10 trials are shown in Fig. 5. As with the synthetic data, the estimation accuracy of LLRA-SGD is better than that of the other algorithms. In particular, the recovery errors of LLRA-SGD are much lower than those of the others when the missing ratio is very high (q=0.1,0.2). These results show that the proposed method is effective not only for synthetic data but also for real-world data.
The average computational times over all observation ratios q∈{0.1,0.2,0.3,0.4} are shown in Table 4. This result indicates that the computation time of LLRA-SGD is about 200 to 500 times longer than that of the conventional MRM methods, and that the computation time of VMC is about 2.4 times longer than that of LLRA-SGD for the same number of iterations. However, VMC and LLRA-SGD achieve sufficiently high estimation accuracy even with a small number of iterations. Figure 6 shows the results of VMC and LLRA-SGD for maximum iteration numbers k_{max}∈{10,20,40,100,1000} at the observation ratio q=0.2. As can be seen in Fig. 6, the recovery error of LLRA-SGD converges sufficiently by k_{max}=40. At this maximum iteration number, although the computational time of LLRA-SGD is about 16 times longer than that of IPMS, the recovery error of LLRA-SGD is less than half that of IPMS. We can also see that the recovery error of LLRA-SGD is lower than that of VMC for all k_{max}∈{10,20,40,100,1000}.
Conclusion
This paper proposed a local low-rank approach (LLRA) to a matrix completion problem in which the columns of the matrix belong to an LDDM, and presented the convergence properties of this approach. The proposed method is based on the idea that a tangent hyperplane of dimension equal to that of the LDDM exists at each column of the matrix. It assumes that each hyperplane is of low dimension and minimizes the sum of the ranks of the local submatrices, where each local submatrix consists of the nearest neighbors of a given column. Numerical examples show that the proposed algorithm offers higher completion accuracy than other algorithms when each column vector is given by a pth order polynomial mapping of a latent feature. In particular, the proposed method is suitable when the order p and the dimension of the latent space are high.
Appendix
This appendix introduces the truncated nuclear norm and its minimization via IPMS [5].
The truncated nuclear norm ∥Z∥_{∗,r} is defined in terms of the kth largest singular value σ_{k} of \(\boldsymbol{Z}\in\mathbb{R}^{M\times N}\) as:
\( \|\boldsymbol{Z}\|_{*,r} = \sum_{k=r+1}^{\min(M,N)} \sigma_{k} \)
The truncated nuclear norm is used as a surrogate for the matrix rank, and the minimizer of \(\frac {1}{2}\|\boldsymbol {Y}-\boldsymbol {Z}\|_{F}^{2} +\gamma \|\boldsymbol {Z}\|_{*,r} \) with a given matrix Y and a given parameter γ>0 is obtained in closed form as follows:
where \(\boldsymbol {\mathcal {T}}_{r,\gamma }\) denotes the matrix-shrinkage operator, which, with regard to the singular value decomposition Y=Udiag(σ)V^{T}, keeps the r largest singular values unchanged and soft-thresholds the remaining ones:
\( \boldsymbol{\mathcal{T}}_{r,\gamma}(\boldsymbol{Y}) = \boldsymbol{U}\,\mathrm{diag}\big(\sigma_{1},\ldots,\sigma_{r},(\sigma_{r+1}-\gamma)_{+},\ldots,(\sigma_{\min(M,N)}-\gamma)_{+}\big)\,\boldsymbol{V}^{T} \)
and (c)_{+}= max(0,c).
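The shrinkage operator described above keeps the r largest singular values and soft-thresholds the rest. A minimal NumPy sketch (not from the original paper; the function name is illustrative):

```python
import numpy as np

def matrix_shrinkage(Y, r, gamma):
    """T_{r,gamma}(Y): keep the r largest singular values of Y unchanged
    and soft-threshold the remaining ones by gamma."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = np.maximum(s[r:] - gamma, 0.0)  # (sigma_k - gamma)_+ for k > r
    return (U * s) @ Vt                     # U diag(s) V^T
```

With a sufficiently large γ, the output has rank at most r, which is how the operator drives the truncated nuclear norm toward zero.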
In the matrix completion problem (2), the IPMS algorithm solves the relaxation problem by iterating the following update schemes:
Since the truncated nuclear norm requires the rank parameter r, IPMS estimates the matrix rank r during the iterations using the following scheme:
where 0≤α<1 is a given constant. The details of the IPMS algorithm are given in [5].
Availability of data and materials
Please contact the author for data requests.
Abbreviations
LDDM: Low-dimensional differentiable manifold
MRM: Matrix rank minimization
LDLS: Low-dimensional linear subspace
UoLS: Union of linear subspaces
VMC: Variety-based matrix completion
LLRA: Local low-rank approach
SVT: Singular value thresholding
FPCA: Fixed-point continuation algorithm
IRLS: Iterative reweighted least squares
IPMS: Iterative partial matrix shrinkage
References
1. J. F. Cai, E. J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010).
2. D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011).
3. K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(1), 3441–3473 (2012).
4. D. Zhang, Y. Hu, J. Ye, X. Li, X. He, Matrix completion by truncated nuclear norm regularization, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (2012), pp. 2192–2199.
5. K. Konishi, K. Uruma, T. Takahashi, T. Furukawa, Iterative partial matrix shrinkage algorithm for matrix rank minimization. Signal Process. 100, 124–131 (2014).
6. J. Gotoh, A. Takeda, K. Tono, DC formulations and algorithms for sparse optimization problems. Math. Program. 169(1), 141–176 (2018).
7. X. Guan, C. T. Li, Y. Guan, Matrix factorization with rating completion: an enhanced SVD model for collaborative filtering recommender systems. IEEE Access 5, 27668–27678 (2017).
8. M. Verhaegen, A. Hansson, N2SID: nuclear norm subspace identification of innovation models. Automatica 72, 57–63 (2016).
9. K. H. Jin, J. C. Ye, Annihilating filter-based low-rank Hankel matrix approach for image inpainting. IEEE Trans. Image Process. 24(11), 3498–3511 (2015).
10. Q. Zhao, D. Meng, Z. Xu, W. Zuo, Y. Yan, L_{1}-norm low-rank matrix factorization by variational Bayesian method. IEEE Trans. Neural Netw. Learn. Syst. 26(4), 825–839 (2015).
11. B. Eriksson, L. Balzano, R. Nowak, High-rank matrix completion. Int. Conf. Artif. Intell. Stat. 22, 373–381 (2012).
12. C. Yang, D. Robinson, R. Vidal, Sparse subspace clustering with missing entries, in Proc. of the 32nd Int. Conf. on Machine Learning (PMLR), 37 (PMLR, Lille, 2015), pp. 2463–2472.
13. C. G. Li, R. Vidal, A structured sparse plus structured low-rank framework for subspace clustering and completion. IEEE Trans. Signal Process. 64(24), 6557–6570 (2016).
14. E. Elhamifar, High-rank matrix completion and clustering under self-expressive models, in Proc. 28th Adv. Neural Inf. Process. Syst. (Curran Associates Inc., Barcelona, 2016), pp. 73–81.
15. G. Ongie, R. Willett, R. D. Nowak, L. Balzano, Algebraic variety models for high-rank matrix completion, in Proc. of the 34th Int. Conf. on Machine Learning (PMLR), 70 (PMLR, Sydney, 2017), pp. 2691–2700.
16. G. Ongie, L. Balzano, D. Pimentel-Alarcón, R. Willett, R. D. Nowak, Tensor methods for nonlinear matrix completion. arXiv preprint arXiv:1804.10266 (2018). https://arxiv.org/abs/1804.10266.
17. R. Vidal, Y. Ma, S. Sastry, Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27(12), 1945–1959 (2005).
18. X. Alameda-Pineda, E. Ricci, Y. Yan, N. Sebe, Recognizing emotions from abstract paintings using non-linear matrix completion, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (IEEE, Las Vegas, 2016), pp. 5240–5248.
19. J. Fan, T. W. S. Chow, Non-linear matrix completion. Pattern Recogn. 77, 378–394 (2018).
20. J. Fan, M. Udell, Online high rank matrix completion, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (IEEE, Long Beach, 2019), pp. 8682–8690.
21. J. Fan, Y. Zhang, M. Udell, Polynomial matrix completion for missing data imputation and transductive learning, in Proc. AAAI, 34 (AAAI Press, New York, 2020), pp. 3842–3849.
22. J. Fan, J. Cheng, Matrix completion by deep matrix factorization. Neural Netw. 98, 34–41 (2018).
23. B. Schölkopf, A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002). https://books.google.co.jp/books?id=y8ORL3DWt4sC&hl=ja&source=gbs_book_other_versions.
24. S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000).
25. M. Winlaw, D. L. Samimi, A. Ghodsi, Robust locally linear embedding using penalty functions. Int. Joint Conf. Neural Netw., 2305–2312 (2011).
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that helped improve the quality of this manuscript.
Funding
This work was supported by the JSPS KAKENHI Grant Number JP19H02163.
Author information
Affiliations
Contributions
All authors have contributed equally. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sasaki, R., Konishi, K., Takahashi, T. et al. Local lowrank approach to nonlinear matrix completion. EURASIP J. Adv. Signal Process. 2021, 11 (2021). https://doi.org/10.1186/s13634021007177
Keywords
 Matrix rank minimization
 Nonlinear matrix completion
 Differentiable manifold
 Dimensionality reduction