Local low-rank approach to nonlinear matrix completion

Sasaki, Ryohei; Konishi, Katsumi; Takahashi, Tomohiro; Furukawa, Toshihiro

doi:10.1186/s13634-021-00717-7

Research
Open access
Published: 12 February 2021

Ryohei Sasaki ORCID: orcid.org/0000-0003-2327-7910¹,
Katsumi Konishi¹,
Tomohiro Takahashi² &
…
Toshihiro Furukawa³

EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 11 (2021)

Abstract

This paper deals with a problem of matrix completion in which each column vector of the matrix belongs to a low-dimensional differentiable manifold (LDDM), with the target matrix being high or full rank. To solve this problem, algorithms based on polynomial mapping and matrix-rank minimization (MRM) have been proposed; such methods assume that each column vector of the target matrix is generated as a vector in a low-dimensional linear subspace (LDLS) and mapped to a pth order polynomial and that the rank of a matrix whose column vectors are dth monomial features of target column vectors is deficient. However, a large number of columns and observed values are needed to strictly solve the MRM problem using this method when p is large; therefore, this paper proposes a new method for obtaining the solution by minimizing the rank of the submatrix without transforming the target matrix, so as to obtain high estimation accuracy even when the number of columns is small. This method is based on the assumption that an LDDM can be approximated locally as an LDLS to achieve high completion accuracy without transforming the target matrix. Numerical examples show that the proposed method has a higher accuracy than other low-rank approaches.

1 Introduction

This paper deals with the following completion problem for a matrix $ \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} $ on a low-dimensional differentiable manifold (LDDM) $\mathcal {M}_{r}$:

$$\begin{array}{*{20}l} \begin{array}{cc} \text{Find}&\boldsymbol{X}=[\boldsymbol{x}_{1} \ \boldsymbol{x}_{2} \ \cdots \ \boldsymbol{x}_{N}]\\ {{\text{subject to}}}& (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega\\ &\boldsymbol{x}_{i} \in \mathcal{M}_{r} \ \text{for all} \ i\in\mathcal{I}, \end{array} \end{array} $$

(1)

where the (m,n)th element of a matrix is denoted by $(\cdot)_{m,n}$, $\mathcal {I} $ is an index set defined as $ \mathcal {I}=\{1,2,\cdots,N\} $, and $ \mathcal {M}_{r} \subset {\mathbb {R}^{M}}$, $\Omega $, and X⁽⁰⁾ denote an unknown r-dimensional differentiable manifold, a given index set, and a given observed matrix, respectively. In this paper, the LDDM $\mathcal {M}_{r}$ satisfies the following condition: on each open set $\mathcal {U}_{\lambda }$ of a family satisfying $\bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r}$, there exists a differentiable homeomorphism $ \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } $, where $\mathcal {U}^{\prime }_{\lambda }$ denotes an open set of ${\mathbb {R}^{r}}$. If $\mathcal {M}_{r}$ is an unknown low-dimensional linear subspace (LDLS), then this is a low-rank matrix completion problem. Many algorithms have been proposed [1–6] to obtain solutions to this problem with high estimation accuracy. The low-rank matrix completion problem has various applications in the field of signal processing, including collaborative filtering [7], low-order model fitting and system identification [8], image inpainting [9], and human-motion recovery [10], all of which are formulated as signal recovery or estimation problems. However, in most practical applications, the column vectors of a matrix do not belong to an LDLS, i.e., $\mathcal {M}_{r}$ is not an LDLS, and these algorithms therefore do not achieve high performance. For example, a matrix is of high rank when its column vectors lie on a union of linear subspaces (UoLS): the column space of the matrix is high-dimensional even when the dimension of each linear subspace is low. For this case, several methods have been proposed to solve the high-rank matrix completion problem [11–16], all of which are based on subspace clustering [17]. In particular, [15] proposed an algebraic variety approach known as variety-based matrix completion (VMC), which is based on the fact that the monomial features of each column vector belong to an LDLS when the column vectors belong to a UoLS. This approach solves the rank minimization problem for the Gram matrix of the monomial features by relaxing it into the rank minimization of a polynomial kernel matrix. Unfortunately, these algorithms recover a matrix only when ${\mathcal {M}}_{r}$ can be approximately divided into several LDLSs and do not work well otherwise.

To solve the matrix completion problem on a general LDDM, some nonlinear methods have been proposed [18–22]. In particular, Fan et al. [19–21] have proposed a method based on a kind of kernel principal component analysis [23] that assumes that the dimension of the subspace spanned by the column vectors mapped nonlinearly is low. They formulate the matrix completion problem as a low-rank approximation problem of the kernel matrix, in common with [15]; however, they require a large number of observed entries in the matrix to solve the problem, and the matrix completion accuracy declines when the number of observed entries is small.

In the present paper, a new method is proposed that uses neither the monomial features nor the kernel method to achieve high completion accuracy. Based on an idea similar to that of locally linear embedding [24, 25], this paper assumes that an LDDM can be approximated locally as an LDLS, because there are tangent hyperplanes whose dimension is equal to that of the manifold. The matrix completion problem is then formulated as one of minimizing the rank of local submatrices of X whose columns are nearest neighbors of each other.

This paper is organized as follows. In Section 2, related works are introduced. Section 3 proposes a local low-rank approach (LLRA) to solve a matrix completion problem on an LDDM, and the convergence properties of the proposed algorithm are shown in Section 4. Finally, numerical examples are presented in Section 5 to illustrate that the proposed algorithm has a higher accuracy than other low-rank approaches.

2 Related works

Here, we focus on some matrix completion algorithms based on matrix rank minimization (MRM) on an unknown manifold, $\mathcal {M}_{r}$. First, this paper introduces the algorithms for the case where $ \mathcal {M}_{r} $ is an r-dimensional linear subspace in Section 2.1; then, Section 2.2 shows the algorithms using the polynomial kernel for a UoLS and an LDDM.

2.1 Matrix rank minimization for linear subspace

Most algorithms for matrix completion deal with the case where the manifold $ \mathcal {M}_{r} $ is an LDLS [1–3, 5]. In this case, since the dimension r is unknown, they formulate the matrix completion problem as the following MRM problem to simultaneously estimate r and restore X:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\boldsymbol{X}}{\text{Minimize}}& \text{rank}(\boldsymbol{X})\\ {{\text{subject to}}}& (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega. \end{array} \end{array} $$

(2)

Since this problem is generally NP-hard, several surrogate functions have been proposed, such as the nuclear norm [1], the truncated nuclear norm [5], and the Schatten norm [3]. These algorithms recover X well if X can be approximated as a low-rank matrix.

2.2 High-rank matrix completion with the kernel method

To recover a high-rank matrix with columns belonging to a UoLS or an LDDM, some algorithms have been proposed that minimize the rank of its kernel matrix [15, 18–21].

In [15], the authors focused on a matrix completion problem on a union of d linear subspaces $\bigcup _{k=1}^{d} \mathcal {S}_{k} $, where $ \mathcal {S}_{k} $ denotes an LDLS of dimension r or lower. Since the matrix rank is high or full in this problem, the MRM approach does not achieve high performance. To solve this matrix completion problem, an algebraic variety model approach was proposed based on the fact that the monomial features of each column vector ($ \boldsymbol {x}_{i} \in \bigcup _{k=1}^{d} \mathcal {S}_{k} $) belong to an LDLS.

Here, the monomial features of x are defined as:

$$\begin{array}{*{20}l} \boldsymbol{\psi}_{d}(\boldsymbol{x}) =(\boldsymbol{x}^{\boldsymbol{\alpha}})_{|\boldsymbol{\alpha}|\leq d} \in {\mathbb{R}^{\binom{M+d}{d}}}, \end{array} $$

(3)

where α=[α₁ ⋯ α_M] denotes a multi-index of non-negative integers, x^α is defined as $ \boldsymbol {x}^{\boldsymbol {\alpha }}=x_{1}^{\alpha _{1}}\cdots x_{M}^{\alpha _{M}}$, and $|\boldsymbol {\alpha }| = \alpha _{1}+\cdots +\alpha _{M} $.

Since $ \boldsymbol {x}\in \bigcup _{k=1}^{d} \mathcal {S}_{k} $ if and only if $ \prod _{k=1}^{d}(\boldsymbol {x}^{T} \boldsymbol {a}_{k}) =0 $ (where a_k denotes a vector in the orthogonal complement of $ \mathcal {S}_{k} $), there exists a vector $ \boldsymbol {c}\in {\mathbb {R}^{\binom {M+d}{d}}}$ that satisfies c^Tψ_d(x)=0. Hence, the matrix ψ_d(X)=[ψ_d(x₁) ψ_d(x₂) ⋯ ψ_d(x_N)] is rank deficient, and the high-rank matrix completion problem is formulated as follows:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\boldsymbol{X}}{\text{Minimize}}& \text{rank}\left(\boldsymbol{\psi}_{d}(\boldsymbol{X})\right)\\ {{\text{subject to}}}& (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega. \end{array} \end{array} $$

(4)

This problem can be solved efficiently by replacing ψ_d(X) with a polynomial kernel-gram matrix and by using the Schatten norm-minimization algorithm [3]. The details are presented in [15].
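The rank deficiency of ψ_d(X) can be checked numerically on a small example. The following Python sketch (NumPy is assumed; the helper name `monomial_features`, the random test data, and the rank tolerance are illustrative choices and are not taken from [15]) draws columns from a union of d=2 planes in ℝ³ and compares the rank of X with the rank of ψ₂(X):

```python
import itertools
import numpy as np

def monomial_features(X, d):
    """Stack all monomials x^alpha with |alpha| <= d, one column per sample."""
    M, N = X.shape
    rows = []
    for degree in range(d + 1):
        for idx in itertools.combinations_with_replacement(range(M), degree):
            row = np.ones(N)
            for m in idx:              # multiply the selected coordinates
                row = row * X[m]
            rows.append(row)
    return np.vstack(rows)             # shape: (binom(M+d, d), N)

rng = np.random.default_rng(0)
M, n_per_plane = 3, 20
# Columns drawn from two random 2-dimensional subspaces of R^3 (hypothetical data).
blocks = []
for _ in range(2):
    basis = rng.standard_normal((M, 2))
    blocks.append(basis @ rng.standard_normal((2, n_per_plane)))
X = np.hstack(blocks)

Psi = monomial_features(X, d=2)
# An explicit absolute tolerance keeps the rank decision away from round-off.
rank_X = np.linalg.matrix_rank(X, tol=1e-8)
rank_Psi = np.linalg.matrix_rank(Psi, tol=1e-8)
print(X.shape, rank_X)       # X itself is full rank
print(Psi.shape, rank_Psi)   # Psi has 10 rows but lower rank
```

In this toy setting the single product polynomial (xᵀa₁)(xᵀa₂) is the only relation of degree at most 2 among the features, so one dimension is lost.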

Another approach to the high-rank matrix completion problem was proposed in [19–21]. The matrix ψ_d(X) is rank deficient when each column vector x_i is given by a polynomial mapping of latent features $ \boldsymbol {y}_{i} \in {\mathbb {R}^{r}} (r \ll M < N)$ denoted by:

$$\boldsymbol{x}_{i} = \boldsymbol{U}_{p}\boldsymbol{\psi}_{p}(\boldsymbol{y}_{i}) $$

with polynomial coefficients $ \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} $ and order p≪M, because R=rank(ψ_d(X)) satisfies:

$$R = \min\left\{N,\binom{M+d}{d},\binom{r+pd}{pd}\right\} $$

and $ R < \binom {M+d}{d}$ if r,p≪M<N. Therefore, the matrix ψ_d(X) can be approximated by a low-rank matrix. The authors of [19–21] proposed a high-rank matrix-completion algorithm using matrix factorization in the same way as [15]; however, this algorithm requires a large number of observed entries and does not recover the matrix when only a few are available. The algorithm restores [ψ_d(x₁)⋯ψ_d(x_N)] uniquely if the sample number N and the sampling rate $ q=\frac {|\Omega |}{MN} $ satisfy the inequality:

$$q \geq \left(\frac{R}{N}+\frac{R}{\binom{M+d}{d}}- \frac{R^{2}}{N\binom{M+d}{d}} \right)^{\frac{1}{d}}. $$

For example, when $ p=3, r=5, M=100, d=2 $, we have $ \binom {r+pd}{pd} = 462 $ and $ \binom {M+d}{d} = 5151 $; although the ratio $ \binom {r+pd}{pd}/\binom {M+d}{d} \ll \binom {r+p}{p}/M = 0.56 $, we need N≥5982 for q=0.4 and N≥1362465 for q=0.3. Hence, we expect that the matrix-completion accuracy will worsen when p and r are high and N is small.
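The numbers in this example can be reproduced by rearranging the sampling-rate inequality for N. Assuming N is large enough that it is not the binding term in the formula for R, the inequality is equivalent to N ≥ R(1−R/B)/(q^d−R/B) with B = C(M+d, d), provided q^d > R/B. The Python sketch below uses exact rational arithmetic; the function name `min_columns` is hypothetical:

```python
from fractions import Fraction
from math import ceil, comb

def min_columns(M, r, p, d, q):
    """Smallest N with q^d >= R/N + R/B - R^2/(N*B), where B = binom(M+d, d).

    Assumes R = min(B, binom(r+p*d, p*d)), i.e. N is not the binding term
    in the rank formula. Returns None when no finite N satisfies the bound.
    """
    B = comb(M + d, d)
    R = min(B, comb(r + p * d, p * d))
    slack = Fraction(q) ** d - Fraction(R, B)
    if slack <= 0:
        return None
    # Rearranged bound: N >= R * (1 - R/B) / (q^d - R/B)
    return ceil(R * (1 - Fraction(R, B)) / slack)

for q in (Fraction(4, 10), Fraction(3, 10)):
    print(q, min_columns(M=100, r=5, p=3, d=2, q=q))
```

Under the same assumption, the bound cannot be met for any N once q^d ≤ R/B, which for these parameters happens slightly below q=0.3.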

Therefore, this paper proposes a new approach that uses neither monomial features nor the kernel method but is instead based on the assumption that an LDDM can be approximated locally as an LDLS, in order to achieve high completion accuracy even when the sampling rate q and the number of samples N are small.

3 Methods

3.1 Local low-dimensional model

First, in order to consider how the LDDM is structured when the matrix X is given, this paper assumes that some columns x_j are approximated by a vector within a set of tangent vectors x+U(x) at x in the LDDM $ \mathcal {M}_{r} $. Here, U(x) is defined as:

$$\begin{array}{*{20}l} U(\boldsymbol{x})=\left\{ J(\boldsymbol{x})\Delta\boldsymbol{y}\in {\mathbb{R}^{M}} \ \mid \ \Delta\boldsymbol{y}\in{\mathbb{R}^{r}}, \|\Delta\boldsymbol{y}\|_{2}^{2}\leq \epsilon \right\}. \end{array} $$

(5)

where ε>0 denotes the radius of an r-dimensional hyperball, and J(x) denotes a Jacobian matrix defined as:

$$\boldsymbol{J}(\boldsymbol{x})= \left.\left[ \begin{array}{llll} \frac{\partial \phi_{\lambda,1}^{-1}}{\partial y_{1}}&\frac{\partial \phi_{\lambda,1}^{-1}}{\partial y_{2}}&\cdots &\frac{\partial \phi_{\lambda,1}^{-1}}{\partial y_{r}}\\ \frac{\partial \phi_{\lambda,2}^{-1}}{\partial y_{1}}&\frac{\partial \phi_{\lambda,2}^{-1}}{\partial y_{2}}&\cdots &\frac{\partial \phi_{\lambda,2}^{-1}}{\partial y_{r}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial \phi_{\lambda,M}^{-1}}{\partial y_{1}}&\frac{\partial \phi_{\lambda,M}^{-1}}{\partial y_{2}}&\cdots &\frac{\partial \phi_{\lambda,M}^{-1}}{\partial y_{r}} \end{array} \right]\right|_{\boldsymbol{y}=\boldsymbol{\phi}_{\lambda}(\boldsymbol{x})}. $$

Here, $ \boldsymbol {\phi }_{\lambda } : {\mathcal {U}}_{\lambda } \mapsto \mathcal {U}^{\prime }_{\lambda } $ and $ \boldsymbol {\phi }_{\lambda }^{-1} = \left [\phi ^{-1}_{\lambda,1} \ \cdots \ \phi ^{-1}_{\lambda,M}\right ]^{T} : \mathcal {U}^{\prime }_{\lambda } \mapsto {\mathcal {U}}_{\lambda } $ denote a chart and its inverse for an index λ, with an open set $ \mathcal {U}_{\lambda } $ that includes x satisfying $ \bigcup _{\lambda } \mathcal {U}_{\lambda } = \mathcal {M}_{r} $ and $ \mathcal {U}^{\prime }_{\lambda } \subset {\mathbb {R}^{r}}$. Then, we consider that each x_j in a set {x₁,x₂,⋯,x_N} can be approximated by a vector belonging to $\bigcup _{i\neq j}(\boldsymbol {x}_{i}+U(\boldsymbol {x}_{i}))$ for all $j\in \mathcal {I}$. In other words, we assume that we have the following non-empty index set $ \mathcal {I}_{i} $ for $ i\in \mathcal {I} $ defined as:

$$\begin{array}{*{20}l} \mathcal{I}_{i} \,=\, \left\{j\!\in\!\mathcal{I} \ \mid \ \|\boldsymbol{x}_{j}\,-\,\boldsymbol{x}_{i}\,-\,\boldsymbol{z}_{i,j}\|_{2}^{2}\leq \eta, \boldsymbol{z}_{i,j}\!\!\in\! U(\boldsymbol{x}_{i})\right\}, \end{array} $$

(6)

where η>0 denotes the upper bound of the squared Euclidean distance between x_j−x_i and a vector z_i,j∈U(x_i). In this case, the rank of the matrix $ \boldsymbol {Z}_{i} =\left [\boldsymbol {z}_{i,j_{1}}\ \boldsymbol {z}_{i,j_{2}} \ \cdots \boldsymbol {z}_{i,j_{|\mathcal {I}_{i}|}}\right ] $ (where $ \left \{j_{1},j_{2},\cdots,j_{|\mathcal {I}_{i}|}\right \}= \mathcal {I}_{i}$) is less than or equal to r because rank(J(x_i))=r.

Figure 1 illustrates the construction of each z_i,j. From the figure, it is apparent that the z_i,j∈U(x_i) can be obtained for suitable parameters ε and η. Therefore, the matrix-completion problem for an arbitrary LDDM (1) can be substituted with the problem of finding sets $ \mathcal {I}_{i}, \boldsymbol {Z}_{i} $ that satisfy (6) and the missing entries of the matrix X with the set of tangent vectors x_i+U(x_i) and given parameters ε and η.
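The local low-dimensional model can be illustrated numerically. In the Python sketch below (NumPy assumed), the closed curve in ℝ⁵, the index i=300, and the radius ε=0.01 are hypothetical choices made only for illustration: the full matrix X has rank 5, whereas the differences x_j−x_i over a small neighborhood are dominated by a single direction, as expected for a manifold of dimension r=1.

```python
import numpy as np

# Hypothetical 1-dimensional manifold (a closed curve) embedded in R^5.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
X = np.vstack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t), np.cos(3 * t)])

global_rank = np.linalg.matrix_rank(X)

def local_singular_values(X, i, eps):
    """Singular values of the differences x_j - x_i over the neighbors of x_i."""
    diff = X - X[:, [i]]
    mask = np.sum(diff ** 2, axis=0) <= eps   # squared-distance ball around x_i
    return np.linalg.svd(diff[:, mask], compute_uv=False)

s = local_singular_values(X, i=300, eps=0.01)
print(global_rank)     # the whole matrix is full rank
print(s[1] / s[0])     # small ratio: one tangent direction dominates locally
```

The residual second singular value reflects the curvature of the manifold, which is the role played by the error bound η in (6).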

Next, we consider how to find z_i,j and $\mathcal {I}_{i}$. To simplify the explanation below, we redefine the variable z_i,j as follows:

$$\begin{array}{*{20}l} \boldsymbol{z}_{i,j}=(\boldsymbol{x}_{j}-\boldsymbol{x}_{i})d_{i,j}+\boldsymbol{e}_{i,j}, \end{array} $$

(7)

where e_i,j denotes an error vector satisfying $ \|\boldsymbol {e}_{i,j}\|_{2}^{2}\leq \eta $ and d_i,j∈{0,1} denotes a binary variable indicating whether $ j\in \mathcal {I}_{i} $, so that finding d_i,j is equivalent to finding $\mathcal {I}_{i}$. In order to find a suitable solution for d_i,j, this paper formulates the following maximization problem:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\substack{ \boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in\mathcal{I}},\\ \{d_{i,j}\}_{(i,j)\in\mathcal{I}^{2}}}}{\text{Maximize}} & \sum_{i=1}^{N}\sum_{j=1}^{N} d_{i,j}\\ {{\text{subject to}}} & \boldsymbol{z}_{i,j}\in U(\boldsymbol{x}_{i})\\ & \|(\boldsymbol{x}_{j}-\boldsymbol{x}_{i})d_{i,j}-\boldsymbol{z}_{i,j}\|_{2}^{2}\leq \eta \\ & d_{i,j}\in\{0,1\} \ \text{for} \ (i,j) \in \mathcal{I}^{2}\\ & (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega, \end{array} \end{array} $$

(8)

where $ \boldsymbol {Z}_{i} \in {\mathbb {R}^{{M}\times {N}}}$ is a matrix whose jth column vector is z_i,j. Since U(x_i) is not available when the LDDM $ \mathcal {M}_{r} $ is unknown (as is often the case in actual problems), the problem (8) cannot be solved directly. This paper therefore reformulates the constraint condition z_i,j∈U(x_i) as two constraint conditions: (1) rank(Z_i)≤r, because the span of U(x_i) is an r-dimensional linear subspace, and (2) $ \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} $ if d_i,j=1. Because each J(x_i) is arbitrary and unknown, it is difficult to estimate the radius of the ellipsoid U(x_i); this paper instead uses the Euclidean distance of x_j−x_i and gives the radius of the hyperball ε_i for each x_i. Thus, this paper reformulates the problem (8) with the given parameters r and $ \{\epsilon _{i}\}_{i\in \mathcal {I}} $ as:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\substack{\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in\mathcal{I}},\\ \{d_{i,j}\}_{(i,j)\in\mathcal{I}^{2}}}}{\text{Maximize}} & \sum_{i=1}^{N}\sum_{j=1}^{N} d_{i,j}\\ {{\text{subject to}}} & \text{rank}(\boldsymbol{Z}_{i}) \leq r \ \text{for}\ i\in \mathcal{I},\\ &(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}- \epsilon_{i})d_{i,j}\leq 0 \\ & \|(\boldsymbol{x}_{j}-\boldsymbol{x}_{i})d_{i,j}-\boldsymbol{z}_{i,j}\|_{2}^{2}\leq \eta\\ & d_{i,j} \in \{0,1\} \ \text{for}\ (i,j)\in \mathcal{I}^{2}\\ & (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega, \end{array} \end{array} $$

(9)

where the second constraint condition is the same as $ \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} $ if d_i,j=1. Thus, we obtain a formulation for finding z_i,j on each U(x_i) and the set $ \mathcal {I}_{i} $ without knowing J(x). However, it is difficult to solve the problem (9) for $ \boldsymbol {X},\{\boldsymbol {Z}_{i}\}_{i\in \mathcal {I}}, \{d_{i,j}\}_{(i,j)\in \mathcal {I}^{2}} $ at the same time due to the condition rank(Z_i)≤r. Moreover, in actual applications a suitable dimension r may not be known in advance. To address these issues, Section 3.2 explains how to obtain the solution using an MRM technique.

3.2 Local low-rank approximation algorithm

First, we consider how to estimate the dimension of the LDDM, r, with an arbitrary matrix X. We can estimate r simply using a principal-component analysis if we obtain d_i,j; however, the lower the rank of the matrix Z_i, the smaller the number of d_i,j equal to 1 in the solution of (9). That is, there is a trade-off between the dimension r and the number of d_i,j equal to 1. Therefore, this paper formulates the following problem:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\substack{\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in\mathcal{I}},\\ \{d_{i,j}\}_{(i,j)\in\mathcal{I}^{2}}}}{\text{Minimize}}& \sum_{i=1}^{N}\left\{\alpha\text{rank}(\boldsymbol{Z}_{i})-(1-\alpha)\sum_{j=1}^{N} d_{i,j}\right\}\\ {{\text{subject to}}} & \|(\boldsymbol{x}_{j}-\boldsymbol{x}_{i})d_{i,j}-\boldsymbol{z}_{i,j}\|_{2}^{2}\leq \eta, \\ &(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2} - \epsilon_{i})d_{i,j}\leq 0 \\ & d_{i,j} \in \{0,1\} \ \text{for}\ (i,j)\in \mathcal{I}^{2}\\ & (\boldsymbol{X})_{m,n}=(\boldsymbol{X}^{(0)})_{m,n} \ \text{for}\ (m,n)\in\Omega, \end{array} \end{array} $$

(10)

where 0≤α≤1 denotes a given trade-off parameter that balances decreasing the rank of Z_i against increasing the sum of d_i,j. Because solving the problem (10) is NP-hard due to rank(Z_i), this paper relaxes the problem as follows:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\substack{\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in\mathcal{I}},\\ \{d_{i,j}\}_{(i,j)\in\mathcal{I}^{2}}}}{\text{Minimize}}& f_{\beta,\gamma}\left(\boldsymbol{X},\left\{\boldsymbol{Z}_{i}\right\}_{i\in \mathcal{I}}, \left\{d_{i,j}\right\}_{(i,j)\in \mathcal{I}^{2}}\right)\\ {{\text{subject to}}} &\left(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2} - \epsilon_{i}\right)d_{i,j}\leq 0 \\ & d_{i,j}\in [0,1] \ \text{for}\ (i,j)\in \mathcal{I}^{2} \\ & (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega, \end{array} \end{array} $$

(11)

where f is defined as:

$$\begin{array}{*{20}l} &f_{\beta,\gamma}\left(\boldsymbol{X},\left\{\boldsymbol{Z}_{i}\right\}_{i\in \mathcal{I}}, \left\{d_{i,j}\right\}_{(i,j)\in \mathcal{I}^{2}}\right)\\ &=\sum_{i=1}^{N}\left\{ \begin{array}{l} \gamma\|\boldsymbol{Z}_{i}\|_*+\frac{1}{2}\left\|\left(\boldsymbol{X}-\boldsymbol{x}_{i}\boldsymbol{1}_{N}^{T}\right)\boldsymbol{D}_i-\boldsymbol{Z}_{i}\right\|_{F}^{2}\\ -\beta\text{trace}(\boldsymbol{D}_i) \end{array} \right\}. \end{array} $$

(12)

Here, β,γ≥0 denote the given parameters, function ∥·∥_∗ denotes the nuclear norm, $\boldsymbol {1}_{N}\in {\mathbb {R}^{N}}$ denotes the vector whose elements are all 1, D_i denotes a diagonal matrix whose diagonal elements (D_i)_j,j each equal d_i,j, and trace(Y) denotes the sum of all diagonal elements of Y.

Next, this paper presents a technique to solve the problem (11) using alternating optimization. Firstly, we consider how to solve the problem (11) for Z_i and d_i,j with a fixed X. We repeat the following schemes until a termination condition is satisfied with respect to Z_i and d_i,j:

$$\begin{array}{*{20}l} \begin{array}{lllll} \mathbf{1.}&d_{i,j}&\leftarrow& h_{\beta,\epsilon_{i}}(\boldsymbol{x}_{j},\boldsymbol{x}_{i},\boldsymbol{z}_{i,j}) & \text{for} \ (i,j)\in\mathcal{I}^{2},\\ \mathbf{2.}&\boldsymbol{Z}_{i}&\leftarrow&\boldsymbol{\mathcal{T}}_{\gamma}\left\{\left(\boldsymbol{X}-\boldsymbol{x}_{i}\boldsymbol{1}_{N}^{T}\right)\boldsymbol{D}_{i}\right\} &\text{for} \ i\in\mathcal{I}, \end{array} \end{array} $$

(13)

where $ h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}) $ is defined as:

$$\begin{array}{*{20}l} &h_{\beta,\epsilon_{i}}(\boldsymbol{x}_j,\boldsymbol{x}_i,\boldsymbol{z}_{i,j})\\ &=\left\{ \begin{array}{cl} 0&: \|\boldsymbol{x}_{j}\,-\,\boldsymbol{x}_{i}\|_{2}^2> \epsilon_i \\ 1&: \|\boldsymbol{x}_{j}\,-\,\boldsymbol{x}_{i}\|_{2}^2= 0 \\ \text{sat}\left(\frac{\langle\boldsymbol{x}_j-\boldsymbol{x}_i,\boldsymbol{z}_{i,j}\rangle+\beta}{\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}}\right)&: \text{otherwise} \end{array} \right. \end{array} $$

(14)

Here, 〈a,b〉 denotes the inner product of a and b, sat(c)=max(0, min(1,c)), and $\boldsymbol {\mathcal {T}}_{\gamma }$ denotes the matrix-shrinkage operator for the nuclear norm-minimization problem [1]. The two steps of (13) minimize the objective function (12) with respect to d_i,j and Z_i, respectively. Then, we consider minimizing the objective function (12) for X with fixed Z_i and d_i,j. Since the objective function (12) is quadratic in x=vec(X), the update of X is obtained as the solution of the following quadratically constrained quadratic program for given Z_i and d_i,j:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\boldsymbol{x}}{\text{argmin}}& \boldsymbol{x}^{T} (\boldsymbol{L}\otimes \boldsymbol{I}_{M,M}) \boldsymbol{x}-2\boldsymbol{x}^{T} \boldsymbol{c}\\ {{\text{subject to}}} & (\boldsymbol{X})_{m,n}=\left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for}\ (m,n)\in\Omega\\ &\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2} \leq \epsilon_{i} \ \text{if}\ d_{i,j}>0, \end{array} \end{array} $$

(15)

where $ \boldsymbol {I}_{M,M} \in {\mathbb {R}^{{M}\times {M}}} $ is the identity matrix, ⊗ is the Kronecker product of two matrices, and $ \boldsymbol {L} \in {\mathbb {R}^{{N}\times {N}}}$ is a graph Laplacian defined as:

$$\begin{array}{*{20}l} \boldsymbol{L}&=\text{diag}\left(\hat{\boldsymbol{D}}\boldsymbol{1}_{N}\right)-\hat{\boldsymbol{D}},&\\ (\hat{\boldsymbol{D}})_{i,j}&=d_{i,j}^2+d_{j,i}^{2}\ \text{for}\ (i,j)\in\mathcal{I}^2.&\end{array} $$
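As a sanity check on this definition, the quadratic form in (15) can be compared with a direct evaluation. The identity used below, $\boldsymbol{x}^{T}(\boldsymbol{L}\otimes\boldsymbol{I}_{M,M})\boldsymbol{x}=\sum_{i,j}d_{i,j}^{2}\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}$, follows from the definition of L when vec(X) stacks the columns of X (an assumption about the vectorization convention); it is twice the part of (12) that is quadratic in X. The Python sketch uses random, purely illustrative values for X and d_i,j:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 6
X = rng.standard_normal((M, N))
D = rng.uniform(0.0, 1.0, size=(N, N))     # D[i, j] plays the role of d_{i,j}

D_hat = D ** 2 + (D ** 2).T                # (D_hat)_{i,j} = d_{i,j}^2 + d_{j,i}^2
L = np.diag(D_hat @ np.ones(N)) - D_hat    # graph Laplacian

x = X.flatten(order="F")                   # vec(X): columns of X stacked
quad = x @ np.kron(L, np.eye(M)) @ x       # x^T (L kron I_M) x

# Direct evaluation of sum_{i,j} d_{i,j}^2 ||x_j - x_i||^2.
sq_dist = np.sum((X[:, None, :] - X[:, :, None]) ** 2, axis=0)  # [i, j] entry
direct = np.sum(D ** 2 * sq_dist)
print(quad, direct)
```

Because L is symmetric with zero row sums, the quadratic form is invariant to adding the same vector to every column of X, so only the observed entries fix the offset.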

$ \boldsymbol {c}=\left [\boldsymbol {c}_{1}^{T} \ \boldsymbol {c}_{2}^{T} \ \cdots \ \boldsymbol {c}_{M}^{T}\right ]^{T}\in {\mathbb {R}^{MN}} $ is defined as:

$$\begin{array}{*{20}l} \boldsymbol{c}_l&=\left(\tilde{\boldsymbol{D}}\odot \tilde{\boldsymbol{Z}}_l-(\tilde{\boldsymbol{D}}\odot \tilde{\boldsymbol{Z}}_l)^{T}\right)\boldsymbol{1}_N,&\\ (\tilde{\boldsymbol{Z}}_l)_{i,j}&= (\boldsymbol{Z}_{i})_{l,j} \ \text{for}\ (i,j)\in \mathcal{I}^{2},&\\ (\tilde{\boldsymbol{D}})_{i,j}&= d_{i,j} \ \text{for}\ (i,j)\in\mathcal{I}^{2},&\end{array} $$

for l=1,⋯,M, where ⊙ denotes the Hadamard product. Thus, we can alternately optimize for each of Z_i, d_i,j, and X in the problem (11).
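The two updates in (13) for a fixed X can be sketched in a few lines of Python (NumPy assumed). The function names are hypothetical, and `shrink` assumes that the matrix-shrinkage operator of [1] is the usual soft-thresholding of singular values; the matrix `D` stores d_i,j in entry `D[i, j]`.

```python
import numpy as np

def sat(c):
    """Saturation to the interval [0, 1]."""
    return max(0.0, min(1.0, c))

def h(x_j, x_i, z_ij, beta, eps_i):
    """Update of d_{i,j} following the three cases of (14)."""
    diff = x_j - x_i
    sq = float(diff @ diff)
    if sq > eps_i:          # x_j lies outside the hyperball around x_i
        return 0.0
    if sq == 0.0:           # identical columns (in particular j = i)
        return 1.0
    return sat((float(diff @ z_ij) + beta) / sq)

def shrink(A, gamma):
    """Soft-threshold the singular values of A by gamma (assumed form of T_gamma)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - gamma, 0.0)) @ Vt

def update_Z(X, D, i, gamma):
    """Step 2 of (13): Z_i <- T_gamma{(X - x_i 1^T) D_i}."""
    return shrink((X - X[:, [i]]) * D[i], gamma)
```

Multiplying by `D[i]` with broadcasting scales column j by d_i,j, which is the same as right-multiplying by the diagonal matrix D_i.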

3.3 Truncated nuclear norm-minimization approach

The solution to the problem (11) is obtained by minimizing the function (12). However, the norm of the solution X might be below the true value, since nuclear norm minimization decreases not only the (r+1)th and smaller singular values but also the 1st to the rth largest singular values. Therefore, this paper reformulates the problem and the evaluation function as follows:

$$\begin{array}{*{20}l} \begin{array}{cc} \!\!\!\!\underset{\substack{\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in \mathcal{I}},\\ \{d_{i,j}\}_{(i,j)\in \mathcal{I}^{2}}}}{\text{Minimize}} & g_{\beta,\gamma,r}\left(\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in \mathcal{I}}, \{d_{i,j}\}_{(i,j)\in \mathcal{I}^{2}}\right)\\ {{\text{subject to}}} &(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2} - \epsilon_{i})d_{i,j}\leq 0 \\ & d_{i,j}\in [0,1] \ \text{for}\ (i,j)\in \mathcal{I}^{2} \\ & (\boldsymbol{X})_{m,n}\,=\,(\boldsymbol{X}^{(0)})_{m,n} \ \text{for}\ (m,n)\!\in\!\Omega, \end{array} \end{array} $$

(16)

$$\begin{array}{*{20}l} &\!\!\!g_{\beta,\gamma,r}\left(\boldsymbol{X},\{\boldsymbol{Z}_{i}\}_{i\in \mathcal{I}}, \{d_{i,j}\}_{(i,j)\in \mathcal{I}^{2}}\right)\\ &\!\!\!=\sum_{i=1}^{N}\left\{\!\! \begin{array}{l} \gamma\|\boldsymbol{Z}_{i}\|_{*,r}+\frac{1}{2}\left\|(\boldsymbol{X}-\boldsymbol{x}_{i}\boldsymbol{1}_{N}^T)\boldsymbol{D}_i-\boldsymbol{Z}_{i}\right\|_{F}^{2}\\ -\beta\text{trace}(\boldsymbol{D}_i) \end{array} \!\!\right\}, \end{array} $$

(17)

where r∈{0,1,⋯,M} is a given parameter and the function ∥Z∥_∗,r represents the truncated nuclear norm, which is defined with the kth largest singular value σ_k of Z. The details of the truncated nuclear norm and the optimization technique are given in the Appendix. Note that the truncated nuclear norm with r=0 is equal to the nuclear norm; in this case, the problem (16) is the same as the problem (11). When the variables X and D_i are fixed, the optimal solution for each Z_i is obtained by $ \boldsymbol {Z}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}\right \} $. Algorithm 1 solves the problem (16) using iterative partial matrix shrinkage (IPMS) [5]; it contains the algorithm for (11) as the case r=0. Here, $ \boldsymbol {0}_{M,N} \in {\mathbb {R}^{{M}\times {N}}}$ denotes a zero matrix, η₁ and η₂ denote thresholds for the termination conditions $ \|\boldsymbol {D}^{\text {old}}_{i}-\boldsymbol {D}_{i}\|_{F}/\|\boldsymbol {D}_{i}\|_{F}\leq \eta _{1} $ and ∥X^old−X∥_F/∥X∥_F≤η₂, and β^(k), γ^(k), r^(k) denote given parameters that satisfy 0<β⁽⁰⁾≤β⁽¹⁾≤⋯≤β^max, γ⁽⁰⁾≥γ⁽¹⁾≥⋯≥γ_min≥0, and 0≤r⁽⁰⁾≤r⁽¹⁾≤⋯≤r_max≤M.
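The exact definitions are deferred to the Appendix, so the following Python sketch (NumPy assumed) rests on two labeled assumptions: that the truncated nuclear norm is the sum of all singular values beyond the r largest, and that the partial shrinkage operator leaves the r largest singular values untouched while soft-thresholding the rest. Both function names are hypothetical.

```python
import numpy as np

def truncated_nuclear_norm(Z, r):
    """Assumed definition: sum of the singular values after the r largest."""
    s = np.linalg.svd(Z, compute_uv=False)   # returned in descending order
    return float(np.sum(s[r:]))

def partial_shrink(A, r, gamma):
    """Assumed form of T_{r,gamma}: keep the top r singular values, shrink the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_new = s.copy()
    s_new[r:] = np.maximum(s[r:] - gamma, 0.0)
    return (U * s_new) @ Vt

A = np.diag([3.0, 2.0, 1.0])
print(truncated_nuclear_norm(A, 0))   # r = 0 recovers the nuclear norm
print(truncated_nuclear_norm(A, 1))   # the largest singular value is not penalized
print(np.round(partial_shrink(A, 1, 1.5), 6))
```

With r=0 the second function reduces to ordinary singular-value soft-thresholding, consistent with the remark that the truncated nuclear norm with r=0 equals the nuclear norm.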

4 Convergence analysis

This section presents the convergence property of Algorithm 1.

First, let us define the following schemes with regard to the tth iteration of the second iteration statements in Algorithm 1:

$$\begin{array}{*{20}l} \left\{ \begin{array}{lllll} d_{i,j}^{(t)}&\!\,=\,\!& h_{\beta,\epsilon_{i}}(\boldsymbol{x}_{j},\boldsymbol{x}_{i},\boldsymbol{z}_{i,j}^{(t)}) & \text{for} \ (i,j)\in\mathcal{I}^{2},\\ \boldsymbol{Z}_{i}^{(t+1)}&\!\,=\,\!&\boldsymbol{\mathcal{T}}_{r,\gamma}\left\{(\boldsymbol{X}-\boldsymbol{x}_{i}\boldsymbol{1}_{N}^{T})\boldsymbol{D}_{i}^{(t)}\right\} &\text{for} \ i\in\mathcal{I}, \end{array} \right. \end{array} $$

(18)

and let u(t₁,t₂) be defined as:

$$u(t_{1},t_{2})=g_{\beta,\gamma,r}\left(\boldsymbol{X},\left\{\boldsymbol{Z}_{i}^{(t_{1})}\right\}_{i\in \mathcal{I}}, \left\{d_{i,j}^{(t_{2})}\right\}_{(i,j)\in \mathcal{I}^{2}}\right),$$

for t,t₁,t₂≥0 and a given $ \boldsymbol {X} \in {\mathbb {R}^{{M}\times {N}}}$.

Lemma 1

For t≥0 and a given $\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}$, the $ d_{i,j}^{(t)} $ generated by the update schemes (18) satisfies:

$$\begin{array}{*{20}l} u(t,t)- u(t+1,t+1) \geq \ \ \ \\ \ \ \ \frac{1}{2}\sum_{(i,j)\in\mathcal{I}^{2}} \|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)^{2}. \end{array} $$

Proof

The function u(t₁,t₂) satisfies u(t,t)≥u(t+1,t)≥u(t+1,t+1)≥⋯≥−βN² for t≥0, since $d_{i,j}^{(t)}= h_{\beta,\epsilon _{i}}\left (\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {z}_{i,j}^{(t)}\right)$ is the closed-form optimal solution of the convex quadratic-minimization problem with linear constraints for fixed $\boldsymbol {Z}_{i}^{(t)}$, and $\boldsymbol {Z}_{i}^{(t+1)}$ represents the optimal solution for fixed $d_{i,j}^{(t)}$ (from Theorem 1 of [5]). Each $ d_{i,j}^{(t+1)} $ satisfies the following KKT conditions of problem (16) with the constraints $\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}\leq 0,{d_{i,j}^{(t+1)}}-1\leq 0,-{d_{i,j}^{(t+1)}}\leq 0$ for $ (i,j)\in \mathcal {I}^{2}$:

$$\begin{array}{*{20}l} \left\{ \begin{array}{l} \|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}d_{i,j}^{(t+1)}-\langle \boldsymbol{x}_{j}-\boldsymbol{x}_{i}, \boldsymbol{z}_{i,j}^{(t+1)}\rangle -\beta\\ =-\mu_{1,i,j}^{(t+1)}\left(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}-\epsilon_{i}\right)-\mu_{2,i,j}^{(t+1)}+\mu_{3,i,j}^{(t+1)},\\ \mu_{1,i,j}^{(t+1)}\left(\|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2}-\epsilon_{i}\right)d_{i,j}^{(t+1)}=0,\\ \mu_{2,i,j}^{(t+1)}(d_{i,j}^{(t+1)}-1) =0,\\ \mu_{3,i,j}^{(t+1)}(-d_{i,j}^{(t+1)}) =0,\\ \mu_{1,i,j}^{(t+1)},\mu_{2,i,j}^{(t+1)},\mu_{3,i,j}^{(t+1)}\geq 0 \end{array} \right. \end{array} $$

where $\mu _{1,i,j}^{(t+1)},\mu _{2,i,j}^{(t+1)}$ and $\mu _{3,i,j}^{(t+1)}$ denote KKT multipliers for $ d_{i,j}^{(t+1)} $. Therefore, u(t₁,t₂) satisfies:

$$\begin{array}{*{20}l} &u(t+1,t)- u(t+1,t+1)\\ &=\!\sum_{(i,j)\in\mathcal{I}^{2}} \left\{ \begin{array}{l} \frac{1}{2}\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}^2-{d_{i,j}^{(t+1)}}^{2}\right)\\ -\langle\boldsymbol{x}_j-\boldsymbol{x}_i,\boldsymbol{z}_{i,j}^{(t+1)}\rangle\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)\\ -\beta \left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right) \end{array} \right\}\\ &=\!\sum_{(i,j)\in\mathcal{I}^{2}} \frac{1}{2}\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)^{2}\\ &+\!\sum_{(i,j)\in\mathcal{I}^{2}} \begin{array}{l} \left(d_{i,j}^{(t)}-d_{i,j}^{(t+1)}\right)\left(\begin{array}{l} \|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^2d_{i,j}^{(t+1)}\\-\langle\boldsymbol{x}_j-\boldsymbol{x}_i,\boldsymbol{z}_{i,j}^{(t+1)}\rangle\\ -\beta \end{array} \right)\\ \end{array} \\ &=\!\sum_{(i,j)\in\mathcal{I}^{2}} \frac{1}{2}\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)^{2}\\ &-\!\sum_{(i,j)\in\mathcal{I}^{2}} \begin{array}{l} \left(d_{i,j}^{(t)}-d_{i,j}^{(t+1)}\right)\left\{ \begin{array}{l} \mu_{1,i,j}^{(t+1)}\left(\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^2-\epsilon_{i}\right)\\ +\mu_{2,i,j}^{(t+1)}-\mu_{3,i,j}^{(t+1)} \end{array} \right\}\\ \end{array} \\ &=\!\sum_{(i,j)\in\mathcal{I}^{2}} \left\{ \begin{array}{l} \frac{1}{2}\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)^{2}\\ +\mu_{1,i,j}^{(t+1)}\left(\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^2-\epsilon_{i}\right)\left({d_{i,j}^{(t+1)}}-{d_{i,j}^{(t)}}\right)\\ +\mu_{2,i,j}^{(t+1)}\left({d_{i,j}^{(t+1)}}-1-{d_{i,j}^{(t)}}+1\right)\\ +\mu_{3,i,j}^{(t+1)}\left(-{d_{i,j}^{(t+1)}}+{d_{i,j}^{(t)}}\right)\\ \end{array} \right\}, \end{array} $$

Since $\left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right){d_{i,j}^{(t+1)}}=0$ if $\mu _{1,i,j}^{(t+1)}>0$, $ {d_{i,j}^{(t+1)}}-1 =0 $ if $ \mu _{2,i,j}^{(t+1)}>0 $ and $ -{d_{i,j}^{(t+1)}} = 0 $ if $ \mu _{3,i,j}^{(t+1)}>0 $, and each $ d_{i,j}^{(t)} $ satisfies the constraint condition:

$$\begin{array}{*{20}l} u(t,t)- u(t+1,t+1) \geq\\ u(t+1,t)- u(t+1,t+1) \geq\\ \frac{1}{2}\sum_{(i,j)\in\mathcal{I}^{2}} \|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\left({d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}}\right)^2. \end{array} $$

Therefore, each sequence $ \{d_{i,j}^{(t)}\} $ converges to a limit point $\bar {d}_{i,j}$ if u(0,0)<∞, because ${d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}} \rightarrow 0$ when t→∞ if $ \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}>0 $ and $ {d_{i,j}^{(t)}}-{d_{i,j}^{(t+1)}} = 1-1 = 0 $, even if $\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}=0$ for $t\geq 0, (i,j)\in \mathcal {I}^{2}$. Then, each sequence $ \{\boldsymbol {Z}_{i}^{(t)}\} $ converges to a limit point $\bar {\boldsymbol {Z}}_{i}$ because each $ \boldsymbol {Z}_{i}^{(t+1)} $ can be obtained by the soft-thresholding operator using fixed $ d_{i,j}^{(t)} $ for $t\geq 0, i\in \mathcal {I}$. □

Lemma 2

If β≥ε_i, the optimal solution of (17) under the constraint conditions for d_i,j and Z_i can be obtained by initializing $\boldsymbol {Z}_{i}^{(0)}$ as $\boldsymbol {Z}_{i}^{(0)}=\boldsymbol {0}_{M,N}$ and updating $d_{i,j}^{(0)}$ and $\boldsymbol {Z}_{i}^{(1)}$ using the update schemes (18) for a given $\boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}}$.

Proof

From Theorem 1 of [5], for any $ \boldsymbol {X}\in {\mathbb {R}^{{M}\times {N}}} $, the optimal solutions $\bar {\boldsymbol {Z}}_{i} $ and $ \bar {\boldsymbol {D}}_{i} $ satisfy $\bar {\boldsymbol {Z}}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\bar {\boldsymbol {D}}_{i}\right \} $. For given d_i,j≥0, the matrix $\boldsymbol {Z}_{i}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right){\boldsymbol {D}}_{i}\right \}$ satisfies 0≤〈x_j−x_i,z_i,j〉 because, when d_i,j>0,

$$\begin{array}{*{20}l} \langle \boldsymbol{x}_j-\boldsymbol{x}_i, \boldsymbol{z}_{i,j} \rangle d_{i,j} &= \langle \boldsymbol{y}_{i,j}, \boldsymbol{z}_{i,j} \rangle\\ &=\sum_{l=1}^{r}\sigma_{l}^{2}(\boldsymbol{V})_{j,l}^{2}\\ &\ \ +\sum_{l=r+1}^M \sigma_{l}(\sigma_l-\gamma)_+ (\boldsymbol{V})_{j,l}^{2}\\ &\geq 0. \end{array} $$

Here, y_i,j denotes the jth column of $\boldsymbol {Y}_{i}=\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} =\boldsymbol {U}\text {diag}(\boldsymbol {\sigma })\boldsymbol {V}^{T}$, and σ, U, and V denote the singular values and singular vectors of $\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i}$. When d_i,j=0, 〈y_i,j,z_i,j〉=0 because y_i,j=0_M, where $ \boldsymbol {0}_{M}\in {\mathbb {R}^{M}} $ denotes the zero vector. Then, $\bar {d}_{i,j}$ satisfies:

$$\begin{array}{*{20}l} \bar{d}_{i,j}=h_{\beta,\epsilon_{i}}(\boldsymbol{x}_j,\boldsymbol{x}_i,\bar{\boldsymbol{z}}_{i,j})=\left\{\begin{array}{cl}0 &:\|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^2>\epsilon_{i}\\1&: \|\boldsymbol{x}_j-\boldsymbol{x}_{i}\|_{2}^{2}\leq \epsilon_{i}\end{array}\right., \end{array} $$

because $\beta \geq \epsilon _{i}\geq \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}$, which does not depend on $\bar {\boldsymbol {Z}}_{i}$. Therefore, $d_{i,j}^{(0)}=h_{\beta,\epsilon _{i}}(\boldsymbol {x}_{j},\boldsymbol {x}_{i},\boldsymbol {0}_{M})\in \{0,1\}$ and $\boldsymbol {Z}_{i}^{(1)}=\boldsymbol {\mathcal {T}}_{r,\gamma }\left \{\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i}^{(0)}\right \}$ form the optimal solution for (16). □
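As an illustration of the closed form in Lemma 2, the following minimal sketch computes the 0/1 neighborhood weights directly from squared column distances. The function name `neighborhood_weights` and the argument `eps` (one threshold ε_i per column) are hypothetical names chosen for this example, and the code covers only the indicator case shown above, not the general update (18).

```python
import numpy as np

def neighborhood_weights(X, eps):
    """Return d with d[i, j] = 1 if ||x_j - x_i||_2^2 <= eps[i], else 0.

    X   : (M, N) array whose columns are the points x_1, ..., x_N.
    eps : (N,) array of per-column thresholds (illustrative name).
    """
    sq_norms = np.sum(X**2, axis=0)
    # Squared Euclidean distances between all pairs of columns.
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    dist2 = np.maximum(dist2, 0.0)  # clip tiny negatives caused by round-off
    # Row i is thresholded by eps[i], so d need not be symmetric.
    return (dist2 <= eps[:, None]).astype(float)
```

Because each row uses its own threshold, column j can be a neighbor of column i without the converse holding.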

Next, let us define the following schemes with regard to the kth iteration of the first-iteration statements in Algorithm 1 with β^(k),γ^(k),r^(k) for k≥0,

$$\begin{array}{*{20}l} \left\{ \begin{array}{llll} d_{i,j}^{(k)}&=& \bar{d}_{i,j} \ \text{for} \ (i,j)\in\mathcal{I}^{2},\\ \boldsymbol{Z}_{i}^{(k)}&=&\bar{\boldsymbol{Z}}_{i} \ \text{for} \ i\in\mathcal{I},\\ \boldsymbol{x}^{(k+1)}&=&\underset{\boldsymbol{x}}{\text{argmin}} \boldsymbol{x}^{T} (\boldsymbol{L}^{(k)}\otimes \boldsymbol{I}_{M,M}) \boldsymbol{x}-2\boldsymbol{x}^{T} \boldsymbol{c}^{(k)}\\ &&\text{s.t.} \ (\boldsymbol{X})_{m,n}=(\boldsymbol{X}^{(0)})_{m,n} \ \text{for}\ (m,n)\in\Omega\\ && \ \ \ \ \ \ \|\boldsymbol{x}_{j}-\boldsymbol{x}_{i}\|_{2}^{2} \leq \epsilon_{i} \ \text{if}\ d_{i,j}^{(k)}>0, \end{array} \right. \end{array} $$

(19)

where $ \bar {d}_{i,j} $ and $ \bar {\boldsymbol {Z}}_{i} $ are the tth elements of the sequences obtained by the schemes (18) with β^(k),γ^(k),r^(k),X^(k), and vector c^(k) as:

$$\begin{array}{*{20}l} \boldsymbol{c}^{(k)}&=\left[{\boldsymbol{c}_{1}^{(k)}}^T \ {\boldsymbol{c}_{2}^{(k)}}^{T}\ \cdots\ {\boldsymbol{c}_{M}^{(k)}}^{T}\right]^{T}\in {\mathbb{R}^{MN}}\\ \boldsymbol{c}_{l}^{(k)}&=\left(\tilde{\boldsymbol{D}}^{(k)}\odot \tilde{\boldsymbol{Z}}_{l}^{(k)}-\left(\tilde{\boldsymbol{D}}^{(k)}\odot \tilde{\boldsymbol{Z}}_{l}^{(k)}\right)^{T}\right)\boldsymbol{1}_N \\ &\in {\mathbb{R}^{N}} \ \text{for} \ l=1,2,\cdots, M. \end{array} $$

Here, $ \tilde {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}$ and $\tilde {\boldsymbol {Z}}_{l}^{(k)} \in {\mathbb {R}^{{N}\times {N}}}$ denote matrices defined as $(\tilde {\boldsymbol {D}})_{i,j}^{(k)}= d_{i,j}^{(k)}$ and $(\tilde {\boldsymbol {Z}}_{l}^{(k)})_{i,j}= (\boldsymbol {Z}_{i}^{(k)})_{l,j}$ for $ (i,j)\in \mathcal {I}^{2}$, and the graph Laplacian L^(k) is:

$$\begin{array}{*{20}l} \boldsymbol{L}^{(k)}=\text{diag}\left(\hat{\boldsymbol{D}}^{(k)}\boldsymbol{1}_{N}\right)-\hat{\boldsymbol{D}}^{(k)} \end{array} $$

where $\hat {\boldsymbol {D}}^{(k)}\in {\mathbb {R}^{{N}\times {N}}}$ denotes a matrix whose every element is given by $\left (\hat {\boldsymbol {D}}^{(k)}\right)_{i,j}={d_{i,j}^{(k)}}^{2}+{d_{j,i}^{(k)}}^{2}$.
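The construction of the graph Laplacian above can be sketched numerically as follows. This is a minimal example under the stated definition only; the function name `graph_laplacian` is an assumption made for illustration.

```python
import numpy as np

def graph_laplacian(d):
    """Build L = diag(D_hat 1) - D_hat with D_hat[i, j] = d[i, j]^2 + d[j, i]^2.

    d : (N, N) array of weights d_{i,j} (not necessarily symmetric).
    """
    d_hat = d**2 + (d**2).T          # symmetrized squared weights
    return np.diag(d_hat.sum(axis=1)) - d_hat
```

By construction the result is symmetric with zero row sums, so the all-ones vector always lies in its kernel.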

Lemma 3

For k≥0,L^(k) satisfies kernel(L^(k))⊇kernel(L^(k+1)).

Proof

Since a vector a∈kernel(L^(k)) satisfies:

$$ \boldsymbol{a}^{T}\boldsymbol{L}^{(k)}\boldsymbol{a}= \sum_{(i,j)\in\mathcal{I}^{2}} \left({d_{i,j}^{(k)}}^{2}+{d_{j,i}^{(k)}}^{2} \right) (a_{i}-a_{j})^{2}=0, $$

kernel(L^(k)) is written as:

$$\begin{array}{*{20}l} &\text{kernel}\left(\boldsymbol{L}^{(k)}\right)\\ &=\left\{\boldsymbol{a}\in{\mathbb{R}^{N}} \ \mid\ a_i=a_j \ \text{for} \ (i,j) \ \text{s.t.} \ d_{i,j}^{(k)}+d_{j,i}^{(k)}>0\right\}. \end{array} $$

Since $d_{i,j}^{(k+1)}$ and $d_{i,j}^{(k)}$ generated by the schemes (18) and (19) satisfy $ d_{i,j}^{(k+1)}>0 $ when $ d_{i,j}^{(k)}>0 $, L^(k) satisfies kernel(L^(k))⊇kernel(L^(k+1)). □

Now, let us describe the properties of the sequences generated by Algorithm 1 $\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}, \left \{d_{i,j}^{(k)}\right \}$. We define the evaluation function:

$$\begin{array}{*{20}l} &v(k_1,k_2,k_3,k_4)=\\&g_{\beta^{(k_1)},\gamma^{(k_1)},r^{(k_1)}}\left(\boldsymbol{X}^{(k_2)},\left\{\boldsymbol{Z}_{i}^{(k_3)}\right\}_{i\in \mathcal{I}}, \left\{d_{i,j}^{(k_4)}\right\}_{(i,j)\in \mathcal{I}^{2}}\right) \end{array} $$

and replace the linear-constraint condition (X^(k))_m,n=(X⁽⁰⁾)_m,n for (m,n)∈Ω with Ax^(k)=b, where $ \boldsymbol {b}\in {\mathbb {R}^{|\Omega |}} $ denotes a vector whose elements are observed values {(X⁽⁰⁾)_m,n}_(m,n)∈Ω and A∈{0,1}^|Ω|×MN denotes a selector matrix.

Theorem 1

The sequences $\left \{\boldsymbol {X}^{(k)}\right \}, \left \{\boldsymbol {Z}_{i}^{(k)}\right \}$ and $\left \{d_{i,j}^{(k)}\right \}$ converge to the limit points $ \bar {\boldsymbol {X}}, \bar {\boldsymbol {Z}}_{i}$, and $ \bar {d}_{i,j} $ under repetition of the iteration schemes of (19) when $\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} $, where $\tilde {\boldsymbol {L}}^{(k)}=\boldsymbol {L}^{(k)}\otimes \boldsymbol {I}_{M,M}$.

Proof

The scheme (15) can be written as:

$$\begin{array}{*{20}l} \begin{array}{cc} \underset{\boldsymbol{x}}{\text{argmin}}& \boldsymbol{x}^{T} \left(\boldsymbol{L}\otimes \boldsymbol{I}_{M,M}\right) \boldsymbol{x}-2\boldsymbol{x}^{T} \boldsymbol{c}\\ {{\text{subject to}}} & \boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}\\ &\|\boldsymbol{Q}_{i,j}\boldsymbol{x}\|_{2}^{2}-\epsilon_{i}\leq0 \ \text{if}\ d_{i,j}>0, \end{array} \end{array} $$

where $\boldsymbol {Q}_{i,j}\in {\mathbb {R}^{{M}\times {MN}}}$ denotes a matrix defined as $\boldsymbol {Q}_{i,j}=\boldsymbol {q}_{i,j}^{T}\otimes \boldsymbol {I}_{M,M}$ and $\boldsymbol {q}_{i,j}\in {\mathbb {R}^{N}}$ is defined such that the ith element is 1, the jth element is −1, and the others are 0 (Q_i,j satisfies $ \|\boldsymbol {Q}_{i,j}\boldsymbol {x}\|_{2}^{2}=\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2} $ for $ \boldsymbol {x}\in {\mathbb {R}^{MN}} $). Since x^(k+1) satisfies the following KKT condition for v(k,k+1,k,k):

$$\begin{array}{*{20}l} \left\{ \begin{array}{l} \tilde{\boldsymbol{L}}^{(k)}\boldsymbol{x}^{(k+1)}-\boldsymbol{c}^{(k)}+\lambda^{(k+1)} \boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{x}^{(k+1)}\\ +\sum_{d_{i,j}^{(k)}>0}\mu_{i,j}^{(k+1)}\boldsymbol{Q}_{i,j}^{T}\boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}=\boldsymbol{0}_{MN},\\ \boldsymbol{A}\boldsymbol{x}^{(k+1)}=\boldsymbol{b},\\ \mu_{i,j}^{(k+1)}(\|\boldsymbol{Q}_{i,j}\boldsymbol{x}^{(k+1)}\|_{2}^{2}-\epsilon_{i})=0\ \text{for} \ d_{i,j}^{(k)} >0,\\ \mu_{i,j}^{(k+1)}\geq 0, \end{array} \right. \end{array} $$

where λ^(k+1) and $\mu _{i,j}^{(k+1)}$ denote the KKT multipliers, v(k₁,k₂,k₃,k₄) satisfies:

$$\begin{array}{*{20}l} &2v(k,k,k,k)-2v(k,k+1,k,k)\\ &\begin{array}{l} ={\boldsymbol{x}^{(k)}}^{T}\tilde{\boldsymbol{L}}^{(k)}{\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}^{T}\tilde{\boldsymbol{L}}^{(k)}{\boldsymbol{x}^{(k+1)}}\\ \ \ -2{\boldsymbol{x}^{(k)}}^{T}\boldsymbol{c}^{(k)}+2{\boldsymbol{x}^{(k+1)}}^{T}\boldsymbol{c}^{(k)} \end{array}\\ &\begin{array}{l} =\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right)^{T}\tilde{\boldsymbol{L}}^{(k)}\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right)\\ \ \ -2{\boldsymbol{x}^{(k)}}^{T}\sum_{d_{i,j}^{(k)}>0}\mu_{i,j}^{(k+1)}\boldsymbol{Q}_{i,j}^{T}\boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}\\ \ \ +2{\boldsymbol{x}^{(k+1)}}^{T}\sum_{d_{i,j}^{(k)}>0}\mu_{i,j}^{(k+1)}\boldsymbol{Q}_{i,j}^{T}\boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}, \end{array}\\ \end{array} $$

where the second equality uses the fact that Ax^(k)=b. Since $\|\boldsymbol {Q}_{i,j}\boldsymbol {x}^{(k+1)}\|_{2}^{2}=\epsilon _{i}$ when $\mu _{i,j}^{(k+1)}>0$,

$$\begin{array}{*{20}l} &v(k,k,k,k)-v(k,k+1,k,k)\\ &\begin{array}{l} =\frac{1}{2}\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right)^{T}\tilde{\boldsymbol{L}}^{(k)}\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right)\\ \ \ +\sum_{d_{i,j}^{(k)}>0}\mu_{i,j}^{(k+1)}\left\{\epsilon_i-{{\boldsymbol{x}^{(k)}}^{T}\boldsymbol{Q}_{i,j}^{T}\boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}}^{T}\right\}\\ \end{array}\\ &\begin{array}{l} \geq\frac{1}{2}\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right)^{T}\tilde{\boldsymbol{L}}^{(k)}\left({\boldsymbol{x}^{(k)}}-{\boldsymbol{x}^{(k+1)}}\right). \end{array} \end{array} $$

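The inequality follows from the Cauchy–Schwarz inequality and from the constraint $ \|\boldsymbol {Q}_{i,j}\boldsymbol {x}\|_{2}^{2}\leq \epsilon _{i} $, which both x^(k) and x^(k+1) satisfy: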

$$\left|\langle \boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k)}},\boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}\rangle\right| \!\leq\! \| \boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k)}}\|_{2}\| \boldsymbol{Q}_{i,j}{\boldsymbol{x}^{(k+1)}}\|_{2}\!\leq\! \epsilon_{i}.$$
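The role of the selector matrix Q_i,j in this bound can be checked numerically. The sketch below assumes that x stacks the columns of X (column-major vectorization), which is the ordering consistent with the Kronecker form $\boldsymbol {q}_{i,j}^{T}\otimes \boldsymbol {I}_{M,M}$; the helper name `selector_Q`, the 0-based indices, and the random test data are illustrative choices.

```python
import numpy as np

def selector_Q(i, j, M, N):
    """Return Q_{i,j} = q_{i,j}^T (Kronecker) I_M, with q[i] = 1 and q[j] = -1."""
    q = np.zeros(N)
    q[i] = 1.0
    q[j] = -1.0
    return np.kron(q[None, :], np.eye(M))  # shape (M, M * N)

rng = np.random.default_rng(0)
M, N = 3, 4
X = rng.standard_normal((M, N))
X_next = rng.standard_normal((M, N))

# Column-major stacking, so that block n of x is the n-th column of X.
x = X.flatten(order="F")
x_next = X_next.flatten(order="F")

Q = selector_Q(0, 2, M, N)
diff = Q @ x                      # equals x_0 - x_2 (0-based columns)

# Cauchy-Schwarz: |<Qx, Qx'>| <= ||Qx||_2 ||Qx'||_2.
lhs = abs(np.dot(Q @ x, Q @ x_next))
rhs = np.linalg.norm(Q @ x) * np.linalg.norm(Q @ x_next)
```

When both squared norms are at most ε_i, each norm is at most the square root of ε_i, so the right-hand side is bounded by ε_i, as used in the proof.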

Obviously, v(k,k+1,k,k)≥v(k+1,k+1,k,k) because the parameters {β^(k),γ^(k),r^(k)} decrease the objective function (17), and v(k+1,k+1,k,k)≥v(k+1,k+1,k+1,k+1) from Lemma 1. Since the sequence {v(k,k,k,k)} generated by (19) converges to a limit point because of:

$$\begin{array}{*{20}l} v(k,k,k,k)&\geq v(k,k+1,k,k)\\ &\geq v(k+1,k+1,k,k)\\ &\geq v(k+1,k+1,k+1,k+1)\\ &\ \vdots\\ &\geq -\beta_{\text{max}}N^{2}, \end{array} $$

x^(k)−x^(k+1)→0_MN when k→∞ and v(0,0,0,0)<∞ because each L^(k) satisfies $\text {kernel}\left (\tilde {\boldsymbol {L}}^{(k)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} $ for k≥0 if $\text {kernel}\left (\tilde {\boldsymbol {L}}^{(0)}\right) \cap \text {kernel}(\boldsymbol {A}) = \{\boldsymbol {0}_{MN}\} $ from Lemma 3. Thus, X^(k) converges to a limit point $\bar {\boldsymbol {X}}$; then, the sequences $\left \{\boldsymbol {Z}_{i}^{(k)}\right \}$ and $\left \{d_{i,j}^{(k)}\right \}$ converge to limit points $\bar {\boldsymbol {Z}}_{i}$ and $\bar {d}_{i,j}$ with a fixed $\bar {\boldsymbol {X}}$ from Lemma 1. □

Finally, some improvements to Algorithm 1 are offered in this section. First, the dimension of the LDDM is unknown in actual applications, although Algorithm 1 requires a suitable r. To address this issue, we adopt a method that estimates the dimension r for each column $ i\in \mathcal {I} $ based on the singular-value ratio σ_r/σ₁, as in [5]. Second, we consider ways to reduce the computational complexity. Two key possibilities are considered: one is to ignore the quadratic-constraint condition $ \left (\|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}-\epsilon _{i}\right)d_{i,j}\leq 0 $ when we update X, and the other is to update X only for the columns in the ith neighborhood, for example, by minimizing only the ith Frobenius-norm term of (17), $ \left \|(\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T})\boldsymbol {D}_{i}-\boldsymbol {Z}_{i}\right \|_{F}^{2} $, with regard to the column x_i, which is expected to work like a stochastic gradient-descent algorithm. Furthermore, this paper uses the parameter $\beta =\max _{i}\epsilon _{i}$ because, from Lemma 2, the update schemes (18) then yield the limit points of Z_i and d_i,j in a single update for each $ i\in \mathcal {I} $. Thus, this paper proposes a heuristic algorithm for reducing the calculation time, as shown in Algorithm 2. There, the parameters satisfy 1>α⁽⁰⁾≥α⁽¹⁾≥⋯≥α_min>0 for k=0,1,⋯,k_max and δ>0, just as in [5].

We consider here the time and space complexities of Algorithm 2. The major computational cost of Algorithm 2 comes from computing the singular value decomposition of $\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} $ for all i=1,2,⋯,N at each iteration. For simplicity, this paper assumes that the number of non-zero columns of $\left (\boldsymbol {X}-\boldsymbol {x}_{i}\boldsymbol {1}_{N}^{T}\right)\boldsymbol {D}_{i} $ is M for each iteration and each i. Then, since the algorithm requires the singular value decomposition of an M×M matrix for each i, the time and space complexities of Algorithm 2 are O(M³N) and O(M²) for each iteration. As stated in [20], the VMC method [15] requires time complexity O(N³+MN²) and space complexity O(N²); hence, the time and space complexities of Algorithm 2 are lower than those of VMC when the numbers of rows M and columns N satisfy M³<N². Algorithm 2 is therefore effective for datasets such as those used in Section 5.2.
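The comparison above can be made concrete with a small calculation. The sketch below evaluates only the leading-order terms quoted in the text, with all constant factors ignored, so it indicates asymptotic scaling rather than measured running time; the function names are hypothetical.

```python
def leading_costs(M, N):
    """Leading-order per-iteration costs quoted in the text (constants ignored)."""
    return {
        "llra_time": M**3 * N,          # N SVDs of M x M matrices
        "vmc_time": N**3 + M * N**2,    # time complexity cited for VMC
        "llra_space": M**2,
        "vmc_space": N**2,
    }

def satisfies_paper_condition(M, N):
    """Condition M^3 < N^2 under which Algorithm 2 has the lower complexity."""
    return M**3 < N**2

# Motion capture setting of Section 5.2: M = 62 rows, N = 3392 columns.
costs = leading_costs(62, 3392)
condition_holds = satisfies_paper_condition(62, 3392)
```

For M=62 and N=3392 the condition M³<N² holds, which is consistent with the statement that Algorithm 2 suits the dataset of Section 5.2.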

5 Results and discussion

5.1 Synthetic data

This section presents several numerical examples for the matrix completion problem (1). In this section, each ith column of X⁽⁰⁾ is generated by $ \boldsymbol {\mathcal {F}}_{p}: {\mathbb {R}^{r}} \mapsto {\mathbb {R}^{M}} $ with mapping function (3) as:

$$\begin{array}{*{20}l} \boldsymbol{x}^{(0)}_{i}=\boldsymbol{\mathcal{F}}_{p}\left(\boldsymbol{y}^{(0)}_{i}\right)=\boldsymbol{U}_{p}\boldsymbol{\psi}_{p}\left(\boldsymbol{y}^{(0)}_{i}\right). \end{array} $$

(20)

Here, $ \boldsymbol {U}_{p}\in {\mathbb {R}^{{M}\times {\binom {r+p}{p}}}} $ and $ \boldsymbol {Y}^{(0)}=\left [\boldsymbol {y}_{1}^{(0)} \ \boldsymbol {y}_{2}^{(0)}\ \cdots \ \boldsymbol {y}_{N}^{(0)}\right ] \in {\mathbb {R}^{{r}\times {N}}}$ are generated from i.i.d. continuous uniform distributions whose supports are [−0.5,0.5] and [−1,1], respectively, and the elements of Y⁽⁰⁾ are normalized so that max|(Y⁽⁰⁾)_i,j|=1. The index set Ω is generated using the Bernoulli distribution, where each index (i,j) belongs to Ω with the given probability q. This paper uses the relative recovery error:

$$\text{RE [\%]} = \frac{\|\boldsymbol{X}^{(0)}-\boldsymbol{X}\|_{F}}{\|\boldsymbol{X}^{(0)}\|_{F}} \times 100 $$

to evaluate each algorithm. All numerical experiments were run in MATLAB 2017b on a PC with an Intel Core i7 3.1 GHz CPU, 8 GB of RAM, and no swap memory.
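The data generation and the error measure described above can be sketched as follows. Several details are assumptions made for illustration: the feature map ψ_p is taken to be all monomials of total degree at most p (which gives the stated dimension $\binom {r+p}{p}$, although the ordering in mapping (3) may differ), U_p is drawn from [−0.5,0.5] and Y⁽⁰⁾ from [−1,1], and the function names are hypothetical.

```python
import itertools
from math import comb, prod

import numpy as np

def monomial_features(y, p):
    """All monomials of the entries of y with total degree 0..p."""
    feats = []
    for degree in range(p + 1):
        for idx in itertools.combinations_with_replacement(range(len(y)), degree):
            feats.append(prod(y[k] for k in idx))  # empty product is 1
    return np.array(feats, dtype=float)

def make_synthetic(M, N, r, p, q, seed=0):
    """Generate X0 = U_p psi_p(Y0) and a Bernoulli(q) observation mask."""
    rng = np.random.default_rng(seed)
    n_feat = comb(r + p, p)
    U = rng.uniform(-0.5, 0.5, size=(M, n_feat))   # assumed support for U_p
    Y = rng.uniform(-1.0, 1.0, size=(r, N))        # assumed support for Y0
    Y /= np.max(np.abs(Y))                         # normalize: max |Y_ij| = 1
    Psi = np.column_stack([monomial_features(Y[:, n], p) for n in range(N)])
    X0 = U @ Psi
    mask = rng.random((M, N)) < q                  # True where (m, n) is in Omega
    return X0, mask

def relative_error(X0, X):
    """Relative recovery error RE in percent (Frobenius norms)."""
    return 100.0 * np.linalg.norm(X0 - X) / np.linalg.norm(X0)
```

A typical call would be `X0, mask = make_synthetic(M=100, N=4000, r=3, p=3, q=0.3)`, after which a completion result `X` is scored with `relative_error(X0, X)`.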

This paper applies several low-rank matrix completion algorithms, including singular value thresholding (SVT) [1], the fixed-point continuation algorithm (FPCA) [2], the short IRLS-0 (sIRLS-0) method [3], IPMS [5], the nonlinear matrix completion method VMC [15], and the proposed LLRASGD method, to several matrix completion problems with M=100, N=4,000, and p=3,5,7 for (20). A maximum iteration number of k_max=1000 is used for LLRA, IPMS, sIRLS-0, and SVT, and the termination condition is ∥X^(k)−X^(k+1)∥_F/∥X^(k+1)∥_F≤10⁻⁵ for all algorithms. The parameters for LLRASGD and IPMS are given as $ \alpha ^{(k)}=10^{-\frac {4k}{k_{\text {max}}}} $ and δ=10⁻²; those for SVT are $ \tau ^{(k)}=10^{-2}\sigma _{1}^{(k)} $; those for sIRLS-0 and VMC are $ \gamma ^{(k)}=10^{2-\frac {6k}{k_{\text {max}}}} $; those for VMC are p=0.5 and d=3; and those for FPCA are τ=1 and $ \mu ^{(k)}=(0.25)^{k} \geq \bar {\mu }=10^{-8} $. The condition $ \sigma _{l}^{(k)}\geq 10^{-2}\sigma _{1}^{(k)} $ is used to choose r for FPCA in this paper. We set the initial value of {(X)_m,n}_(m,n)∉Ω to 0 for SVT, FPCA, sIRLS-0, and IPMS. For VMC and LLRASGD, the initial value of X is estimated using IPMS. For LLRASGD, each ε_i is set such that the number of columns x_j satisfying $ \|\boldsymbol {x}_{j}-\boldsymbol {x}_{i}\|_{2}^{2}\leq \epsilon _{i} $ equals 50 for the value of X estimated by IPMS.

The results are shown in Tables 1, 2, and 3 for q∈{0.2,0.3,0.4} and r∈{2,3,4,5,6}. As can be seen, the estimation accuracy of LLRASGD is better than that of the other algorithms for r=5,6 with q=0.2,0.3,0.4 and p=3,5,7, and especially for r=3,4,5,6 with q=0.2. Figures 2, 3, and 4 compare all algorithms with q=0.3. In Figs. 2 and 3, the recovery errors of LLRASGD tend not to deteriorate as much as those of the other algorithms. From these results, the proposed method is more effective when the missing rate or the latent dimension is high.

Table 1 Results of the algorithms for problem (1) with p=3 for (20)

Table 2 Results of the algorithms for problem (1) with p=5 for (20)

Table 3 Results of the algorithms for problem (1) with p=7 for (20)

5.2 CMU motion capture data

This paper also considers matrix completion on motion capture data, which consist of time-series trajectories of human motions such as running and jumping. Similar to [15], this paper uses trial #6 of subject #56 of the CMU motion capture dataset. The data contain measurements from M=62 sensors at 6784 time instants, and the data matrix is known to be high rank. In this experiment, the sequence is downsampled by a factor of 2, so that the data matrix has M=62 rows and N=3392 columns. Then, the elements of the data matrix were randomly observed with the ratio q∈{0.1,0.2,0.3,0.4}, and the matrix completion algorithms were applied with the same parameters as those used in Section 5.1.

The average recovery errors over 10 trials are shown in Fig. 5. Similar to the results on synthetic data, the estimation accuracy of LLRASGD is better than that of the other algorithms. In particular, the recovery errors of LLRASGD are much lower than the others when the missing ratio is very high (such as q=0.1,0.2). These results indicate that the proposed method is effective not only for synthetic data but also for real-world data.

The average computational times over all observed ratios q∈{0.1,0.2,0.3,0.4} are shown in Table 4. This result indicates that the computation time of LLRASGD is about 200 to 500 times longer than that of the conventional MRM methods, and the computation time of VMC is about 2.4 times longer than that of LLRASGD for the same number of iterations. However, VMC and LLRASGD have sufficiently high estimation accuracy even with a small number of iterations. Figure 6 shows the results of VMC and LLRASGD for maximum iteration numbers k_max∈{10,20,40,100,1000} at the observed ratio q=0.2. As can be seen in Fig. 6, the recovery error of LLRASGD has converged sufficiently at k_max=40. At this maximum iteration number, although the computational time of LLRASGD is about 16 times longer than that of IPMS, the recovery error of LLRASGD is less than half that of IPMS. We can also see that the recovery error of LLRASGD is less than that of VMC for all k_max∈{10,20,40,100,1000}.

Table 4 The average computational time cost (second) of the algorithms for CMU motion capture data recovery

6 Conclusion

This paper proposed a local low-rank approach (LLRA) for a matrix-completion problem in which the columns of the matrix belong to an LDDM. The convergence properties of this approach were also presented. The proposed method is based on the idea of tangent hyperplanes of dimension equal to that of the LDDM with respect to each column of the matrix. It is assumed that each hyperplane is of low dimension and that the sum of the rank of each local submatrix with respect to each column belonging to the set of nearest neighborhoods of each column is minimized. Numerical examples show that the proposed algorithm offers higher accuracy for matrix completion than other algorithms in the case where each column vector is given by a pth order polynomial mapping of a latent feature. In particular, the proposed method is suitable when the order p and the dimension of the latent space are high.

7 Appendix

In this section, this paper introduces the truncated nuclear norm and the minimization technique by IPMS [5].

The truncated nuclear norm ∥Z∥_∗,r is defined with the kth largest singular value σ_k of Z as:

$$\|\boldsymbol{Z}\|_{*,r}=\sum_{k=r+1}^{M} \sigma_{k}. $$

The truncated nuclear norm is used as a surrogate for the matrix rank, and the minimizer of $\frac {1}{2}\|\boldsymbol {Y}-\boldsymbol {Z}\|_{F}^{2} +\gamma \|\boldsymbol {Z}\|_{*,r} $ for a given matrix Y and a given parameter γ>0 is obtained as follows:

$$\begin{array}{*{20}l} \boldsymbol{Z}&=\underset{\boldsymbol{Z}}{\text{argmin}} \frac{1}{2}\|\boldsymbol{Y}-\boldsymbol{Z}\|_{F}^{2} +\gamma\|\boldsymbol{Z}\|_{*,r} \\ &=\boldsymbol{\mathcal{T}}_{r,\gamma}(\boldsymbol{Y}), \end{array} $$

where $\boldsymbol {\mathcal {T}}_{r,\gamma }$ denotes the matrix-shrinkage operator defined as:

$$\begin{array}{*{20}l} \!\!\!\!\!\!\boldsymbol{\mathcal{T}}_{r,\gamma}(\boldsymbol{Y})&=\boldsymbol{U}\text{diag}(\boldsymbol{\sigma}_{r,\gamma})\boldsymbol{V}^{T}\\ \!\!\!\!\!\!\boldsymbol{\sigma}_{r,\gamma}&=[\sigma_{1} \ \cdots \ \sigma_{r} \ (\sigma_{r+1}-\gamma)_{+} \ \cdots \ (\sigma_{M}-\gamma)_{+}]^{T} \end{array} $$

and (c)₊= max(0,c) with regard to the singular value decomposition Y=Udiag(σ)V^T.
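The matrix-shrinkage operator and the truncated nuclear norm defined above can be written compactly with an SVD. The following is a minimal NumPy sketch; the function names are illustrative, and for a non-square Y the sums run over min(M,N) singular values rather than M.

```python
import numpy as np

def truncated_nuclear_norm(Z, r):
    """Sum of all singular values of Z except the r largest."""
    s = np.linalg.svd(Z, compute_uv=False)  # sorted in descending order
    return s[r:].sum()

def matrix_shrinkage(Y, r, gamma):
    """Keep the r largest singular values; soft-threshold the rest by gamma."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = s.copy()
    s_shrunk[r:] = np.maximum(s[r:] - gamma, 0.0)  # (sigma - gamma)_+
    return (U * s_shrunk) @ Vt                      # U diag(s_shrunk) V^T
```

With a very large γ the operator reduces to a rank-r truncation of Y, while γ=0 returns Y unchanged.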

In the matrix completion problem (2), the IPMS algorithm solves the relaxation problem by iterating the following update schemes:

$$\begin{array}{*{20}l} \left\{ \begin{array}{ll} \boldsymbol{Z}&\leftarrow\boldsymbol{\mathcal{T}}_{r,\gamma}(\boldsymbol{X}),\\ (\boldsymbol{X})_{m,n} &\leftarrow \left(\boldsymbol{X}^{(0)}\right)_{m,n} \ \text{for} \ (m,n)\in\Omega. \end{array} \right. \end{array} $$

Since the truncated nuclear norm requires a value of r related to the matrix rank, IPMS estimates the matrix rank r during the iterations by using the scheme:

$$r\leftarrow \underset{r}{\text{argmin}} \sigma_{r} \ \text{s.t.}\ \sigma_{r} \leq \alpha \sigma_{1}, $$

where 0≤α<1 is a given constant. The details of the IPMS algorithm are written in [5].
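For orientation, the two-step update above can be sketched as a short loop. This is a simplified illustration rather than the IPMS algorithm of [5]: it keeps r and γ fixed instead of estimating r during the iterations, it assumes that the unobserved entries of X are overwritten by the shrunk matrix Z before the observed entries are restored, and the function names and the default iteration count are hypothetical.

```python
import numpy as np

def shrink(Y, r, gamma):
    """Matrix shrinkage: keep r largest singular values, soft-threshold the rest."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = np.maximum(s[r:] - gamma, 0.0)
    return (U * s) @ Vt

def ipms_fixed_rank(X0, mask, r, gamma, n_iter=200):
    """Simplified shrinkage iteration with fixed r and gamma (assumed variant).

    X0   : (M, N) array; only entries where mask is True are used.
    mask : (M, N) boolean array, True for observed indices in Omega.
    """
    X = np.where(mask, X0, 0.0)          # unobserved entries start at zero
    for _ in range(n_iter):
        Z = shrink(X, r, gamma)          # Z <- T_{r,gamma}(X)
        X = np.where(mask, X0, Z)        # restore observed entries
    return X
```

The observed entries of the output always equal those of X0; how well the remaining entries are recovered depends on r, γ, and the sampling pattern.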

Availability of data and materials

Please contact the author for data requests.

Abbreviations

LDDM: Low-dimensional differentiable manifold
MRM: Matrix rank minimization
LDLS: Low-dimensional linear subspace
UoLS: Union of linear subspaces
VMC: Variety-based matrix completion
LLRA: Local low-rank approach
SVT: Singular value thresholding
FPCA: Fixed-point continuation algorithm
IRLS: Iterative reweighted least squares
IPMS: Iterative partial matrix shrinkage

References

[1] J. F. Cai, E. J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010).
[2] D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11(2), 183–210 (2011).
[3] K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. (JMLR) 13(1), 3441–3473 (2012).
[4] D. Zhang, Y. Hu, J. Ye, X. Li, X. He, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. Matrix completion by truncated nuclear norm regularization (2012), pp. 2192–2199.
[5] K. Konishi, K. Uruma, T. Takahashi, T. Furukawa, Iterative partial matrix shrinkage algorithm for matrix rank minimization. Signal Process. 100, 124–131 (2014).
[6] J. Gotoh, A. Takeda, K. Tono, DC formulations and algorithms for sparse optimization problems. J. Math. Program. 169(1), 141–176 (2018).
[7] X. Guan, C. T. Li, Y. Guan, Matrix factorization with rating completion: an enhanced SVD model for collaborative filtering recommender systems. IEEE Access 5, 27668–27678 (2017).
[8] M. Verhaegen, A. Hansson, N2SID: nuclear norm subspace identification of innovation models. Autom. 72, 57–63 (2016).
[9] K. H. Jin, J. C. Ye, Annihilating filter-based low-rank Hankel matrix approach for image inpainting. IEEE Trans. Image Process. 24(11), 3498–3511 (2015).
[10] Q. Zhao, D. Meng, Z. Xu, W. Zuo, Y. Yan, L₁-norm low-rank matrix factorization by variational Bayesian method. IEEE Trans. Neural Netw. Learn. Syst. 26(4), 825–839 (2015).
[11] B. Eriksson, L. Balzano, R. Nowak, High rank matrix completion. Int. Conf. Artif. Intell. Stat. 22, 373–381 (2012).
[12] C. Yang, D. Robinson, R. Vidal, in Proc. of the 32nd Int. Conf. on Machine Learning (PMLR), 37. Sparse subspace clustering with missing entries (PMLR, Lille, 2015), pp. 2463–2472.
[13] C. -G. Li, R. Vidal, A structured sparse plus structured low-rank framework for subspace clustering and completion. IEEE Trans. Signal Process. 64(24), 6557–6570 (2016).
[14] E. Elhamifar, in Proc. 28th Adv. Neural Inf. Process. Syst. High-rank matrix completion and clustering under self-expressive models (Curran Associates Inc., Barcelona, 2016), pp. 73–81.
[15] G. Ongie, R. Willett, R. D. Nowak, L. Balzano, in Proc. of the 34th Int. Conf. on Machine Learning (PMLR), 70. Algebraic variety models for high-rank matrix completion (PMLR, Sydney, 2017), pp. 2691–2700.
[16] G. Ongie, L. Balzano, D. Pimentel-Alarcón, R. Willett, R. D. Nowak, Tensor methods for nonlinear matrix completion. arXiv preprint arXiv:1804.10266 (2018). https://arxiv.org/abs/1804.10266.
[17] R. Vidal, Y. Ma, S. Sastry, Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27(12), 1945–1959 (2005).
[18] X. Alameda-Pineda, E. Ricci, Y. Yan, N. Sebe, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. Recognizing emotions from abstract paintings using non-linear matrix completion (IEEE, Las Vegas, 2016), pp. 5240–5248.
[19] J. Fan, T. W. S. Chow, Non-linear matrix completion. Pattern Recogn. 77, 378–394 (2018).
[20] J. Fan, M. Udell, in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. Online high rank matrix completion (IEEE, Long Beach, 2019), pp. 8682–8690.
[21] J. Fan, Y. Zhang, M. Udell, in Proc. AAAI, 34. Polynomial matrix completion for missing data imputation and transductive learning (AAAI Press, New York, 2020), pp. 3842–3849.
[22] J. Fan, J. Cheng, Matrix completion by deep matrix factorization. Neural Netw. 98, 34–41 (2018).
[23] B. Schölkopf, A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002). https://books.google.co.jp/books?id=y8ORL3DWt4sC&hl=ja&source=gbs_book_other_versions.
[24] S. T. Roweis, L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Sci. 290(5500), 2323–2326 (2000).
[25] M. Winlaw, D. L. Samimi, A. Ghodsi, Robust locally linear embedding using penalty functions. Int. Joint Conf. Neural Netw., 2305–2312 (2011).

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that helped improve the quality of this manuscript.

Funding

This work was supported by the JSPS KAKENHI Grant Number JP19H02163.

Author information

Authors and Affiliations

Faculty of Computer and Information Science, Hosei University, 3-7-2 Kajino-cho Koganei-shi, Tokyo, Japan
Ryohei Sasaki & Katsumi Konishi
Faculty of Information Science and Technology, Tokai University, 4-1-1 Kitakaname Hiratsuka-shi, Kanagawa, Japan
Tomohiro Takahashi
Faculty of Engineering, Tokyo University of Science, 6-3-1 Niijuku Katsushika-ku, Tokyo, Japan
Toshihiro Furukawa

Authors

Ryohei Sasaki
Katsumi Konishi
Tomohiro Takahashi
Toshihiro Furukawa

Contributions

All authors have contributed equally. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Ryohei Sasaki.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sasaki, R., Konishi, K., Takahashi, T. et al. Local low-rank approach to nonlinear matrix completion. EURASIP J. Adv. Signal Process. 2021, 11 (2021). https://doi.org/10.1186/s13634-021-00717-7

Received: 19 August 2020
Accepted: 20 January 2021
Published: 12 February 2021
DOI: https://doi.org/10.1186/s13634-021-00717-7
