Time‑varying graph learning from smooth and stationary graph signals with hidden nodes

So far, a significant body of literature on learning graph topology has been proposed, as summarized in [11][12][13]. In particular, the GSP viewpoint provides a new technique for inferring the graph topology from a set of observations. GSP-based approaches can generally be categorized into three main groups. The first category makes assumptions about the graph by enforcing properties such as smoothness or sparsity of the graph signals [14][15][16]. Instead of relying on smoothness or sparsity, the second category assumes that the graph signals are generated from a Laplacian-constrained Gaussian-Markov random field (GMRF) [17,18]. The third category exploits a diffusion model [9,19] to learn the graph topology. This model considers the observed signals to be the outcome of a diffusion process on the graph, where each node continuously influences its neighbors.
It should be noted that all the aforementioned graph topology inference works focus on the scenario where observations from all nodes are available. However, in many relevant scenarios, the observed graph signals may correspond only to a subset of the original graph nodes, while the remaining nodes are hidden. Neglecting these hidden nodes can drastically hinder the performance of graph topology inference methods. Consequently, some recent works have begun to address this issue, including Gaussian graphical model selection [20][21][22], linear Bayesian models [23], and nonlinear regression [24]. More recently, the problem of constructing a graph in the presence of hidden nodes has been investigated within the context of GSP [25][26][27]. Notably, two related works have proposed leveraging, respectively, a smoothness prior [5] or a stationarity prior [28,29] to infer graph topologies from incomplete data [25,26]. Another work has focused on estimating multi-layer graphs in the presence of hidden nodes, assuming that the observed graph signals follow a GMRF model [27]. These three existing GSP-based graph learning methods with hidden nodes are limited to learning static graphs or multi-layer graphs.
Some real-world scenarios require time-varying generation models to capture the relationships between data variables. For example, this is the case when estimating time-varying brain functional connectivity from electroencephalography (EEG) recordings or resting-state functional magnetic resonance imaging (fMRI) [30]. Additionally, identifying temporal transitions in biological networks, such as protein, RNA, and DNA networks [31], and inferring relationships in stock trading from financial market data [32] also exhibit a time-varying nature. To address the growing demand for understanding these time-varying graph structures, several approaches have been proposed. These approaches leverage prior assumptions about the evolutionary patterns of time-varying graphs to tackle the problem of learning their topology. In a recent study [33], the authors proposed an efficient method for inferring time-varying topology. They utilized Tikhonov regularization to ensure smooth temporal changes in edge weights, thereby capturing the evolving nature of the graphs. To impose additional constraints on sparse changes of the graph, the authors in [34] introduced an $\ell_1$ regularization term on the graph variation. Another time-varying graph learning work is described in [35], where the authors proposed to extend the graphical Lasso [36] to account for temporal variability. While the works in [33][34][35] did not explicitly consider scenarios involving hidden nodes, they served as inspiration for our research. We recognize the significance of collecting observations from related graphs and leveraging information across time-varying graphs to address the challenge of hidden nodes. However, it remains unclear how existing algorithms can be adapted to measure graph similarity between unobserved nodes. Consequently, modeling the influence of hidden nodes in the context of time-varying graph learning becomes crucial. For a summary of the proposed method and related graph learning methods, please refer to Table 1.
Building on the preceding discussion, the primary objective of this paper is to address the inference of time-varying graphs in the presence of hidden nodes. Our two primary contributions are formulating this problem as a convex optimization problem and devising an algorithm to solve it effectively. Our method is predicated on the assumption that the observed signals are simultaneously smooth and stationary on the given graphs. While this assumption has proven successful in the field of static graph inference, a robust formulation for time-varying graph learning with hidden nodes has not yet been established. To fill this gap, it is necessary to modify the classic interpretations of the smoothness prior and the stationarity prior in order to account for the impact of hidden nodes. We first adopt a block matrix factorization methodology to revise the smoothness and stationarity assumptions. Then, we exploit the inherent low-rank and sparse patterns within the blocks associated with hidden nodes. These patterns enable the smooth evolution of graph edges, thereby capturing the temporal dynamics across the sequence of graphs. Furthermore, to fully leverage the characteristics of time-varying graphs, it is crucial to capture the similarity among graphs, accounting for both the observed and hidden nodes. This is achieved by exploiting a similar column-sparsity pattern, which emerges from the similarity analysis of the graphs at each time slot. We test the proposed approach on synthetic and real-world data. Experimental results demonstrate the effectiveness of the proposed approach.
The remainder of this paper is structured as follows. Section 2 reviews fundamental concepts related to signals defined over graphs and gives an overview of the associated graph learning methods. Section 3 introduces the time-varying graph learning problem with hidden nodes. Section 4 proposes optimization frameworks to solve this problem. Section 5 is dedicated to evaluating the performance of the proposed method. Finally, we conclude the paper in Sect. 6.

Notations:
The following notations will be used in this paper. All vectors are denoted by boldface lowercase letters, and matrices by boldface uppercase letters. We use calligraphic capital letters to denote sets. $\mathbb{R}^{N \times N}$ denotes the set of matrices of size $N \times N$ with nonnegative entries. For a vector $\mathbf{x}$, $\mathbb{E}[\mathbf{x}]$ represents the expected value of $\mathbf{x}$.
For a matrix $\mathbf{X}$, $\|\mathbf{X}\|_F$ represents the Frobenius norm, $\|\mathbf{X}\|_0$ represents the $\ell_0$ norm, $\|\mathbf{X}\|_{F,\mathrm{off}}$ is the Frobenius norm of $\mathbf{X}$ excluding the diagonal elements, $\|\mathbf{X}\|_*$ represents the nuclear norm, and $\|\mathbf{X}\|_{2,1}$ represents the $\ell_{2,1}$ norm, which can be understood as a two-step process: first, the $\ell_2$ norm of each column of $\mathbf{X}$ is obtained; then, the $\ell_1$ norm of the resulting row vector is computed. Moreover, $\mathrm{diag}(\cdot)$ is a diagonal matrix with its argument along the main diagonal, $\mathrm{tr}(\cdot)$ is the trace of a matrix, $\mathbf{1}$ stands for the all-one vector and $\mathbf{I}$ stands for the identity matrix. Finally, the minimization operator, the transpose and the pseudo-inverse are denoted by $\min$, superscript $\top$ and superscript $\dagger$, respectively.

Table 1 Summary of the proposed method and related graph learning methods. Listed entries: signal smoothness and stationarity model; TVGL-TS [33], signal smoothness model; TVGL-SV [34], signal smoothness model; TVGL [35], graphical lasso-based model; multi-layer graphs, PGL [27], signal stationarity model.

Preliminaries
In this section, we first outline some basic GSP definitions. Then, we provide a concise overview of two pivotal models of graph signals, namely smooth graph signals and stationary graph signals. Building on these insights, prior works on graph learning based on these two signal models are introduced. Finally, a general framework for learning time-varying graphs is briefly reviewed.

Basic definitions for GSP
An undirected and weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ with $N$ nodes is considered here, where $\mathcal{V} = \{1, \ldots, N\}$ represents the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. The weighted adjacency matrix $\mathbf{W} \in \mathbb{R}^{N \times N}$ is a symmetric matrix whose elements characterize the strength of the connections. We also assume that there are no self-loops or directed edges in the graph, which implies $\mathrm{diag}(\mathbf{W}) = \mathbf{0}$. The $(i, j)$-th entry $W_{ij}$ of the adjacency matrix is assigned a nonnegative value if $(i, j) \in \mathcal{E}$, where $i$ and $j$ denote two nodes. We use a vector $\mathbf{x} = [x_1, \ldots, x_N]^\top \in \mathbb{R}^N$ to represent a graph signal, where $x_i$ denotes the value measured at node $i$.
In graph theory, the graph Laplacian $\mathbf{L}$ is defined as $\mathbf{L} := \mathbf{D} - \mathbf{W}$. The degree matrix $\mathbf{D}$ is a diagonal matrix that contains the degrees of the nodes along its diagonal, with entries $D_{ii} = \sum_{j=1}^{N} W_{ij}$ and $D_{ij} = 0$ for $i \neq j$. Due to its symmetry, the matrix $\mathbf{L}$ can be decomposed as $\mathbf{L} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^\top$, where $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_N] \in \mathbb{C}^{N \times N}$ is a matrix consisting of the eigenvectors of $\mathbf{L}$, and $\boldsymbol{\Lambda} \in \mathbb{C}^{N \times N}$ is a diagonal matrix containing the corresponding eigenvalues arranged in increasing order. The graph shift operator (GSO) is an $N \times N$ square matrix $\mathbf{S}$ that captures the underlying topology of the graph. The entries of $\mathbf{S}$, denoted as $S_{ij}$, can be nonzero only if $i = j$ or there exists an edge $(i, j) \in \mathcal{E}$ in the graph. The adjacency matrix [37] and the Laplacian [15] are popular choices for the GSO. We assume that $\mathbf{S}$ possesses a complete set of orthonormal eigenvectors and associated eigenvalues.
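As a minimal sketch of these definitions, the following NumPy snippet builds the Laplacian of a small graph and computes its eigendecomposition. The 4-node path graph and the helper name `laplacian_from_adjacency` are illustrative assumptions, not part of the method described here.

```python
import numpy as np

def laplacian_from_adjacency(W):
    """Return L = D - W for a symmetric, nonnegative adjacency matrix W with zero diagonal."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # D_ii = sum_j W_ij
    return D - W

# Hypothetical example: 4-node path graph with unit weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = laplacian_from_adjacency(W)

# For a symmetric matrix, eigh returns real eigenvalues in ascending order
# together with orthonormal eigenvectors (the columns of U).
eigvals, U = np.linalg.eigh(L)
Lambda = np.diag(eigvals)
```

For a real symmetric Laplacian the eigenvalues and eigenvectors returned here are real, and the smallest eigenvalue is zero with a constant eigenvector when the graph is connected.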

Smooth graph signals
In the node domain, smoothness means that the signal values at the two end nodes of an edge with a large weight tend to be similar. In the field of GSP, the total variation (TV) of a graph signal $\mathbf{x}$ is commonly used as a smoothness measure, quantified through the quadratic form [5]
$$\mathrm{TV}(\mathbf{x}) = \mathbf{x}^\top \mathbf{L} \mathbf{x} = \frac{1}{2} \sum_{i,j} W_{ij} (x_i - x_j)^2. \quad (1)$$
Intuitively, a graph signal $\mathbf{x}$ is said to be smooth when the Laplacian quadratic form $\mathrm{TV}(\mathbf{x})$ is small. In particular, the smaller the value of $\mathrm{TV}(\mathbf{x})$, the smoother the graph signal.
When it comes to the graph learning problem, the smoothness property is widely used as prior information. Considering the matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_K]$ containing $K$ observations, a general graph learning framework is proposed in [14,15] as problem (2). The penalty function in (2) is employed to prevent a trivial solution and to control the sparsity of the learned graph. Parameters $\alpha$ and $\beta$ are constants. The term $\log(\mathrm{diag}(\mathbf{L}))$ is a two-step process: first, the $\mathrm{diag}$ operation extracts the diagonal elements of the matrix $\mathbf{L}$; then, the $\log$ operation is applied element-wise to the resulting vector. The learned Laplacian matrix has to lie in the set of valid combinatorial Laplacians, defined as $\mathcal{L} := \{\mathbf{L} : L_{ij} \leq 0,\ i \neq j;\ \mathbf{L} = \mathbf{L}^\top;\ \mathbf{L}\mathbf{1} = \mathbf{0};\ \mathbf{L} \succeq 0\}$. This constraint emphasizes that $\mathbf{L}$ is a symmetric positive semidefinite matrix. The smoothness of all observed signals over the selected graph is quantified by the first term of (2).
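The two ingredients of this framework, the Laplacian quadratic form and the set of valid combinatorial Laplacians, can be sketched numerically as follows. The snippet checks the standard identity between the quadratic form and the weighted sum of squared differences over all node pairs; the 3-node path graph, the function names and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def total_variation(L, x):
    """Laplacian quadratic form x^T L x."""
    return float(x @ L @ x)

def pairwise_variation(W, x):
    """Equivalent form: 0.5 * sum_ij W_ij (x_i - x_j)^2."""
    diff = x[:, None] - x[None, :]
    return 0.5 * float(np.sum(W * diff ** 2))

def is_combinatorial_laplacian(L, tol=1e-9):
    """Check symmetry, nonpositive off-diagonals, zero row sums and positive semidefiniteness."""
    if not np.allclose(L, L.T, atol=tol):
        return False
    off_diag = L - np.diag(np.diag(L))
    if np.any(off_diag > tol):
        return False
    if not np.allclose(L @ np.ones(L.shape[0]), 0.0, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(L).min() >= -tol)

# Hypothetical 3-node path graph.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

x_smooth = np.array([1.0, 1.0, 1.0])   # constant signal
x_rough = np.array([1.0, -1.0, 1.0])   # sign flips across every edge

tv_smooth = total_variation(L, x_smooth)  # 0.0
tv_rough = total_variation(L, x_rough)    # 8.0
```

The constant signal has zero total variation, while the alternating signal accumulates a squared difference of 4 on each of the two edges.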

Stationary graph signals
Given an undirected graph $\mathcal{G}$, the GSO $\mathbf{S}$ is a symmetric matrix. A linear shift-invariant graph filter $\mathbf{H} \in \mathbb{R}^{N \times N}$ can be written as $\mathbf{H} = \sum_{l=0}^{L-1} h_l \mathbf{S}^l$, where $\mathbf{h} = [h_0, \ldots, h_{L-1}]^\top$ is a vector composed of the graph filter coefficients and $L - 1$ denotes the filter degree. Since $\mathbf{H}$ is a polynomial in $\mathbf{S}$, it readily follows that $\mathbf{H}$ is also symmetric. For a given input signal $\mathbf{s}$, the output of the graph filter is defined as $\mathbf{x} = \mathbf{H}\mathbf{s}$. Assuming that $\mathbf{s}$ is a white signal that follows a normalized i.i.d. Gaussian distribution with zero mean, the output of the filter $\mathbf{H}$ is stationary on the graph. This is because the following properties are satisfied:
$$\mathbf{m}_x = \mathbb{E}[\mathbf{x}] = \mathbf{H}\,\mathbb{E}[\mathbf{s}] = \mathbf{0}, \qquad \mathbf{C} = \mathbb{E}[\mathbf{x}\mathbf{x}^\top] = \mathbf{H}\,\mathbb{E}[\mathbf{s}\mathbf{s}^\top]\,\mathbf{H}^\top = \mathbf{H}\mathbf{H}^\top = \mathbf{H}^2, \quad (3)$$
where $\mathbf{m}_x$ denotes the expected value and $\mathbf{C}$ represents the covariance matrix of the graph signal $\mathbf{x}$. Moreover, since $\mathcal{G}$ is undirected, both $\mathbf{S}$ and $\mathbf{C}$ are symmetric. It follows from (3) that the two diagonalizable matrices $\mathbf{S}$ and $\mathbf{C}$ share common eigenvectors $\mathbf{U}$ in the spectral domain [38]. As a result, the matrices $\mathbf{S}$ and $\mathbf{C}$ commute.
Thus, the task of learning the underlying graph from stationary graph signals is equivalent to inferring its shift operator $\mathbf{S}$. More precisely, under the general assumption of graph sparsity, the graph learning problem from stationary graph signals can be formulated as [12]
$$\min_{\mathbf{S} \in \mathcal{S}} \ \|\mathbf{S}\|_0 \quad \text{s.t.} \quad \mathbf{C}\mathbf{S} = \mathbf{S}\mathbf{C}, \quad (4)$$
where the set $\mathcal{S}$ forces the estimated matrix $\mathbf{S}$ to satisfy some specific properties, and $\|\cdot\|_0$ promotes sparse solutions for $\mathbf{S}$. The equality constraint enforces commutativity of the GSO (here, the Laplacian) and the covariance matrix.
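The commutativity property that underlies this formulation can be checked numerically. The sketch below builds a polynomial graph filter on a random graph and verifies that the resulting covariance commutes with the GSO; the graph size, the edge probability and the filter coefficients `h` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Random undirected graph with unit weights; the Laplacian is used as the GSO.
A = np.triu((rng.random((N, N)) < 0.5).astype(float), k=1)
W = A + A.T
S = np.diag(W.sum(axis=1)) - W

# Polynomial graph filter H = sum_l h_l S^l with hypothetical coefficients.
h = np.array([1.0, -0.5, 0.1])
H = sum(h_l * np.linalg.matrix_power(S, l) for l, h_l in enumerate(h))

# Covariance of x = H s for white, zero-mean, unit-variance s.
C = H @ H.T

# Stationarity implies that C and S commute.
commutator = C @ S - S @ C
```

In practice only a sample covariance is available, so the commutator is small rather than exactly zero, which is why relaxed versions of the equality constraint are commonly used.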

Time-varying graph learning
Time-varying graph learning aims to learn a series of graphs $\mathbf{L}^{(1)}, \ldots, \mathbf{L}^{(T)}$ from the graph signals $\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(T)}$ collected during $T$ time periods. In this case, slowly changing time-varying graphs can be selected by solving problem (5) [33], where the term $f(\mathbf{L}^{(t)} - \mathbf{L}^{(t-1)})$ denotes a regularization term that captures the temporal change in graph edges. The parameter $\eta$ controls the temporal sparseness.

Time-varying graph learning with hidden nodes
In this section, we consider situations where the graph signals are observed only from a subset of nodes during the data collection process. Specifically, Sect. 3.1 analyzes the latent nodal variables and their influence on the time-varying graph. This is accomplished by using a block matrix factorization to represent the original matrices. Subsequently, Sect. 3.2 describes the time-varying graph topology inference problem with hidden nodes.

Time-varying graph model with hidden nodes
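Formally, we consider a sequence of graph signals that are partitioned into non-overlapping time windows $\{\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(T)}\}$. In this paper, we consider an observation model with hidden nodes, where the observed graph signals correspond to a subset of the rows of $\mathbf{X}^{(t)}$, while the signal values residing on the remaining nodes are never observed. We partition the set of nodes $\mathcal{V}$ into disjoint subsets $\mathcal{O}$ and $\mathcal{H}$, where $\mathcal{O}$ is the set of observable nodes and $\mathcal{H}$ is the set of hidden nodes with $\mathcal{H} = \mathcal{V} \setminus \mathcal{O}$. In particular, we set $\mathcal{O} = \{1, \ldots, O\}$ with cardinality $|\mathcal{O}| = O$, so that $\mathcal{O}$ is associated with the first $O$ rows of $\mathbf{X}^{(t)}$. As described in Sect. 2, the sample covariance matrices and GSOs corresponding to the observed graph signals are given by $\hat{\mathbf{C}}_O^{(t)}$ and $\mathbf{S}_O^{(t)}$, respectively. For each time slot, the matrices $\mathbf{S}^{(t)}$ and $\hat{\mathbf{C}}^{(t)}$ can therefore be described by the block structure
$$\mathbf{S}^{(t)} = \begin{bmatrix} \mathbf{S}_O^{(t)} & \mathbf{S}_{OH}^{(t)} \\ \mathbf{S}_{HO}^{(t)} & \mathbf{S}_H^{(t)} \end{bmatrix}, \qquad \hat{\mathbf{C}}^{(t)} = \begin{bmatrix} \hat{\mathbf{C}}_O^{(t)} & \hat{\mathbf{C}}_{OH}^{(t)} \\ \hat{\mathbf{C}}_{HO}^{(t)} & \hat{\mathbf{C}}_H^{(t)} \end{bmatrix}. \quad (6)$$
Here, the submatrices $\mathbf{S}_O^{(t)}$, $\mathbf{S}_{OH}^{(t)}$ and $\mathbf{S}_H^{(t)}$ specify the dependencies among the observed nodes, between the observed and hidden nodes, and among the hidden nodes, respectively. In particular, the sample covariance of the observed graph signals is represented by $\hat{\mathbf{C}}_O^{(t)} = \frac{1}{K} \mathbf{X}_O^{(t)} (\mathbf{X}_O^{(t)})^\top$. Since the graphs are undirected, both $\mathbf{S}^{(t)}$ and $\hat{\mathbf{C}}^{(t)}$ are symmetric, i.e., $\mathbf{S}^{(t)} = (\mathbf{S}^{(t)})^\top$ and $\hat{\mathbf{C}}^{(t)} = (\hat{\mathbf{C}}^{(t)})^\top$. Consequently, the off-diagonal blocks satisfy $\mathbf{S}_{HO}^{(t)} = (\mathbf{S}_{OH}^{(t)})^\top$ and $\hat{\mathbf{C}}_{HO}^{(t)} = (\hat{\mathbf{C}}_{OH}^{(t)})^\top$.
The block structure of the matrices $\mathbf{S}^{(t)}$ and $\hat{\mathbf{C}}^{(t)}$ in (6) motivates the search for optimal time-varying graphs in the presence of hidden nodes. Next, the problem of time-varying graph learning with hidden nodes is introduced.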
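The block partition of a symmetric matrix into observed and hidden parts can be sketched as follows. The helper `partition_blocks` and the 5-node example are illustrative assumptions; the observed nodes are taken to be the first $O$ indices, as in the text.

```python
import numpy as np

def partition_blocks(M, num_observed):
    """Split a symmetric N x N matrix into its (O,O), (O,H) and (H,H) blocks.

    The observed nodes are assumed to be the first `num_observed` indices.
    """
    O = num_observed
    M_O = M[:O, :O]    # dependencies among observed nodes
    M_OH = M[:O, O:]   # dependencies between observed and hidden nodes
    M_H = M[O:, O:]    # dependencies among hidden nodes
    return M_O, M_OH, M_H

# Hypothetical symmetric matrix on 5 nodes, 3 of which are observed.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B + B.T
M_O, M_OH, M_H = partition_blocks(M, num_observed=3)

# Because M is symmetric, the lower-left block is the transpose of M_OH.
M_rebuilt = np.block([[M_O, M_OH],
                      [M_OH.T, M_H]])
```

Only `M_O` is available from data when the last two nodes are hidden; the remaining blocks are what the formulation has to account for implicitly.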

Problem statement
Given the known nodal subset $\mathcal{O} \subset \mathcal{V}$ and the matrices $\{\mathbf{X}_O^{(t)}\}_{t=1}^{T}$ collecting the graph signal values at the observed nodes, which arise from unknown time-varying graphs $\{\mathcal{G}^{(t)}\}_{t=1}^{T}$, our objective is to learn the time-varying graph while accounting for the presence of hidden nodes. This is tantamount to learning the GSO sequence $\{\mathbf{S}_O^{(t)}\}_{t=1}^{T}$ under the following assumptions:
1. The number of hidden nodes is far smaller than the number of observed nodes, i.e., $H \ll O$;
2. The full observations $\{\mathbf{X}^{(t)}\}_{t=1}^{T}$ satisfy the prior assumption that they are simultaneously smooth and stationary on $\mathbf{S}^{(t)}$;
3. The number of graph edges permitted to change between consecutive graphs is limited according to a particular function $\psi(\mathbf{S}^{(t)} - \mathbf{S}^{(t-1)})$, a prior stating that graph edges change smoothly in time.
Learning the time-varying graphs encoded in the matrices $\{\mathbf{S}_O^{(t)}\}_{t=1}^{T}$ is challenging due to the absence of observations from the nodes in $\mathcal{H}$. The above three assumptions are made to render the problem more tractable. Firstly, assumption (1) guarantees the availability of information for the majority of nodes. Secondly, assumption (2) establishes a well-defined relationship between the graph signals and the unknown time-varying graphs. Lastly, assumption (3) enforces that the graph edges change smoothly over time, providing the temporal relations that may exist in time-varying graphs.

Proposed optimization framework
In this section, the influence of hidden nodes on the smoothness prior and on the stationarity prior is presented. Following this, an optimization framework is designed to address the time-varying graph learning problem with hidden nodes, considering the scenario where the observed graph signals are both smooth and stationary.

Influence of hidden nodes on smoothness prior
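The smoothness of signals on time-varying graphs can be computed as $\frac{1}{K}\mathrm{tr}\big((\mathbf{X}^{(t)})^\top \mathbf{L}^{(t)} \mathbf{X}^{(t)}\big)$. In this part, we focus on $\hat{\mathbf{C}}^{(t)} = \frac{1}{K}\mathbf{X}^{(t)}(\mathbf{X}^{(t)})^\top$, and thus the TV of the graph signals is equal to $\mathrm{tr}(\hat{\mathbf{C}}^{(t)}\mathbf{L}^{(t)})$. However, the existence of hidden nodes restricts our access to the observed sample covariance matrices $\hat{\mathbf{C}}_O^{(t)}$. Using the block structure of the matrices $\hat{\mathbf{C}}^{(t)}$ and $\mathbf{S}^{(t)}$ defined in (6), the smoothness of signals on time-varying graphs can be rewritten as
$$\mathrm{tr}(\hat{\mathbf{C}}^{(t)}\mathbf{L}^{(t)}) = \mathrm{tr}(\hat{\mathbf{C}}_O^{(t)}\mathbf{L}_O^{(t)}) + 2\,\mathrm{tr}\big(\hat{\mathbf{C}}_{OH}^{(t)}(\mathbf{L}_{OH}^{(t)})^\top\big) + \mathrm{tr}(\hat{\mathbf{C}}_H^{(t)}\mathbf{L}_H^{(t)}) = \mathrm{tr}(\hat{\mathbf{C}}_O^{(t)}\mathbf{L}_O^{(t)}) + 2\,\mathrm{tr}(\mathbf{P}^{(t)}) + r^{(t)}, \quad (7)$$
where the matrices $\mathbf{P}^{(t)} := \hat{\mathbf{C}}_{OH}^{(t)}(\mathbf{L}_{OH}^{(t)})^\top \in \mathbb{R}^{O \times O}$ and $\mathbf{R}^{(t)} := \hat{\mathbf{C}}_H^{(t)}\mathbf{L}_H^{(t)} \in \mathbb{R}^{H \times H}$, and $r^{(t)} := \mathrm{tr}(\mathbf{R}^{(t)})$ are nonnegative variables. The first equality in (7) represents the block-wise smoothness of the graph signals. However, most of the submatrices related to the hidden nodes are unknown. By lifting the problem to the matrices $\mathbf{P}^{(t)}$ and $\mathbf{R}^{(t)}$, we circumvent this challenge and solve the time-varying topology inference as a convex problem.
Notice that the matrices $\mathbf{L}_O^{(t)}$ belong to a set that differs from the set of valid combinatorial Laplacians $\mathcal{L}$. The only difference between the two sets is that the condition $\mathbf{L}\mathbf{1} = \mathbf{0}$ is replaced with $\mathbf{L}\mathbf{1} \geq \mathbf{0}$, while the other conditions remain unchanged. The existence of links between the elements of $\mathcal{O}$ and the elements of $\mathcal{H}$ gives rise to nonzero (negative) entries in $\mathbf{L}_{OH}^{(t)}$ and, as a result, the sum of the magnitudes of the off-diagonal elements in a row of $\mathbf{L}_O^{(t)}$ can be smaller than the associated diagonal element (which accounts for the links in both $\mathcal{O}$ and $\mathcal{H}$). Therefore, $\mathbf{L}_O^{(t)}$ is not a combinatorial Laplacian. This is the challenge we encounter when tackling time-varying graph topology inference from smooth observations with hidden nodes. To circumvent this issue, we turn to estimating $\tilde{\mathbf{L}}_O^{(t)} := \mathrm{diag}(\mathbf{A}_O^{(t)}\mathbf{1}) - \mathbf{A}_O^{(t)}$, where $\mathbf{A}_O^{(t)}$ represent the adjacency matrices among the observed nodes in the $t$-th time slot. With this consideration in mind, the matrices $\tilde{\mathbf{L}}_O^{(t)}$ are proper combinatorial Laplacians that satisfy the conditions of the valid set of graph Laplacians $\mathcal{L}$. We formulate the relation between $\mathbf{L}_O^{(t)}$ and $\tilde{\mathbf{L}}_O^{(t)}$ as $\mathbf{L}_O^{(t)} = \tilde{\mathbf{L}}_O^{(t)} + \mathbf{D}_{OH}^{(t)}$. We use the degree matrices $\mathbf{D}_{OH}^{(t)}$ to represent the edges between the observed and hidden nodes, defined as $\mathbf{D}_{OH}^{(t)} := \mathrm{diag}(\mathbf{A}_{OH}^{(t)}\mathbf{1}) \in \mathbb{R}^{O \times O}$. By leveraging the matrices $\tilde{\mathbf{L}}_O^{(t)}$, we replace the smoothness penalty in (7) with
$$\mathrm{tr}(\hat{\mathbf{C}}^{(t)}\mathbf{L}^{(t)}) = \mathrm{tr}(\hat{\mathbf{C}}_O^{(t)}\tilde{\mathbf{L}}_O^{(t)}) + 2\,\mathrm{tr}(\tilde{\mathbf{P}}^{(t)}) + r^{(t)}, \quad (8)$$
where $\tilde{\mathbf{P}}^{(t)} := \hat{\mathbf{C}}_O^{(t)}\mathbf{D}_{OH}^{(t)}/2 + \mathbf{P}^{(t)}$. Under assumption (1), the matrices $\mathbf{P}^{(t)}$ are low-rank with $\mathrm{rank}(\mathbf{P}^{(t)}) \leq H \ll O$. Furthermore, the matrices $\mathbf{D}_{OH}^{(t)}$ are low-rank if the graphs are sparse. Thus, it can be inferred that $\tilde{\mathbf{P}}^{(t)}$ also exhibits a low-rank structure.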
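The block-wise decomposition of the smoothness term can be verified numerically. The sketch below assumes the definitions $\mathbf{P} = \hat{\mathbf{C}}_{OH}\mathbf{L}_{OH}^\top$, $r = \mathrm{tr}(\hat{\mathbf{C}}_H\mathbf{L}_H)$, $\tilde{\mathbf{L}}_O = \mathrm{diag}(\mathbf{A}_O\mathbf{1}) - \mathbf{A}_O$ and $\tilde{\mathbf{P}} = \hat{\mathbf{C}}_O\mathbf{D}_{OH}/2 + \mathbf{P}$ for a single time slot; the graph size, the number of hidden nodes and the random signals are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, O, K = 8, 6, 40   # hypothetical sizes: 6 observed nodes, 2 hidden nodes

# Random undirected graph with unit weights and its Laplacian.
A = np.triu((rng.random((N, N)) < 0.4).astype(float), k=1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

# Sample covariance of arbitrary signals (the identity holds for any symmetric C).
X = rng.standard_normal((N, K))
C = X @ X.T / K

C_O, C_OH, C_H = C[:O, :O], C[:O, O:], C[O:, O:]
L_O, L_OH, L_H = L[:O, :O], L[:O, O:], L[O:, O:]
A_O, A_OH = A[:O, :O], A[:O, O:]

# Lifted variables that absorb the unobserved blocks.
P = C_OH @ L_OH.T            # O x O, rank at most H
r = np.trace(C_H @ L_H)      # nonnegative scalar

# Laplacian restricted to observed-to-observed edges, and the observed-to-hidden degrees.
L_O_tilde = np.diag(A_O.sum(axis=1)) - A_O
D_OH = np.diag(A_OH.sum(axis=1))
P_tilde = C_O @ D_OH / 2 + P

full = np.trace(C @ L)
blockwise = np.trace(C_O @ L_O) + 2 * np.trace(P) + r
observed_form = np.trace(C_O @ L_O_tilde) + 2 * np.trace(P_tilde) + r
```

All three quantities coincide, which illustrates why the unknown hidden blocks can be handled through the lifted variables instead of being estimated explicitly.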

Influence of hidden nodes on stationarity prior
Having evaluated the impact of the smoothness assumption on the time-varying graph learning problem with hidden nodes, we now consider graph signals that are stationary on the whole graphs. This signal model implies that the eigenvectors of $\mathbf{C}^{(t)}$ and $\mathbf{S}^{(t)}$ are identical, so that the equation $\mathbf{C}^{(t)}\mathbf{S}^{(t)} = \mathbf{S}^{(t)}\mathbf{C}^{(t)}$ holds. We leverage the block structure of the matrices $\mathbf{C}^{(t)}$ and $\mathbf{S}^{(t)}$, focusing on the upper-left block on both sides of $\mathbf{C}^{(t)}\mathbf{S}^{(t)} = \mathbf{S}^{(t)}\mathbf{C}^{(t)}$, to model the impact of hidden nodes on the stationarity prior:
$$\mathbf{C}_O^{(t)}\mathbf{S}_O^{(t)} + \mathbf{C}_{OH}^{(t)}(\mathbf{S}_{OH}^{(t)})^\top = \mathbf{S}_O^{(t)}\mathbf{C}_O^{(t)} + \mathbf{S}_{OH}^{(t)}(\mathbf{C}_{OH}^{(t)})^\top. \quad (9)$$
Equation (9) reveals that, when hidden nodes are present, we cannot simply focus on the observed blocks $\mathbf{C}_O^{(t)}$ and $\mathbf{S}_O^{(t)}$, but also need to account for the associated terms $\mathbf{C}_{OH}^{(t)}(\mathbf{S}_{OH}^{(t)})^\top$ and $\mathbf{S}_{OH}^{(t)}(\mathbf{C}_{OH}^{(t)})^\top$. Furthermore, we set the matrices $\bar{\mathbf{P}}^{(t)} = \mathbf{C}_{OH}^{(t)}(\mathbf{S}_{OH}^{(t)})^\top$, similar to the definition of $\mathbf{P}^{(t)}$. The key distinction lies in the use of the matrices $\mathbf{S}_{OH}^{(t)}$ instead of the Laplacians $\mathbf{L}_{OH}^{(t)}$. Under this setting, (9) can be formulated as
$$\mathbf{C}_O^{(t)}\mathbf{S}_O^{(t)} + \bar{\mathbf{P}}^{(t)} = \mathbf{S}_O^{(t)}\mathbf{C}_O^{(t)} + (\bar{\mathbf{P}}^{(t)})^\top. \quad (10)$$
Similar to the analysis of $\tilde{\mathbf{P}}^{(t)}$ in Sect. 4.1, the matrices $\bar{\mathbf{P}}^{(t)}$ are also low-rank. We will exploit the low-rank structure of the matrices $\tilde{\mathbf{P}}^{(t)}$ and $\bar{\mathbf{P}}^{(t)}$ in our formulation.
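A small deterministic example illustrates why the observed blocks alone do not commute and how the correction term restores the equality. The sketch assumes a 4-node path graph whose last node is hidden, uses the Laplacian as GSO, and takes the covariance to be $\mathbf{S}^2$ (the covariance of $\mathbf{x} = \mathbf{S}\mathbf{s}$ with white $\mathbf{s}$); the name `P_bar` stands for $\mathbf{C}_{OH}\mathbf{S}_{OH}^\top$ and is chosen here for illustration.

```python
import numpy as np

# Path graph 0-1-2-3 with unit weights; node 3 is treated as hidden.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = np.diag(W.sum(axis=1)) - W   # Laplacian as GSO
C = S @ S                        # a polynomial in S, hence C S = S C

O = 3
C_O, C_OH = C[:O, :O], C[:O, O:]
S_O, S_OH = S[:O, :O], S[:O, O:]

# Correction term induced by the hidden node.
P_bar = C_OH @ S_OH.T

# The observed blocks alone do not commute ...
naive_gap = np.linalg.norm(C_O @ S_O - S_O @ C_O)
# ... but the upper-left block of C S = S C holds once P_bar is included.
corrected_gap = np.linalg.norm(C_O @ S_O + P_bar - S_O @ C_O - P_bar.T)
```

Here `P_bar` has rank one because there is a single hidden node, which matches the low-rank argument made above.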

Smoothness prior versus stationarity prior
Suppose that we are given two datasets, $\mathbf{X}_1$ and $\mathbf{X}_2$, each containing an equal number of graph signals. Specifically, we know that the signals in $\mathbf{X}_1$ are smooth on the graph, while the signals in $\mathbf{X}_2$ are stationary on the graph. Based on this information, we are able to identify the underlying graph, and it is of interest to see which prior leads to a better graph topology inference result. In general, graph smoothness is a more lenient assumption that only requires the TV of the observed graph signals to be small. In contrast, stationarity-based methods outperform smoothness-based ones, since stationarity is a much stronger prior that significantly restricts the GSO. Meanwhile, there may be instances where the observations are both stationary and smooth on the graph. More precisely, this means that the covariance matrix of the observations is diagonalized by the eigenvectors of the graph Laplacian and that the graph signals are dominated by low frequencies. In such settings, the two graph recovery methods can be combined to enhance recovery performance, which is explored in the subsequent subsection.

Topology inference based on smoothness prior and stationarity prior
Taking assumption (3) into account, the task of learning time-varying graphs in the presence of hidden nodes involves estimating a sequence of graphs $\{\mathbf{S}_O^{(t)}\}_{t=1}^{T}$ from the observed graph signals $\mathbf{X}_O^{(1)}, \ldots, \mathbf{X}_O^{(T)}$ collected during $T$ time periods. We concentrate on the scenario where the GSO is the Laplacian matrix, so that our ultimate target is to infer $\{\mathbf{L}_O^{(t)}\}_{t=1}^{T}$. We assume that the observed signals are both smooth and stationary on the unknown time-varying graphs. As a result, the smoothness penalty described in (8) and the commutativity constraint accounting for stationarity in (10) can be considered jointly to approach the time-varying graph learning problem. More specifically, this problem can be formulated as the optimization problem (11), where $\gamma_*$ and $\eta$ are tuning parameters. The term $\|\tilde{\mathbf{L}}_O^{(t)}\|_{F,\mathrm{off}}^2$ offers a handle on the level of sparsity. The penalty function $\psi(\cdot)$ limits the number of edge changes between consecutive graphs to a small value. The first constraint ensures that the TV of the graph signals is nonnegative. The equality constraint enforces commutativity of the Laplacians and the covariance matrices in the presence of hidden nodes. The two rank constraints capture the low-rank property of $\tilde{\mathbf{P}}^{(t)}$ and $\bar{\mathbf{P}}^{(t)}$.
In most instances, it is not feasible to obtain the true covariance $\mathbf{C}_O^{(t)}$. Therefore, we rely on the sample covariance matrices $\hat{\mathbf{C}}_O^{(t)}$ and relax the stationarity constraint so that the left- and right-hand sides of (10) are approximately, rather than exactly, equal. It is worth noting that under this more lenient condition, $\tilde{\mathbf{P}}^{(t)}$ and $\bar{\mathbf{P}}^{(t)}$ are equivalent. In such circumstances, we focus on $\mathrm{rank}(\tilde{\mathbf{P}}^{(t)})$ only and exploit the nuclear norm to capture the low-rank structure of the matrices $\tilde{\mathbf{P}}^{(t)}$. To this end, we reformulate (11) as problem (12), where the nuclear norm penalty $\|\tilde{\mathbf{P}}^{(t)}\|_*$ encourages low-rank solutions by favoring matrices with sparse singular values. The nonnegative constant $\epsilon$ is an essential parameter that characterizes the accuracy of the sample covariance. Its value is inherently related to the amount of noise and the total number of samples $K$, and it serves as an indicator of the fidelity of the estimated covariance.
Based on the previous analysis, the matrices $\tilde{\mathbf{P}}^{(t)}$ depend on the product of the matrices $\hat{\mathbf{C}}_O^{(t)}$ and $\mathbf{D}_{OH}^{(t)}$ and are also related to the matrices $\mathbf{P}^{(t)}$. Recalling that the diagonals of $\mathbf{D}_{OH}^{(t)}$ are sparse, it is clear that the products $\hat{\mathbf{C}}_O^{(t)}\mathbf{D}_{OH}^{(t)}$ are matrices with several zero columns. More precisely, assumption (1) reveals the presence of a column-sparse structure in the matrices $\tilde{\mathbf{P}}^{(t)}$. However, the rank constraint fails to preserve the desired column-sparsity characteristic. Following the classical approach in the literature, an efficient way to circumvent this issue is to replace the nuclear norm with the group Lasso penalty. This penalty not only reduces the number of nonzero columns but also promotes low-rank solutions.
To further improve the performance of graph topology inference, we leverage the aforementioned column-sparsity regularization. Taking this into account, a convex optimization problem for time-varying graph learning with hidden nodes is proposed in (13). Assumption (3) guarantees the similarity of temporally adjacent graphs. To exploit this property, we construct a tall matrix consisting of the matrices $\tilde{\mathbf{P}}^{(t)}$ and $\tilde{\mathbf{P}}^{(t-1)}$. By applying the $\ell_{2,1}$ norm to the tall matrix, we are able to capture and preserve the desired column-sparsity characteristic. This approach allows us to leverage the temporal similarity between adjacent graphs, ensuring that columns with nonzero entries are likely to be consistently positioned across the varying matrices $\tilde{\mathbf{P}}^{(t)}$. Considering this additional structure is particularly important, since it helps to improve the estimation of $\tilde{\mathbf{P}}^{(t)}$ and results in a better recovery of $\tilde{\mathbf{L}}_O^{(t)}$. The effectiveness of formulation (13) in promoting the desired column-sparsity pattern is demonstrated through the experimental results in Sect. 5.3.
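The effect of applying the $\ell_{2,1}$ norm to the stacked matrix can be illustrated with a toy example. The sketch below is not the optimization problem itself; it only evaluates the penalty for hypothetical matrices whose nonzero columns either coincide or differ across two consecutive time slots.

```python
import numpy as np

def l21_norm(M):
    """Sum of the l2 norms of the columns of M."""
    return float(np.sum(np.linalg.norm(M, axis=0)))

# Hypothetical column-sparse matrices for two consecutive time slots.
P_prev = np.zeros((4, 4))
P_prev[:, 1] = [1.0, 2.0, 0.0, 2.0]    # single nonzero column with l2 norm 3

P_same = np.zeros((4, 4))
P_same[:, 1] = [0.0, 4.0, 0.0, 0.0]    # nonzero column in the same position

P_shift = np.zeros((4, 4))
P_shift[:, 3] = [0.0, 4.0, 0.0, 0.0]   # nonzero column in a different position

# Stack the current matrix on top of the previous one to form the tall matrix.
aligned = l21_norm(np.vstack([P_same, P_prev]))      # sqrt(16 + 9) = 5.0
misaligned = l21_norm(np.vstack([P_shift, P_prev]))  # 4 + 3 = 7.0
```

The penalty is smaller when the nonzero columns share the same positions, which is the mechanism that encourages a consistent column-sparsity pattern across adjacent time slots.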
We solve the optimization problem (13) by adopting an alternating minimization scheme. To find a numerically efficient solution, we decouple (13) into three simpler optimization problems. Specifically, with $m = 0, \ldots, M - 1$ being the iteration index, we initialize the two variables $\tilde{\mathbf{P}}^{(t),(m)}$ and $r^{(t),(m)}$ at $m = 0$. In the first step, for the given $\tilde{\mathbf{P}}^{(t),(m)}$ and $r^{(t),(m)}$, we solve the optimization problem (14) with respect to $\tilde{\mathbf{L}}_O^{(t)}$, which yields $\tilde{\mathbf{L}}_O^{(t),(m+1)}$. In the second step, we fix $r^{(t),(m)}$ and leverage the estimate $\tilde{\mathbf{L}}_O^{(t),(m+1)}$ from the previous step to optimize the objective function with respect to $\tilde{\mathbf{P}}^{(t)}$, which leads to the optimization problem (15) and yields $\tilde{\mathbf{P}}^{(t),(m+1)}$; its smoothness constraint reads $\mathrm{tr}(\hat{\mathbf{C}}_O^{(t)}\tilde{\mathbf{L}}_O^{(t),(m+1)}) + 2\,\mathrm{tr}(\tilde{\mathbf{P}}^{(t)}) + r^{(t),(m)} \geq 0$. In the last step, given the $\tilde{\mathbf{L}}_O^{(t),(m+1)}$ and $\tilde{\mathbf{P}}^{(t),(m+1)}$ obtained in the first two steps, we solve the convex optimization problem (16) with respect to $r^{(t)}$, which yields $r^{(t),(m+1)}$; its constraint reads $\mathrm{tr}(\hat{\mathbf{C}}_O^{(t)}\tilde{\mathbf{L}}_O^{(t),(m+1)}) + 2\,\mathrm{tr}(\tilde{\mathbf{P}}^{(t),(m+1)}) + r^{(t)} \geq 0$.
We alternate between the three steps outlined in (14), (15) and (16) to obtain the final solution of the optimization problem (13). We generally observe convergence within a few iterations. The procedure is summarized in Algorithm 1.

Algorithm 1 Time-varying graph learning method with hidden nodes

Numerical experiments
In this section, we present some numerical results validating the effectiveness of the proposed time-varying graph learning method for both synthetic and real-world data.
The proposed method (hereinafter called TGSm-St-GL) is compared with benchmark methods, including static graph learning from smooth and stationary graph signals with hidden nodes (GSm-St-GL) [26], the time-varying graph learning method based on temporal smoothness (TVGL-TS) [33] and the time-varying graph learning method based on sparse variation (TVGL-SV) [34]. We begin with the general experimental settings. Next, we assess the efficacy of our method using synthetic data and compare it against established classical methods. Finally, we present an experiment on a real-world dataset. In our experiments, we solve the optimization problems using CVX [39], a package for solving convex programs.

Evaluation metrics
We employ five evaluation metrics to assess the performance of our proposed method.
The first evaluation metric is the Mean Error, which measures the estimation accuracy of the recovered graphs. The Mean Error is defined in terms of the estimated Laplacian matrices and the ground-truth matrices $(\mathbf{L}_O^{(t)})^*$. Additionally, we utilize three metrics, namely Precision, Recall and Fscore, to evaluate how effectively the true edge structure of the graph is captured. More precisely, the Fscore measures the accuracy of the estimated graph topology and is closely related to Precision and Recall. The Fscore ranges between 0 and 1, with higher values indicating better performance in capturing the graph topology. The mutual dependence between the obtained edge set and the ground-truth graph is measured by the last evaluation metric, the Normalized Mutual Information (NMI).
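As a rough sketch of the edge-based metrics, Precision, Recall and Fscore can be computed from the supports of the estimated and ground-truth Laplacians. The threshold `tol` used to decide whether an estimated entry counts as an edge, and the function name `edge_metrics`, are assumptions made for this illustration and are not specified in the text.

```python
import numpy as np

def edge_metrics(L_est, L_true, tol=1e-4):
    """Precision, Recall and Fscore of the estimated edge set.

    An edge (i, j), i < j, is declared present when |L_ij| > tol (assumed threshold).
    """
    iu = np.triu_indices_from(L_true, k=1)
    est = np.abs(L_est[iu]) > tol
    true = np.abs(L_true[iu]) > tol
    tp = np.sum(est & true)     # correctly recovered edges
    fp = np.sum(est & ~true)    # spurious edges
    fn = np.sum(~est & true)    # missed edges
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom > 0 else 0.0
    return float(precision), float(recall), float(fscore)

def laplacian_from_edges(n, edges):
    """Unit-weight Laplacian of an undirected graph given as a list of edges."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

# Hypothetical example: two of three true edges are recovered, plus one spurious edge.
L_true = laplacian_from_edges(4, [(0, 1), (1, 2), (2, 3)])
L_est = laplacian_from_edges(4, [(0, 1), (1, 2), (0, 3)])
precision, recall, fscore = edge_metrics(L_est, L_true)
```

In this example two edges are correct, one is spurious and one is missed, so all three metrics equal 2/3.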

Baseline methods
We compare our proposed method with several related graph learning strategies. The first is GSm-St-GL, a static graph learning method. This method considers the existence of hidden nodes and assumes that the entire set of graph signals is simultaneously smooth and stationary. The second is TGSm-St-GL-nh, a time-varying graph inference method that aims to address the same problem as TGSm-St-GL but ignores the presence of hidden nodes. The other two baseline time-varying graph learning schemes are TVGL-TS and TVGL-SV. These two schemes are applicable when all nodes are available and the graph signals satisfy the smoothness assumption.

Synthetic data
We create synthetic graph signals generated from a time-varying Erdős-Rényi graph (abbreviated as TV-ER graph). The construction of the data involves two steps: first, the creation of the TV-ER graph; second, the generation of time-varying graph signals from probability distributions based on the graph Laplacians of the TV-ER graph.

Time-varying graph construction
The generation of a TV-ER graph involves two steps. In the first step, an initial static ER graph $\mathcal{G}^{(1)}$ is constructed. The graph $\mathcal{G}^{(1)}$ consists of $N = 20$ nodes, and we set the edge connection probability to $s = 0.3$. In the second step, we change the edge connections of the original ER graph over time to construct the TV-ER graph. The $t$-th graph $\mathcal{G}^{(t)}$ is obtained by resampling 10% of the edges of the previous graph $\mathcal{G}^{(t-1)}$. In this way, we construct a set of graphs such that only a few edges switch at a time while most of the edges remain unchanged, i.e., the set of graphs follows assumption (3). The edge weights of the graphs belong to the set $\{0, 1\}$.
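The two construction steps can be sketched as follows. The exact resampling rule is not specified in the text, so the sketch assumes one plausible reading: at each step, 10% of the existing edges are removed and the same number of new edges is placed on previously unconnected node pairs. The number of time slots `T` and the function names are likewise hypothetical.

```python
import numpy as np

def er_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def resample_edges(W, frac, rng):
    """Move a fraction `frac` of the edges to randomly chosen non-edge positions (assumed rule)."""
    n = W.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    vals = W[iu, ju].copy()
    edges = np.flatnonzero(vals > 0)
    non_edges = np.flatnonzero(vals == 0)
    k = min(int(round(frac * edges.size)), non_edges.size)
    drop = rng.choice(edges, size=k, replace=False)
    add = rng.choice(non_edges, size=k, replace=False)
    vals[drop] = 0.0
    vals[add] = 1.0
    W_new = np.zeros_like(W)
    W_new[iu, ju] = vals
    return W_new + W_new.T

def tv_er_graphs(n=20, p=0.3, T=5, frac=0.1, seed=0):
    """Sequence of T adjacency matrices in which consecutive graphs differ in few edges."""
    rng = np.random.default_rng(seed)
    graphs = [er_adjacency(n, p, rng)]
    for _ in range(T - 1):
        graphs.append(resample_edges(graphs[-1], frac, rng))
    return graphs

graphs = tv_er_graphs()
```

Under this rule the number of edges stays constant over time and consecutive graphs differ in exactly twice the number of moved edges, which is one simple way to satisfy the slow-variation assumption.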

Generating synthetic graph signals
We generate time-varying graph signals from distributions derived from the graph Laplacians of the TV-ER graph that we construct. Let $\mathbf{L}^{(t)}$ denote the graph Laplacian of the graph at time slot $t$, with eigendecomposition $\mathbf{L}^{(t)} = \mathbf{U}^{(t)} \boldsymbol{\Lambda}^{(t)} (\mathbf{U}^{(t)})^\top$. We create the smooth graph signals as $\mathbf{X}^{(t)} = \mathbf{U}^{(t)} \mathbf{Z}^{(t)}$ with $K = 50$ observations, where $\mathbf{Z}^{(t)} \sim \mathcal{N}(\mathbf{0}, (\boldsymbol{\Lambda}^{(t)})^\dagger)$. It is worth mentioning that the covariance of $\mathbf{X}^{(t)}$ is then $\mathbf{C}^{(t)} = \mathbf{U}^{(t)} (\boldsymbol{\Lambda}^{(t)})^\dagger (\mathbf{U}^{(t)})^\top = (\mathbf{L}^{(t)})^\dagger$, which shares its eigenvectors with $\mathbf{L}^{(t)}$. Hence, the graph signals generated from this model satisfy the assumption of being both smooth and stationary on the time-varying graphs.
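A minimal sketch of this sampling step for a single time slot is given below. It reads $\mathcal{N}(\mathbf{0}, \boldsymbol{\Lambda}^\dagger)$ as a zero-mean Gaussian with covariance $\boldsymbol{\Lambda}^\dagger$, so that the population covariance of the generated signals is $\mathbf{U}\boldsymbol{\Lambda}^\dagger\mathbf{U}^\top$; the 5-node path graph, the eigenvalue threshold and the function name are illustrative assumptions.

```python
import numpy as np

def smooth_stationary_signals(L, K, rng, eps=1e-10):
    """Draw K signals X = U Z with Z ~ N(0, pinv(Lambda)), one signal per column."""
    eigvals, U = np.linalg.eigh(L)
    inv = np.zeros_like(eigvals)
    nonzero = eigvals > eps
    inv[nonzero] = 1.0 / eigvals[nonzero]   # diagonal of the pseudo-inverse of Lambda
    Z = np.sqrt(inv)[:, None] * rng.standard_normal((L.shape[0], K))
    return U @ Z

# Hypothetical connected graph: 5-node path with unit weights.
N, K = 5, 50
W = np.diag(np.ones(N - 1), k=1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
X = smooth_stationary_signals(L, K, rng)
C_hat = X @ X.T / K   # sample covariance used by the learning methods
```

Low-frequency components receive the largest variance under this model, which is what makes the signals smooth; the zero-eigenvalue (constant) component is excluded by the pseudo-inverse.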

Results on synthetic data
We conduct several experiments to investigate the behavior of the proposed method and the baseline methods on synthetic data. These experiments consider different settings, including the number of hidden nodes, the noise level, and the column-sparse structure of the matrices P(t).

Number of hidden nodes
We assess the performance of each method while varying the number of hidden nodes over H ∈ {1, 2, 3, 4, 5}. The hidden nodes are selected at random from all nodes in the graph.
Fig. 1 compares the performance of the methods for different numbers of hidden nodes on the TV-ER graph. The Mean Error of the recovered graphs and the Fscore are reported in Fig. 1a, b, respectively. For TGSm-St-GL, the Mean Error increases and the Fscore decreases as H grows. This observation highlights the significant influence of hidden nodes on time-varying graph recovery. The comparison in Fig. 1 further shows that the proposed method outperforms TGSm-St-GL-nh, because TGSm-St-GL-nh ignores the presence of hidden nodes. At the same time, TVGL-TS and TVGL-SV perform worst, since these two methods not only ignore the presence of hidden nodes but also account only for the smoothness assumption on the graph signals. TGSm-St-GL outperforms GSm-St-GL because the latter does not consider the temporal relationship between graphs.
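For readers who want to reproduce the Fscore curves, the metric can be computed along the following lines. This sketch assumes the usual definition of the F-score as the harmonic mean of edge precision and recall, and it assumes that an estimated weight counts as an edge when its magnitude exceeds a small threshold; the function name `edge_fscore` and the threshold value are hypothetical.

```python
import numpy as np

def edge_fscore(adj_true, adj_est, thresh=1e-4):
    """F-score of the estimated edge set, computed on the upper-triangular entries."""
    iu = np.triu_indices(adj_true.shape[0], k=1)
    true_edges = np.abs(adj_true[iu]) > thresh
    est_edges = np.abs(adj_est[iu]) > thresh
    tp = int(np.sum(true_edges & est_edges))      # edges found correctly
    fp = int(np.sum(~true_edges & est_edges))     # spurious edges
    fn = int(np.sum(true_edges & ~est_edges))     # missed edges
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 1.0

# Toy example: a 4-node path graph and an estimate with one wrong edge.
truth = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    truth[i, j] = truth[j, i] = 1.0
estimate = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 3)]:
    estimate[i, j] = estimate[j, i] = 1.0

score = edge_fscore(truth, estimate)
```

In the toy example two of the three true edges are recovered and one spurious edge is added, so precision and recall are both 2/3.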

Noisy observations
The second experiment evaluates the effect of different noise levels. We use the TV-ER model with edge probability s = 0.3 to generate random time-varying graphs and set H = 2. The ground-truth graph signals X(t) O are corrupted by additive multivariate Gaussian noise with zero mean and covariance σ²I, which yields the noisy observed graph signals. In Fig. 2, the Fscore of the learned graphs is plotted on the y-axis and the noise power on the x-axis. Notably, TGSm-St-GL outperforms GSm-St-GL, which is consistent with the previous experimental results. Moreover, compared with TGSm-St-GL, the performance of TGSm-St-GL-nh, TVGL-TS and TVGL-SV deteriorates significantly as the noise power increases, further emphasizing the necessity of accounting for hidden nodes. Furthermore, the Fscore of TGSm-St-GL decays only slightly as the noise power increases, demonstrating that the proposed method is robust to noise.
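The corruption step amounts to adding independent zero-mean Gaussian noise of variance σ² to every observed entry. A minimal sketch follows; the placeholder array stands in for the observed ground-truth signals of one time slot (18 observed nodes for N = 20 and H = 2, with 50 signals), and the function name `add_gaussian_noise`, the noise level and the seed are hypothetical.

```python
import numpy as np

def add_gaussian_noise(X_obs, sigma, rng):
    """Return X_obs plus i.i.d. N(0, sigma^2) noise on every entry."""
    return X_obs + sigma * rng.standard_normal(X_obs.shape)

rng = np.random.default_rng(2)
X_obs = rng.standard_normal((18, 50))      # placeholder for the observed signals
sigma = 0.5                                # illustrative noise standard deviation
X_noisy = add_gaussian_noise(X_obs, sigma, rng)

# Empirical standard deviation of the added noise.
noise_std = float(np.std(X_noisy - X_obs))
```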

Structure properties of P(t)
Although the primary objective of this study is to recover {L(t) O}, t = 1, …, T, the structural properties of {P(t)}, t = 1, …, T, also contribute significantly to the proposed method. The purpose of this experiment is therefore to illustrate the recovered {L(t) O} and {P(t)}, which gives a clearer understanding of the impact of the different methods on graph structure recovery. The outcomes are depicted in Figs. 3 and 4. In Fig. 3, the first column shows the ground-truth graph topology and the corresponding covariance matrix. The second column shows the ground-truth L(1) O and P(1). The last two columns present the estimates obtained by the group Lasso scheme TGSm-St-GL [cf. (13)] and the low-rank scheme TGSm-St [cf. (12)], respectively. For the depicted example, the low-rank scheme TGSm-St is not capable of recovering the column-sparse structure of the original matrix P(1). In contrast, the estimated matrix P(1) in Fig. 3g exhibits a column sparsity similar to that of the ground-truth P(1). Importantly, the estimated L(1) O shows that a more precise estimate of P(1) leads to a better inference of the graph topology. Thus, the group Lasso scheme TGSm-St-GL yields better estimates than the low-rank scheme TGSm-St. In Fig. 4, we show the learned matrices P(t) of the time-varying graph for t ranging from 1 to 4. The columns with nonzero entries maintain consistent positions across adjacent time slots for the learned matrices P(t) and P(t−1). In other words, the scheme TGSm-St-GL captures the similar column-sparsity pattern of P(t) that results from the temporal similarity of the time-varying graph.
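The column-sparsity pattern discussed here can be quantified with a few lines of code. The sketch below is illustrative only: it assumes that the group Lasso penalty takes its usual ℓ2,1 form (the sum of column-wise ℓ2 norms) and measures how well the nonzero columns of adjacent matrices line up with a Jaccard overlap. The toy matrices, the tolerance and the function names are hypothetical and are not taken from the experiments.

```python
import numpy as np

def column_support(P, tol=1e-6):
    """Indices of the columns of P whose l2 norm exceeds tol."""
    norms = np.linalg.norm(P, axis=0)
    return {int(j) for j in np.flatnonzero(norms > tol)}

def l21_norm(P):
    """Sum of the column-wise l2 norms (the usual group Lasso penalty)."""
    return float(np.linalg.norm(P, axis=0).sum())

def support_overlap(P_prev, P_curr, tol=1e-6):
    """Jaccard overlap between the column supports of two matrices."""
    a = column_support(P_prev, tol)
    b = column_support(P_curr, tol)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy matrices: P1 and P2 share their nonzero columns, P3 moves one of them.
P1 = np.zeros((5, 5))
P1[:, 1] = 0.5
P1[:, 3] = -0.2
P2 = np.zeros((5, 5))
P2[:, 1] = 0.4
P2[:, 3] = -0.3
P3 = np.zeros((5, 5))
P3[:, 1] = 0.4
P3[:, 4] = 0.1

overlaps = [support_overlap(P1, P2), support_overlap(P2, P3)]
```

An overlap close to 1 between consecutive time slots corresponds to the consistent column positions described above.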

Experiments on real-world data
In this section, we evaluate our algorithms on two real-world datasets and compare their recovery performance with existing alternatives in the literature.

Application to PM 2.5 data
We start with the daily mean PM 2.5 concentration data from California provided by the US Environmental Protection Agency [40]. The dataset contains daily measurements collected from 93 sensors in California over the first 304 days of 2015. We build an initial graph from the longitude and latitude coordinates of these 93 sensors. To infer time-varying graphs from incomplete graph signals, we assume that only the first 15 sensors are observed. In this case, the goal is to infer the connections between those 15 sensors. Moreover, we divide the 304 days into 10 time slots of equal length, so that each time slot contains 30 days of data per sensor, i.e., X(t) ∈ R^{15×30}.
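The slicing of the measurements into per-slot signal matrices can be sketched as follows. The random array is only a placeholder for the real 93 × 304 sensor-by-day matrix, and the sketch assumes that the four days left over after forming 10 slots of 30 days are discarded, which the text does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
readings = rng.random((93, 304))           # placeholder: 93 sensors, 304 days

n_obs, T, days_per_slot = 15, 10, 30
# Keep the first 15 sensors and the first T * 30 = 300 days (assumed truncation).
observed = readings[:n_obs, : T * days_per_slot]

# One 15 x 30 signal matrix per time slot.
slots = [
    observed[:, t * days_per_slot : (t + 1) * days_per_slot]
    for t in range(T)
]
```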
The comparison between the proposed approach and the relevant alternatives is shown in Table 2. The proposed TGSm-St-GL obtains the highest Fscore (0.5011) and the highest NMI among the compared methods.

Application to COVID-19 data
Finally, we use the global COVID-19 dataset provided by Johns Hopkins University [41]. The dataset contains the cumulative number of COVID-19 cases for each day and each locality between January 22 and April 6, as well as the geographical location of 259 places, including some regions of the world; overall, the dataset has 7 time slots with N = 259 and M = 10. To account for the presence of hidden variables, we assume that O = {1, …, 30}, so that only the first 30 stations are observed, and our goal is to infer the connections between those stations. The results are listed in Table 2. We observe that the proposed TGSm-St-GL outperforms GSm-St-GL, TVGL-TS and TVGL-SV. In particular, TGSm-St-GL and GSm-St-GL have comparable performance in terms of Fscore. This is not surprising, since the number of time slots T is relatively small. On the other hand, TVGL-TS and TVGL-SV perform worse than TGSm-St-GL. This result again shows that explicitly considering hidden variables when inferring the graph structure leads to better performance.

Conclusion
In this paper, we introduced an optimization framework for time-varying graph learning with hidden nodes. The framework relies on the assumption that the observed signals are both smooth and stationary on the learned graphs, and it identifies graph topologies by leveraging the similarity between time-varying graphs. Specifically, the key was to leverage the block structure of the matrices to handle the presence of hidden nodes, and an optimization framework based on constraints on the graph topologies and the graph signals was proposed. Moreover, to capture the characteristics of the learned graphs precisely, we augmented the objective function with a column-sparsity constraint and took into account how the similarity between graphs at different time slots carries over to the column-sparsity pattern. The experimental results on both simulated and real-world data verified the effectiveness and superiority of our method. Future work includes the inference of time-varying graphs under different evolutionary modes in the presence of hidden nodes.

Fig. 1 Fig. 2
Fig. 1 Numerical validation of the proposed algorithm. a Mean Error of the recovered graphs for several algorithms as the number of hidden nodes increases. b Fscore of TV-ER graphs as the number of hidden nodes increases for different algorithms

Fig. 3
Fig. 3 Graphical representation of the ground-truth graph and the estimated graph at time slot t = 1 with N = 20 and H = 1. Panel (a) shows the ground-truth graph and panel (e) the corresponding covariance matrix. The ground-truth matrices L(1) O and P(1) are shown in the second column [cf. panels (b) and (f)]. Analogously, the estimates of L(1) O and P(1) generated by (13) are shown in panels (c) and (g), and those generated by (12) in panels (d) and (h)

Fig. 4
Fig. 4 Visualization of the matrices P(t) of the time-varying graph learned from the synthetic data with the TGSm-St-GL method

Table 1
Comparison between proposed method TGSm-St-GL and alternative

Table 2
The performance achieved by the schemes TGSm-St-GL, GSm-St-GL, TVGL-TS and TVGL-SV while learning time-varying graphs