Autoregressive graph Volterra models and applications

Graph-based learning and estimation are fundamental problems in various applications involving power, social, and brain networks, to name a few. While learning pair-wise interactions in network data is a well-studied problem, discovering higher-order interactions among subsets of nodes is still not yet fully explored. To this end, encompassing and leveraging (non)linear structural equation models as well as vector autoregressions, this paper proposes autoregressive graph Volterra models (AGVMs) that can capture not only the connectivity between nodes but also higher-order interactions presented in the networked data. The proposed overarching model inherits the identifiability and expressibility of the Volterra series. Furthermore, two tailored algorithms based on the proposed AGVM are put forth for topology identification and link prediction in distribution grids and social networks, respectively. Real-data experiments on different real-world collaboration networks highlight the impact of higher-order interactions in our approach, yielding discernible differences relative to existing methods.


Introduction
Full awareness of networks and networked interactions is required for understanding the behavior of complex systems. These systems are typically modeled as graphs in many applications such as financial markets, social networks, power systems, and transportation systems [1][2][3], to name a few. Graph structure identification (prediction) is to identify (predict) if an edge exists (will exist) between a pair of nodes of a graph, given a set of network observations in the form of node attributes at different time instances. Applications include studying the growth of social networks [1] and their dynamics in social sciences, predicting what are the most likely links between users in recommender systems, unveiling pair-wise interactions between elements of different ecological niches or predicting interactions that were not studied due to time or cost restrictions in biology [4].
Albeit pair-wise interactions have the ability to capture the dynamics of the underlying graph, a lot of the interplay among networked data occurs beyond just two nodes [5]. For instance, human interaction over social media takes place with a team rather than two individuals. Furthermore, molecules tend to show more interactions among different groups. Moreover, in smart grids, the dependency among power variables such as voltage and current occurs in a region instead of single pairs. Early approaches to capture higher-order relations in the underlying network have mainly relied on set systems, hypergraphs, and simplicial complexes [6][7][8][9][10], to name a few. Furthermore, to address the existence of nonlinear connectivity, topology identification approaches leveraging partial correlations as well as kernels have recently been developed [11,12]. A link prediction approach for evaluating higher-order data models of complex systems is proposed in [13]. While offering mathematical frameworks to study higher-order relations, these works either directly leverage the network structure, e.g., connections in a hypergraph, or make use of concepts inherited from physical phenomena, e.g., the analysis based on simplicial complexes, using cohomology [14], which might not exist for all kinds of datasets.
Aiming to modeling dynamical processes over a network, different attempts have been made. Specially, data-driven neural network solutions have attracted growing attention recently to learn the nonlinear connectivity [15][16][17]. Furthermore, rooted in structural equation models [18] (SEMs), and in particular combinatorial vector autoregressive models [19], several efforts leveraging either kernels or partial correlations [11,12] have been devoted to capturing the dynamics; see [20][21][22], and references therein. Despite that these approaches manage to capture complex dynamics existing in networked data, they lack interpretability beyond pair-wise interactions. Another issue of the mentioned models is their poor scalability; in other words, the complexity grows exponentially along with the model order.
Volterra series and kernels have emerged as promising tools for data analysis in different applications, e.g., brain networks [23], gene data [24] and communications [25]. Leveraging the sparsity of the Volterra kernels as well as a parsimonious model description, the computational complexity can be reduced [24], especially when using an appropriate basis expansion model of the kernels. Moreover, one can retrieve the original Volterra kernels from the considered basis expansion without losing model expressibility [26]. Although the Volterra series is powerful in modeling nonlineartemporal interactions, using it to capture the higher-order relations in the networked data is not well-explored. Furthermore, in Volterra series models, the autoregressive property is not fully considered as in SEM when interpreting the dynamics in networked data.
Building upon SEMs and Volterra series models, this work advocates an autoregressive graph Volterra model (AGVM) to capture higher-order interactions present in networked data. Different identifiability conditions for the proposed model are derived including the identifiability of the network connection based on exogenous data, sparsity (relieving sampling complexity) and the restricted isometry property in the bipolar case. The proposed model uses graph Volterra kernels to identify interactions between nodes or groups of nodes, providing a principled way to tackle higher-order interactions in networks. Furthermore, to estimate the graph Volterra kernels, two tailored AGVM algorithms for topology identification in power systems and link prediction in social networks are introduced. The proposed approaches differ from existing higher-order interaction methods [13,27], which solely focus on extending metrics commonly used in informal scoring for classical link prediction and identification.

Preliminaries
Consider a graph G = (V, E) , where V is the vertex (node) set with cardinality |V| = N , and E is the edge set with cardinality |E| = E , respectively. A time-series of graph signals {x(t) ∈ R N } T t=1 is collected at the nodes V . In addition, external (exogenous) observables {ζ (t) ∈ R N } T t=1 are available such as features of the nodes, inputs from different networks, and network-level snapshots or layers [28].
The classical structural equation model (SEM) [18] considering the signal x(t) over the graph and the exogenous variables ζ (t) can be described as follows where Ŵ ∈ R N ×N is a diagonal matrix representing the mapping of the exogenous input on the node variables x(t) , and A ∈ R N ×N represents the inter-relations among those variables. Let x i (t) and ζ i (t) denote the ith entry of x(t) and ζ (t) , respectively. The signal at the ith node x i (t) can then be obtained by a weighted combination of the signal of all other nodes and the corresponding exogenous variables as where a i,j and γ i,j are the (i, j)th entry of A and Ŵ , respectively.
The SEM in (2) is able to express the relation between different node variables through the nonzero entries a i,j of A , where A is a hollow matrix and shares the support with the adjacency matrix of the graph. However, it only accounts for the first-order dependencies (i.e., pair-wise relations) in a linear fashion. Several efforts have focused on expanding the expressive power of linear SEMs via nonlinear kernels of nodal variables; see, e.g., [11] and references therein. Although meaningful in many applications, they neglect the so-called higher-order interactions that are present in networked data through higherorder graph structures [5], such as subgraphs and k-cliques, which are subsets of vertices of an undirected graph where every two distinct vertices in the subset are adjacent.
In the following section, we introduce a Volterra model to capture such higher-order interactions and their descriptors. (1)

Higher-order interactions in graphs
Before modeling the higher-order interactions in graphs, let us give a description of the ith node signal, x i (t) , in terms of a set of subsets of nodes, S i . To model first-order interactions, these subsets of nodes are simply single nodes and the set S i is nothing more than the set of neighbors of the ith node, i.e., the nodes j for which a i,j is nonzero in a SEM. Modeling higher-order interactions though requires the subsets to consist of more than one node and hence S i will yield a set of subsets of nodes. Mathematically, the set S (P) i which contains the subsets for defining interactions up to order P is defined as follows where p denotes the order of the set, L p the number of subsets of order p that exist, and S can be defined as the nodes that complete a (p + 1)-clique when the ith node is added. In fact, this subset assignation can be done for any other graph motif [c.f [5]] that seems adequate for the data under analysis. We can now approximate the nonlinear relationship in (4) by a Volterra expansion as follows i is a constant term, and H (p) i [x(t)] denotes the pth order Volterra module given by with h (l,p) i the lth expansion coefficient of order p for the ith variable, g a permutationinvariant nonlinear function describing the type of interaction among the variables, and q means the nodes that complete a (q + 1)-clique when the ith node is added. As the set S (P) i is generally unknown, meaning the interactions at all orders are unknown, the module (6) can be equivalently rewritten using the set of all index combinations of size p, that is where the Volterra kernel h (p) i (k 1 , . . . , k p ) denotes a (p + 1)-clique for the index combination k 1 , . . . , k p . The nonzero coefficients of (7) can be uniquely mapped to the coefficients of (6). A fundamental result of Volterra expansion is that any continuous nonlinear system can be uniformly approximated (i.e., in the L ∞ -norm) to arbitrary accuracy by a Volterra series operator of sufficient but finite order if the input signals form a compact subset of the input function space [29,30].
Notice that in the case of the absence of exogenous variables, the Volterra expansion (5) for the ith signal can be directly related to the SEM expression (2) by considering Thus, a SEM can be seen as a special case of the Volterra expansion where the Volterra kernels are constrained and the inputs are assumed to be the signals on the graph. Now, we are ready to postulate our AGVM that considers higher-order interactions as follows The proposed expansion (8) captures both the autoregressive nature of SEMs and the identifiability and expressibility of Volterra series models. This aspect distinguishes the model from existing nonlinear extensions of SEMs that only consider nonlinear functions of pair-wise interactions. Therefore, higher-order structures in the graph are not seen as fundamental atoms to establish the behavior of the node signal. On the other hand, the expansion (8) allows identifying the existence of higher-order interactions such as triads or p-cliques by observing its nonzero coefficients.
Remark 1 For simplicity, the present work only focuses on interactions up to the second order and uses a product for g. A generalization to higher-order interactions and other permutation-invariant functions is straightforward. For other nonlinear functions, a more careful analysis should be carried out.
By stacking the signal on node i over time steps (similarly stacking the modeling errors through time in ǫ i ), stacking the signals on all nodes at time step t in x(t) , i.e., and restricting ourselves to a second-order model and a product for g, we can rewrite (8) in a matrix-vector form as Here, we have made the following definitions: where vech(·) is the half-vectorization operation that retrieves the upper-triangular part of its matrix argument, and M := N (N + 1)/2. The second-order expansion of x i can be further expressed as where the unknown parameter θ i ∈ R 1+N +M contains the equivalent graph Volterra kernels for the ith variable and M ∈ R (1+N +M)×T is the (known) system matrix. For all involved node signals i ∈ [N ] , we finally obtain where ∈ R (1+N +M)×N collects all the unknown parameters of the system, and E is the corresponding modeling error matrix. For interpretability of (11), one can also rewrite it as where H (i) is the ith-order graph Volterra kernel matrix whose entries are in lexicographic order, i.e., as defined through X (i) . In particular, H (0) = h (0) ∈ R N is a column vector with all constant terms stacked. Here, we present the general form of the AGVM model also accounting for the exogenous variables Y . One of the challenges to find the unknown parameters is its large dimensionality. Although symmetrized Volterra kernels can be uniquely identified [31], the order of the number of unknown parameters in (11) is O(N 3 ) which leads to high computational costs and poor sampling efficiency. Fortunately, as it is shown ahead, judicious modeling of the graph Volterra kernels leads to efficient higher-order interaction identification. But before presenting methods for estimating the graph Volterra kernels, we need to provide identifiability guarantees for the proposed model.

Identifiability of AGVM
This section focuses on the conditions that the input/output data should exhibit in order to uniquely identify the second-order AGVM model (12). Although asymptotic results have been obtained for sparse regression, i.e., the Lasso estimator, here we are more interested in the finite-sample regime. Therefore, borrowing tools from the compressing sensing literature and linear algebra, we are able to provide recovery guarantees in both deterministic and probabilistic settings.
Without loss of generality, we present our following results considering that the zeroth-order term H (0) is projected out and the noise term E is not present or has been removed.
Our first result asserts identifiability of the network connections, represented through the graph Volterra kernels, highlighting the role of the exogenous data Y . Notice that this result is a generalization of the result in [32] obtained for resolving direction ambiguities in structural equation models applied for directed graphs. (1) , X (2) } and Y abide to the second-order AGVM model (2) ) ⊤ ] ⊤ is full row rank, then H (1) , H (2) and Ŵ are uniquely expressible in terms of X and Y as

Theorem 1 Suppose that data {X
This result exhibits the importance of the exogenous data (perturbation), Y , to uniquely identify the AGVM model. This shows, as in the classical SEM, that given a sufficiently rich perturbation, the directionality, as well as the higher-order interactions (triplets), can be uniquely determined from the measured data.
Although the result of Theorem 1 establishes the identifiability of the AGVM model, it requires a full row rank data matrix X , which in many cases might not be possible, i.e., the number of samples must be at least O(N 2 ) . Thus, in order to improve the sampling complexity for the problem, prior information is required to constrain the model. A natural assumption, arising in many networked-data applications, is the sparse interaction among the nodes. That is, the number of connections (edges) among nodes are much smaller than the size of the network, and therefore, the number of triads in which they participate are restricted. Before proceeding, we need the following sparsity assumptions.
Assumptions A.1 and A.2 on the graph Volterra kernels can be seen as restrictions on the number of edges and triangles existing in the graph. By letting K 1 ∈ O(1) and K 2 ∈ O(N ) , these assumptions translate into graph Volterra kernels that represent a sparse graph, i.e., O(N ) edges and O(N 2 ) triangles. In addition, as shown in the following, the sparsity assumptions make the identification of the system possible when only a reduced number of measurements are available. Before stating our second result, a definition is required.

Definition 1
The Kruskal rank of a matrix A ∈ R N ×M , denoted kr(A) , is the maximum number k such that any combination of k columns of A forms a sub-matrix with full column rank.
Although the Kruskal rank is, in general, more restrictive than the traditional rank and harder to verify, when the entries of A are drawn from a continuous distribution, its Kruskal rank equals its rank [33].

Theorem 2 Let {X
This result shows that it is possible to uniquely identify both graph Volterra kernel matrices when the Kruskal condition is met even in the case that the model is selfdriven, i.e., no exogenous inputs.
In the following, we present a result involving the exogenous inputs.  Here, differently from the pure self-driven case, the presence of the exogenous term leads to a different identifiability condition: structural identifiability. That is, given that the condition on the Kruskal rank of the projected data matrix is met, the positions of the nonzero entries of the second-order graph Volterra kernels H (2) can be uniquely identified. When the graph Volterra kernels are related to the (p + 1)-cliques [cf. (7)] in a network, the above result allows for higher-order link prediction. In addition, the support of H (1) can then also be partially estimated from the nonzero entries of H (2) , as by assumption, in this case, their supports share a relation. More specifically, the existence of a triangle between nodes directly implies edges among the elements in the clique. However, as the nonexistence of a clique, e.g., a triangle, does not rule out the existence of an edge, not all edges, i.e., positions of the nonzeros of H (1) , can be identified.
To present an identifiability result using sparse recovery, we employ the following definition.
Definition 2 (Restricted Isometry Property (RIP)) Matrix A ∈ R N ×M possesses the restricted isometry of order s, denoted as δ s ∈ (0, 1), if for all h ∈ R M with �h� 0 ≤ s [34] RIP is a fundamental property for providing identifiability conditions of sparse recovery. It has been shown that given δ 2s < √ 2 − 1 , the constrained version of the Lasso optimization problem yields �h − h * � 2 2 ≤ cǫ 2 for some constant c depending on δ 2s when the linear model y = Ah * + v , �v� 2 ≤ ǫ holds, where h * is the solution of (14) [34,35].
In the literature, in particular works on sparse polynomial regression [35] and Volterra series [36], several guarantees have been established for system matrices spawning from different alphabets and/or different distributions. For instance, in [24] results for Volterra system identification have been derived for signals drawn from {−1, 0, 1} and in [35] signals drawn from U[−1, 1] . However, the bipolar case, e.g., {−1, 1} , has not been considered and its treatment within the self-driven Volterra expansion is still missing. Therefore, in the following, we present a RIP result for the second-order AGVM, whose technical proof is detailed in Appendix.

. , T ] be an input sequence of independent random variables drawn from the alphabet {−1, 1} with equal probability. Assume that the AGVM regression matrix is defined as
where L = N + N (N − 1)/2 , X l := X (1) and X b is X (2) with the quadratic terms removed, i.e., it only contains bilinear terms x i (t)x j (t), i = j . Then, for any δ s ∈ (0, 1) and for any γ ∈ (0, 1) , whenever T ≥ 4C  Notice that the pure quadratic terms in X (2) have been removed. This is due to the fact that for the bipolar signal case, these quadratic terms are constant when the alphabet is {−1, 1} and equivalent to X (1) when the alphabet is {0, 1} . Hence, in both cases its contribution can be omitted without loss of generality. Furthermore, the data matrices are normalized concerning for the number of available measurements, i.e., T. This is done in order to guarantee that the diagonal entries of the Grammian of X ⊤ are unity in expectation.
This theorem asserts that T ∈ O(s 2 log N ) observations suffice to recover an s-sparse vector with graph Volterra kernels. Since it is considered that the number of unknowns per row of H (1) and H (2) is at most O(N ) [cf. [1][2], the bound on the sampling complexity scales as O(N 2 log N ) which agrees with bounds obtained for linear filtering setups [24,37]; however, in this paper, the constant C is relatively small.
Given that under the established conditions, the proposed AGVM model is identifiable and is able to leverage sparsity to relieve its sampling complexity, in the following section, we present task-specific constraints for higher-order link inference and methods for estimating the graph Volterra kernels.

Real data applications
With the identifiability guarantees at hand, this section investigates how various learning tasks can benefit from the proposed AGVM. Specifically, two tailored AGVM algorithms for different applications namely topology identification in power systems and link prediction in social networks are presented.

Topology identification in distribution grids
The vertex set V of a graph in a distribution grid comprises the indices of the nodal buses, while the edge set E collects all the power distribution lines. The distribution grid is supposed to be a radial structure and thus the vertex set V := {0, N } has a root (substation) bus indexed by n = 0 . Every non-root bus n ∈ N = {1, . . . , N } has a unique parent bus π n . Naturally, the number of non-root buses is equal to the number of power lines in a radial network, that is, |N | = |E| = N 2 .
In order to reveal both the edge connections as well as their higher-order interactions in power grids, we need to analyze the dependency of the signals on buses. The signal x n (t) here is the squared voltage magnitude of bus n ∈ N at time t. Based on our previous work [38], the voltage relationship among bus n and its children buses is given by where i ∈ {k : k ∈ N , π k = n} and j ∈ {k : k ∈ N , π k = n, k ≥ i} ; and h (2) n (i, j) are the first-and second-order expansion coefficients relating bus n with the sets {i} and {i, j} , respectively; ǫ n comprises the modeling error and measurement noise on bus n. Collecting the data into {x(t)} T t=1 , and stacking the first-and second-order coefficients h (1) and h (2) n (i, j) into the vectors h (1) n and h (2) n in a lexicographic order. Similar to the derivation from (9) to (12), we can get the voltage on all buses (15) in a compact form as where the t-th columns of X (1) and X (2) are x(t) and x(t) ⊠ x(t) respectively; the n-th rows of H (1) and H (2) are (h (1) n ) ⊤ and (h (2) n ) ⊤ , respectively. To estimate the coefficients of model (16), the following assumptions are needed.
A. 3 There is no self-interaction on a bus; therefore, the matrix H (1) is a hollow matrix, i.e., h (1) n (n) = 0, ∀n ∈ N . A. 4 There is no second-order interactions between two buses, that is to say, the coefficients for the second-order interactions satisfy h (2) n (j, k) = 0 , if n = j , or j = k , or n = k holds. A. 5 The second-order interactions only exist among two buses connected through a central bus. Thus, the graph Volterra coefficients obey h (2) n (j, k) = 0 , if there exists h (1) n (l) = 0, ∀ l ∈ {j, k}.
Assumptions A. 3 and A. 4 both entail linear constraints, and thus can be easily fit in an optimization problem. To cope with the conditional constraint in A. 5, we call for the auxiliary matrix where the first column equals h (1) n . To guarantee that if h (1) n (i) = 0 , then h (2) n (i, j) = 0 , we enforce row sparsity in n by adding using ℓ 2,1 -regularization on H ⊤ n , ∀n ∈ N . Based on the sparsity result from Sect. 4, we can estimate the expansion coefficients by the following sparsity-aware ℓ 2,1 -regularized least-squares where The set X h is convex and signifies the constraints characterized by Assumptions A. 3 and A. 4. The convex optimization problem (18) can be efficiently solved (and distributed) by off-the-shelf convex programming toolboxes. (2) , and θ n = h (1) To identify the underlying topology, the Southern California Edison (SCE) 47-bus distribution grid [3,39] using real consumption and solar generation data from the Smart * project [40] was employed. Feeding this data to AC power flow equations [41,42], we can obtain the voltage squared magnitude measurements {x(t)} T t=1 across T = 240 time slots. The voltage magnitudes of the substation bus satisfies v 0 (t) = 1, ∀t ∈ {1, . . . , T } . The radial grid comprises 41 buses where the interactions need to be inferred, after ignoring the root bus and the buses connected to their parent buses with zero-impedance lines. With {x(t)} T t=1 , the graph Volterra kernels were estimated by solving (18) and constructing H (1) and H (2) as in (16). While removing non-significant entries by a pointwise thresholding operation, the grid topology was inferred from the support of H (1) and the higher-order interactions were retrieved from H (2) .
To assess the performance of the proposed AGVM approach (18), we have simulated three edge connectivity recovery methods, namely: i) multi-kernel based partial correlations (MKPC)-scheme [22]; ii) linear PC-scheme [43]; and iii) concentration matrixbased scheme [44]. The results were measured by the empirical receiver operating characteristic (ROC) curves and the area under the curve (AUC) values. Figure 1 depicts the ROC curves of all methods, while the AUC for AGVM, MKPC, linear PC, and concentration matrix are 0.9483, 0.9008, 0.8836, and 0.8052, respectively. The results showcase the proposed scheme outperforms all competing alternatives by exploiting the nonlinear interactions.
The second experiment entailed the IEEE 123-bus feeder to examine the scalability and performance of the algorithm in topology identification. Voltage squared magnitude measurements {x(t)} T t=1 across T = 400 time slots were used. The ROC curves of the AGVM, MKPC, linear PC, and concentration matrix method are shown in Fig. 2. The AUC values for the AGVM, MKPC, linear PC, and concentration matrix method are

Link prediction in social networks
Another important domain inspiring higher-order interactions is link prediction in social and other biological networks. The goal of link prediction is to find the most likely subsets of vertices that will interact in the near future based on available observations of the activation of different nodes, e.g., song releases, email exchanges, and paper publications. Specifically, given a set of binary measurements X ∈ {0, 1} N ×T at time slots t = {1, . . . , T } , one needs to predict what is the most likely set of nodes to be activated together at any t ′ > T.
In this subsection, we are considering the problem of predicting the closure of triangles, i.e., triplets of nodes that activate at the same time. Therefore, AGVMs can be restricted to order P = 2 . Further, using the binary input data assumption, we can regard the interaction between variables as its joint activation and assume the function g in (7) is the product operation. As a result, a direct instantiation of an AGVM produces a model that constructs real-valued signals. Instead of directly modeling x i (t) , we borrow an idea from binary regression methods and use a latent variable z i (t) to model the probability P(x i (t) = 1|z i (t)) as P(x i (t) = 1|z i (t)) = σ (z i (t)) , where σ (·) represents the sigmoid function. The latent variable z i (t) is then modeled as Gathering the latent variables for nodes through time slots, the AGVM (12) for the link prediction task becomes (19) x(t)).  (2) ] ; and M = [1 (X (1) ) ⊤ (X (2) ) ⊤ ] ⊤ . Different from traditional logistic regression, the binary labels in (20) are the node variables themselves.
The goal of closure prediction is to find the most likely sets of nodes, which form an open structure that will become close. Here, an open structure refers to a set of nodes, A , that have interacted with each other, but have not appeared simultaneously on a single simplex set, whose elements are the indexes of the nonzero elements of x(t) [45]. Therefore, based on the analysis in our previous work [45], we have the following conclusions. Observing the support of off-diagonal entries of W = X (1) (X (1) ) ⊤ , we can obtain an initial network connectivity. From this connectivity, and enforcing h (2) i (i, i) = 0, ∀i ∈ V , the set of open triangles T O , closed triangles T , and the candidates for the nonzero graph Volterra coefficients S (2) := {S (2) i } N i=1 (cf. (3)) can be obtained. Upon S (2) , it holds vec(�) = Bθ , where θ captures the nonzero kernels, and B is an expansive binary matrix that relates the nonzero entries of the graph Volterra kernels in vec( ) . These nonzero Volterra kernels are defined by the support obtained from W . Noticing that we know the nonzero positions of vec( ) , and the dimensions of vec( ) and θ , we can build this B matrix based on the initial network connectivity. To estimate the parameter θ , we propose a proximal gradient ascent algorithm with sparsity regularization, which is summarized in Alg. 1. Notice that get_mtx_row takes the row of the input which is related to the latent variable z i (t) ; soft_thr(·, η ) entails soft threshold with respect to η . We assume that open triangles with large coefficients, which means a high level of interaction, are the most likely triangles to become closed. After obtaining θ , we sort the entries related to T O by their absolute value, and the top K entries are then the K most likely open triangles to become closed. To examine the effectiveness of Alg. 1, the third experiment entailed the "Enron_ email" [46] and "primary_school_contact" [47] datasets as well as several alternatives in [13] in terms of open triangle closure prediction. In the "Enron_email" dataset, the nodes denote the email addresses of different Enron employees, while in the "primary_school_contact" dataset, nodes are proximity-based contacts recorded by wearable sensors in a primary school. The training set contains the first 10% and 1% of timestamped events in the "Enron_email" and "primary_school_contact" datasets, respectively, while the testing set includes the remaining data. The proposed algorithm employs = 10 −3 , η = 10 −4 and k max = 500 for both experiments. The AUC metric on the first 100 nodes of the datasets for all methods is shown in Fig. 3. The curves showcase the effectiveness of the proposed model along with the logistic regression of Alg. 1 compared with recently proposed methods based on generalizing the link prediction scores for the task of triangle closure prediction.

Conclusions
This paper proposes a principled manner to identify and predict the higher-order interactions in networked data. Borrowing ideas from SEMs and Volterra models, a node signal in the network is modeled as a combination of its neighbor signals and a nonlinear combination of the signals in the groups (higher-order links) it belongs to. Some identifiability guarantees of the proposed second-order AGVM are then provided under  ∈ (0, 1) , then X possesses RIP δ s ≤ δ . Therefore, we can upper bound the probability of X not satisfying RIP of value δ , Pr(δ s > δ) , as As G is symmetric, we can use the union bound only for its unique entries to upper bound Pr(δ s > δ) as To show the result of the theorem, we proceed next to bound the probabilities above. The analysis of these probabilities is similar to the one in [24]. However, here we obtain results for a different distribution and for linear and bilinear components. To simplify the notation, we introduce the following partition for the Grammian matrix, i.e., where G ll :=X l (X l ) ⊤ , G lb :=X l (X b ) ⊤ and G bb : Recalling that the raw moments for the inputs are given by we obtain the following relations and E{[G lb ] ij } = 0 ∀ i, j. By a quick inspection, we notice that the terms of the first part of Pr(δ s > δ) are identically zero, hence δ d = 0.
To bound the required probabilities, we make use of the following Hoeffding's inequality.
Lemma 1 (Hoeffding's Inequality) Given t > 0 and independent random variables bounded as a i ≤ x i ≤ b i almost surely, the sum s N := N i=1 x i satisfies Pr(δ s > δ) : Pr δ s > δ ≤ exp − γ δ 2 T Cs 2