Autoregressive graph Volterra models and applications
EURASIP Journal on Advances in Signal Processing volume 2023, Article number: 4 (2023)
Abstract
Graph-based learning and estimation are fundamental problems in various applications involving power, social, and brain networks, to name a few. While learning pairwise interactions in network data is a well-studied problem, discovering higher-order interactions among subsets of nodes has not yet been fully explored. To this end, encompassing and leveraging (non)linear structural equation models as well as vector autoregressions, this paper proposes autoregressive graph Volterra models (AGVMs) that can capture not only the connectivity between nodes but also higher-order interactions present in the networked data. The proposed overarching model inherits the identifiability and expressibility of the Volterra series. Furthermore, two tailored algorithms based on the proposed AGVM are put forth for topology identification and link prediction in distribution grids and social networks, respectively. Real-data experiments on different real-world collaboration networks highlight the impact of higher-order interactions in our approach, yielding discernible differences relative to existing methods.
1 Introduction
Full awareness of networks and networked interactions is required for understanding the behavior of complex systems. These systems are typically modeled as graphs in many applications such as financial markets, social networks, power systems, and transportation systems [1,2,3], to name a few. Graph structure identification (prediction) aims to identify (predict) whether an edge exists (will exist) between a pair of nodes of a graph, given a set of network observations in the form of node attributes at different time instances. Applications include studying the growth of social networks [1] and their dynamics in social sciences, predicting the most likely links between users in recommender systems, and, in biology, unveiling pairwise interactions between elements of different ecological niches or predicting interactions that were not studied due to time or cost restrictions [4].
Although pairwise interactions can capture the dynamics of the underlying graph, much of the interplay among networked data occurs beyond just two nodes [5]. For instance, human interaction over social media takes place within a team rather than between two individuals. Furthermore, molecules tend to show more interactions among different groups. Moreover, in smart grids, the dependency among power variables such as voltage and current occurs in a region instead of single pairs. Early approaches to capture higher-order relations in the underlying network have mainly relied on set systems, hypergraphs, and simplicial complexes [6,7,8,9,10], to name a few. Furthermore, to address the existence of nonlinear connectivity, topology identification approaches leveraging partial correlations as well as kernels have recently been developed [11, 12]. A link prediction approach for evaluating higher-order data models of complex systems is proposed in [13]. While offering mathematical frameworks to study higher-order relations, these works either directly leverage the network structure, e.g., connections in a hypergraph, or make use of concepts inherited from physical phenomena, e.g., the analysis based on simplicial complexes, using cohomology [14], which might not exist for all kinds of datasets.
Different attempts have been made to model dynamical processes over a network. Specifically, data-driven neural network solutions have recently attracted growing attention for learning the nonlinear connectivity [15,16,17]. Furthermore, rooted in structural equation models [18] (SEMs), and in particular combinatorial vector autoregressive models [19], several efforts leveraging either kernels or partial correlations [11, 12] have been devoted to capturing the dynamics; see [20,21,22], and references therein. Although these approaches manage to capture complex dynamics existing in networked data, they lack interpretability beyond pairwise interactions. Another issue of the mentioned models is their poor scalability; in other words, the complexity grows exponentially with the model order.
Volterra series and kernels have emerged as promising tools for data analysis in different applications, e.g., brain networks [23], gene data [24] and communications [25]. Leveraging the sparsity of the Volterra kernels as well as a parsimonious model description, the computational complexity can be reduced [24], especially when using an appropriate basis expansion model of the kernels. Moreover, one can retrieve the original Volterra kernels from the considered basis expansion without losing model expressibility [26]. Although the Volterra series is powerful in modeling nonlinear temporal interactions, using it to capture the higher-order relations in networked data is not well explored. Furthermore, in Volterra series models, the autoregressive property is not fully considered as in SEMs when interpreting the dynamics in networked data.
Building upon SEMs and Volterra series models, this work advocates an autoregressive graph Volterra model (AGVM) to capture higher-order interactions present in networked data. Different identifiability conditions for the proposed model are derived, including the identifiability of the network connection based on exogenous data, sparsity (relieving sampling complexity), and the restricted isometry property in the bipolar case. The proposed model uses graph Volterra kernels to identify interactions between nodes or groups of nodes, providing a principled way to tackle higher-order interactions in networks. Furthermore, to estimate the graph Volterra kernels, two tailored AGVM algorithms for topology identification in power systems and link prediction in social networks are introduced. The proposed approaches differ from existing higher-order interaction methods [13, 27], which solely focus on extending metrics commonly used in informal scoring for classical link prediction and identification.
Paper outline. The rest of the paper is organized as follows: Sect. 2 briefly reviews the mathematical model that is used throughout the paper. Section 3 introduces the higher-order interactions model based on Volterra series. Section 4 provides identifiability guarantees for the proposed model. Section 5 presents two tailored algorithms for the application of topology identification in power systems as well as link prediction in social networks. Concluding remarks are drawn in Sect. 6.
Notation. Lower (upper) case boldface letters denote column vectors (matrices), and normal letters represent scalars. Calligraphic letters are reserved for sets with the exception of matrices \(\mathcal {X}\) and \(\mathcal {H}\). Symbol \(\top\) stands for transposition. The (i, j)th entry of matrix \({\varvec{X}}\) is denoted as \(x_{i,j}\) or \([{\varvec{X}}]_{i,j}\). The operator \(\boxtimes\) denotes the reduced Kronecker product of two equal-size vectors, that is, \({\varvec{x}}\boxtimes {\varvec{y}}:= [x_1y_1\; x_1y_2\;\cdots \;x_i y_j\;\cdots \; x_{N-1}y_N\;x_N y_N]^\top\), where \(i \le j\). The operation \(*\) denotes the Khatri-Rao product.
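As a small illustration of the reduced Kronecker product defined above, the following sketch keeps only the products \(x_i y_j\) with \(i \le j\), in the row-major order suggested by the definition. The function name `reduced_kron` is hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def reduced_kron(x, y):
    """Reduced Kronecker product: entries x_i * y_j for i <= j, in row-major order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D vectors of equal size")
    rows, cols = np.triu_indices(x.size)  # all index pairs (i, j) with i <= j
    return x[rows] * y[cols]

x = np.array([1.0, 2.0, 3.0])
print(reduced_kron(x, x))  # [1. 2. 3. 4. 6. 9.]
```

For a vector of length N the result has \(N(N+1)/2\) entries, which is the dimension M used later for the second-order terms.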
2 Preliminaries
Consider a graph \(\mathcal {G}= (\mathcal {V},\mathcal {E})\), where \(\mathcal {V}\) is the vertex (node) set with cardinality \(|\mathcal {V}| = N\), and \(\mathcal {E}\) is the edge set with cardinality \(|\mathcal {E}| = E\). A time series of graph signals \(\{{\varvec{x}}(t)\in \mathbb {R}^N\}_{t=1}^{T}\) is collected at the nodes \(\mathcal {V}\). In addition, external (exogenous) observables \(\{{\varvec{\zeta }}(t)\in \mathbb {R}^{N}\}_{t=1}^{T}\) are available such as features of the nodes, inputs from different networks, and network-level snapshots or layers [28].
The classical structural equation model (SEM) [18] considering the signal \({\varvec{x}}(t)\) over the graph and the exogenous variables \({\varvec{\zeta }}(t)\) can be described as follows
where \({\varvec{\Gamma }}\in \mathbb {R}^{N\times N}\) is a diagonal matrix representing the mapping of the exogenous input on the node variables \({\varvec{x}}(t)\), and \({\varvec{A}}\in \mathbb {R}^{N\times N}\) represents the interrelations among those variables. Let \(x_i(t)\) and \(\zeta _i(t)\) denote the ith entry of \({\varvec{x}}(t)\) and \({\varvec{\zeta }}(t)\), respectively. The signal at the ith node \(x_i(t)\) can then be obtained by a weighted combination of the signal of all other nodes and the corresponding exogenous variables as
where \(a_{i,j}\) and \(\gamma _{i,j}\) are the (i, j)th entry of \({\varvec{A}}\) and \({\varvec{\Gamma }}\), respectively.
The SEM in (2) is able to express the relation between different node variables through the nonzero entries \(a_{i,j}\) of \({\varvec{A}}\), where \({\varvec{A}}\) is a hollow matrix and shares the support with the adjacency matrix of the graph. However, it only accounts for the first-order dependencies (i.e., pairwise relations) in a linear fashion. Several efforts have focused on expanding the expressive power of linear SEMs via nonlinear kernels of nodal variables; see, e.g., [11] and references therein. Although meaningful in many applications, they neglect the so-called higher-order interactions that are present in networked data through higher-order graph structures [5], such as subgraphs and k-cliques, which are subsets of vertices of an undirected graph where every two distinct vertices in the subset are adjacent.
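To make the linear SEM concrete, the sketch below generates node signals from exogenous inputs by solving \({\varvec{x}}(t) = {\varvec{A}}{\varvec{x}}(t) + {\varvec{\Gamma }}{\varvec{\zeta }}(t)\) for \({\varvec{x}}(t)\). It assumes a noise-free model and an invertible \({\varvec{I}} - {\varvec{A}}\); the function name `simulate_sem` and the numerical values are hypothetical.

```python
import numpy as np

def simulate_sem(A, gamma, Z):
    """Solve x(t) = A x(t) + Gamma zeta(t) for every column zeta(t) of Z.

    A     : (N, N) hollow interaction matrix (zero diagonal).
    gamma : (N,) diagonal of Gamma.
    Z     : (N, T) exogenous inputs, one column per time instant.
    Assumes a noise-free model and an invertible I - A.
    """
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    if not np.allclose(np.diag(A), 0.0):
        raise ValueError("A must be hollow (zero diagonal)")
    rhs = np.asarray(gamma, dtype=float)[:, None] * np.asarray(Z, dtype=float)
    return np.linalg.solve(np.eye(N) - A, rhs)

rng = np.random.default_rng(0)
A = np.array([[0.0, 0.5, 0.0],
              [0.2, 0.0, 0.3],
              [0.0, 0.4, 0.0]])
gamma = np.array([1.0, 2.0, 0.5])
Z = rng.standard_normal((3, 5))
X = simulate_sem(A, gamma, Z)
# Each column satisfies the structural equation up to round-off.
print(np.allclose(X, A @ X + gamma[:, None] * Z))
```

The nonzero pattern of `A` plays the role of the graph support: node 1 depends on node 2, node 2 on nodes 1 and 3, and node 3 on node 2.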
In the following section, we introduce a Volterra model to capture such higher-order interactions and their descriptors.
3 Higher-order interactions in graphs
Before modeling the higher-order interactions in graphs, let us give a description of the ith node signal, \(x_i(t)\), in terms of a set of subsets of nodes, \(\mathcal {S}_i\). To model first-order interactions, these subsets of nodes are simply single nodes and the set \(\mathcal {S}_i\) is nothing more than the set of neighbors of the ith node, i.e., the nodes j for which \(a_{i,j}\) is nonzero in a SEM. Modeling higher-order interactions, however, requires the subsets to consist of more than one node, and hence \(\mathcal {S}_i\) becomes a set of subsets of nodes.
Mathematically, the set \(\mathcal {S}_i^{(P)}\) which contains the subsets for defining interactions up to order P is defined as follows
where p denotes the order of the set, \(L_p\) the number of subsets of order p that exist, and \(\mathcal {S}_i^{(l,p)}\subset \mathcal {V}\) the lth set of p nodes related to the ith node in the graph \(\mathcal {G}\). For simplicity, the exogenous variable has been omitted. With (3), we can put forth the following signal model
where f maps the signals \({\varvec{x}}(t)\) from \(\mathcal {S}_i^{(P)}\) to \(x_i(t)\). For example, considering \(P=1\) and \(\mathcal {S}_{i}^{(l, p)}\) only containing the lth neighbor of the ith node, i.e., \(\mathcal {S}_i^{(1)} = \cup _{l=1}^{L_1}\mathcal {S}_{i}^{(l,1)}\), and assuming a linear map for f, we retrieve the SEM without considering exogenous variables [cf. (2)].
The subsets in \(\mathcal {S}_i^{(P)}\) capture the gregarious behavior of the nodes. For example, the subsets \(\{\mathcal {S}_{i}^{(l, 2)}\}_{l=1}^{L_2}\) can be viewed as the pairs of nodes that form a triad with the ith node. Similarly, the subsets \(\{\mathcal {S}_{i}^{(l, p)}\}_{l=1}^{L_p}\) can be defined as the nodes that complete a \((p+1)\)-clique when the ith node is added. In fact, this subset assignment can be done for any other graph motif [cf. [5]] that seems adequate for the data under analysis. We can now approximate the nonlinear relationship in (4) by a Volterra expansion as follows
where \(h_i^{(0)}\) is a constant term, and \(H^{(p)}_i[{\varvec{x}}(t)]\) denotes the pth order Volterra module given by
with \(h_{i}^{(l,p)}\) the lth expansion coefficient of order p for the ith variable, g a permutation-invariant nonlinear function describing the type of interaction among the variables, and q indexing the nodes that complete a \((q+1)\)-clique when the ith node is added. As the set \(\mathcal {S}_{i}^{(P)}\) is generally unknown, meaning the interactions at all orders are unknown, the module (6) can be equivalently rewritten using the set of all index combinations of size p, that is
where the Volterra kernel \(h_i^{(p)}(k_1,\ldots ,k_p)\) denotes a \((p+1)\)-clique for the index combination \(\left\{ k_{1}, \ldots , k_{p}\right\}\). The nonzero coefficients of (7) can be uniquely mapped to the coefficients of (6). A fundamental result of Volterra expansion is that any continuous nonlinear system can be uniformly approximated (i.e., in the \(L^\infty\)-norm) to arbitrary accuracy by a Volterra series operator of sufficient but finite order if the input signals form a compact subset of the input function space [29, 30].
Notice that in the case of the absence of exogenous variables, the Volterra expansion (5) for the ith signal can be directly related to the SEM expression (2) by considering \(h_i^{(0)}\), \(h_i^{(1)}(j) = a_{i,j}\)^{Footnote 1} and \(h^{(p)}_i(k_1,\ldots ,k_p) = 0, \;\forall \; p > 1\). Thus, a SEM can be seen as a special case of the Volterra expansion where the Volterra kernels are constrained and the inputs are assumed to be the signals on the graph.
Now, we are ready to postulate our AGVM that considers higherorder interactions as follows
The proposed expansion (8) captures both the autoregressive nature of SEMs and the identifiability and expressibility of Volterra series models. This aspect distinguishes the model from existing nonlinear extensions of SEMs that only consider nonlinear functions of pairwise interactions. In those extensions, higher-order structures in the graph are not seen as fundamental atoms that establish the behavior of the node signal. In contrast, the expansion (8) allows identifying the existence of higher-order interactions such as triads or p-cliques by observing its nonzero coefficients.
Remark 1
For simplicity, the present work only focuses on interactions up to the second order and uses a product for g. A generalization to higher-order interactions and other permutation-invariant functions is straightforward. For other nonlinear functions, a more careful analysis should be carried out.
By stacking the signal on node i over time steps \(t=(1, \ldots , T)\) in \({\varvec{x}}_i\), i.e., \({\varvec{x}}_i := [x_i(1),x_i(2),\ldots ,x_i(T)]^\top\) (similarly stacking the modeling errors through time in \({\varvec{\epsilon }}_i\)), stacking the signals on all nodes at time step t in \({\varvec{x}}(t)\), i.e., \({\varvec{x}}(t):=~ [x_1(t),x_2(t),\ldots ,x_N(t)]^\top \in \mathbb {R}^{N}\), and restricting ourselves to a second-order model and a product for g, we can rewrite (8) in a matrix-vector form as
Here, we have made the following definitions: \({\varvec{X}}^{(1)}:= [{\varvec{x}}(1),\ldots , {\varvec{x}}(T)]\in \mathbb {R}^{N\times T}\), \({\varvec{X}}^{(2)}:=[{\varvec{x}}(1)\boxtimes {\varvec{x}}(1),\ldots ,{\varvec{x}}(T)\boxtimes {\varvec{x}}(T)] \in \mathbb {R}^{M \times T}\), \([{\varvec{h}}_i^{(1)}]_j = h_i^{(1)}(j)\), and
where \(\textrm{vech}(\cdot )\) is the half-vectorization operation that retrieves the upper-triangular part of its matrix argument, and \(M:=N(N+1)/2\).
The second-order expansion of \({\varvec{x}}_i\) can be further expressed as
where the unknown parameter \({\varvec{\theta }}_i\in \mathbb {R}^{1+N+M}\) contains the equivalent graph Volterra kernels for the ith variable and \({\varvec{M}} \in \mathbb {R}^{(1+N+M)\times T}\) is the (known) system matrix.
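The data matrices entering the second-order model can be assembled as in the following sketch. It builds \({\varvec{X}}^{(2)}\) column by column from the reduced Kronecker product and stacks a system matrix of size \((1+N+M)\times T\). The row ordering [constant; linear; quadratic] and the names `build_system_matrix` and `M_sys` are assumptions made for illustration.

```python
import numpy as np

def build_system_matrix(X1):
    """Stack [1; X^(1); X^(2)] for a second-order model with product interactions.

    X1 : (N, T) array whose t-th column is x(t).
    Returns (M_sys, X2) with X2 of shape (N(N+1)/2, T) and
    M_sys of shape (1 + N + N(N+1)/2, T).
    The row ordering [constant; linear; quadratic] is an assumption.
    """
    X1 = np.asarray(X1, dtype=float)
    N, T = X1.shape
    rows, cols = np.triu_indices(N)      # index pairs (i, j) with i <= j
    X2 = X1[rows, :] * X1[cols, :]       # column t is the reduced Kronecker product of x(t) with itself
    M_sys = np.vstack([np.ones((1, T)), X1, X2])
    return M_sys, X2

rng = np.random.default_rng(1)
N, T = 4, 6
X1 = rng.standard_normal((N, T))
M_sys, X2 = build_system_matrix(X1)
print(X2.shape, M_sys.shape)  # (10, 6) (15, 6)
```

With N = 4 nodes there are already 15 unknowns per node, which illustrates how quickly the parameter count grows with the network size.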
For all involved node signals \(i \in [N]\), we finally obtain
where \({\varvec{\Theta }}\in \mathbb {R}^{(1+N+M) \times N}\) collects all the unknown parameters of the system, and \({\varvec{E}}\) is the corresponding modeling error matrix. For interpretability of (11), one can also rewrite it as
where \({\varvec{H}}^{(i)}\) is the ith-order graph Volterra kernel matrix whose entries are in lexicographic order, i.e., as defined through \({\varvec{X}}^{(i)}\). In particular, \({\varvec{H}}^{(0)} = {\varvec{h}}^{(0)}\in \mathbb {R}^{N}\) is a column vector with all constant terms stacked. Here, we present the general form of the AGVM model also accounting for the exogenous variables \({\varvec{Y}}\).
One of the challenges in finding the unknown parameters \({\varvec{\Theta }}\) is its large dimensionality. Although symmetrized Volterra kernels can be uniquely identified [31], the number of unknown parameters in (11) is of order \({{\mathcal {O}}}(N^3)\), which leads to high computational costs and poor sampling efficiency. Fortunately, as shown later, judicious modeling of the graph Volterra kernels leads to efficient higher-order interaction identification. But before presenting methods for estimating the graph Volterra kernels, we need to provide identifiability guarantees for the proposed model.
4 Identifiability of AGVM
This section focuses on the conditions that the input/output data should exhibit in order to uniquely identify the second-order AGVM model (12). Although asymptotic results have been obtained for sparse regression, i.e., the Lasso estimator, here we are more interested in the finite-sample regime. Therefore, borrowing tools from the compressed sensing literature and linear algebra, we are able to provide recovery guarantees in both deterministic and probabilistic settings.
Without loss of generality, we present the following results considering that the zeroth-order term \({\varvec{H}}^{(0)}\) is projected out and the noise term \({\varvec{E}}\) is not present or has been removed.
Our first result asserts identifiability of the network connections, represented through the graph Volterra kernels, highlighting the role of the exogenous data \({\varvec{Y}}\). Notice that this result is a generalization of the result in [32] obtained for resolving direction ambiguities in structural equation models applied for directed graphs.
Theorem 1
Suppose that data \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM model
\[{\varvec{X}}^{(1)} = {\varvec{H}}^{(1)}{\varvec{X}}^{(1)} + {\varvec{H}}^{(2)}{\varvec{X}}^{(2)} + {\varvec{\Gamma }}{\varvec{Y}}\]
for a matrix \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). If \({\mathcal {X}}:= [({\varvec{X}}^{(1)})^\top \;({\varvec{X}}^{(2)})^\top ]^\top\) is full row rank, then \({\varvec{H}}^{(1)}\), \({\varvec{H}}^{(2)}\) and \({\varvec{\Gamma }}\) are uniquely expressible in terms of \({\mathcal {X}}\) and \({\varvec{Y}}\) as
where \({\varvec{Y}}{\mathcal {X}}^{\dagger } = [{\varvec{Q}}_1\in \mathbb {R}^{N\times N}\,{\varvec{Q}}_2\in \mathbb {R}^{N \times M}]\).
Proof
See Appendix. \(\square\)
This result exhibits the importance of the exogenous data (perturbation), \({\varvec{Y}}\), to uniquely identify the AGVM model. This shows, as in the classical SEM, that given a sufficiently rich perturbation, the directionality, as well as the higherorder interactions (triplets), can be uniquely determined from the measured data.
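The mechanism behind this result can be checked numerically. The sketch below assumes the noise-free model \({\varvec{X}}^{(1)} = {\varvec{H}}^{(1)}{\varvec{X}}^{(1)} + {\varvec{H}}^{(2)}{\varvec{X}}^{(2)} + {\varvec{\Gamma }}{\varvec{Y}}\), so that \({\varvec{Y}}{\mathcal {X}}^{\dagger } = {\varvec{\Gamma }}^{-1}[{\varvec{I}}-{\varvec{H}}^{(1)}\;\; -{\varvec{H}}^{(2)}]\) whenever \({\mathcal {X}}\) has full row rank. The recovery formulas in the code follow from rearranging this identity and are not necessarily written in the same form as in the theorem; all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
rows, cols = np.triu_indices(N)
M = rows.size                                  # N(N+1)/2
T = 3 * (N + M)                                # comfortably more than N + M samples

# Ground-truth kernels (hypothetical values, for illustration only).
H1 = 0.3 * rng.standard_normal((N, N))
np.fill_diagonal(H1, 0.0)                      # hollow first-order kernel
H2 = 0.1 * rng.standard_normal((N, M))
gamma = rng.uniform(0.5, 1.5, size=N)          # nonzero diagonal of Gamma

# Data obeying X1 = H1 X1 + H2 X2 + Gamma Y (noise-free).
X1 = rng.standard_normal((N, T))
X2 = X1[rows, :] * X1[cols, :]
Y = ((np.eye(N) - H1) @ X1 - H2 @ X2) / gamma[:, None]

X_cal = np.vstack([X1, X2])
assert np.linalg.matrix_rank(X_cal) == N + M   # full row rank condition

Q = Y @ np.linalg.pinv(X_cal)                  # [Q1 Q2]
Q1, Q2 = Q[:, :N], Q[:, N:]
gamma_hat = 1.0 / np.diag(Q1)                  # diag(I - H1) = 1 because H1 is hollow
H1_hat = np.eye(N) - gamma_hat[:, None] * Q1
H2_hat = -gamma_hat[:, None] * Q2

print(np.allclose(gamma_hat, gamma),
      np.allclose(H1_hat, H1),
      np.allclose(H2_hat, H2))
```

Since \({\mathcal {X}}\) has \(N+M\) rows, the rank condition can only hold when \(T \ge N + N(N+1)/2\), which is the quadratic sample requirement that motivates the sparsity assumptions introduced next.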
Although the result of Theorem 1 establishes the identifiability of the AGVM model, it requires a full row rank data matrix \(\mathcal {X}\), which in many cases might not be possible since the number of samples must then be at least \({{\mathcal {O}}}(N^2)\). Thus, in order to improve the sampling complexity for the problem, prior information is required to constrain the model. A natural assumption, arising in many networked-data applications, is the sparse interaction among the nodes. That is, the number of connections (edges) among nodes is much smaller than the size of the network, and therefore, the number of triads in which they participate is restricted. Before proceeding, we need the following sparsity assumptions.

A. 1
Each row of matrix \({\varvec{H}}^{(1)}\) has at most \(K_1\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(1)} \Vert _0 \le K_1\; \forall \; i\).

A. 2
Each row of matrix \({\varvec{H}}^{(2)}\) has at most \(K_2\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(2)} \Vert _0 \le K_2\; \forall \; i\).
Assumptions A.1 and A.2 on the graph Volterra kernels can be seen as restrictions on the number of edges and triangles existing in the graph. By letting \(K_1 \in {{\mathcal {O}}}(1)\) and \(K_2 \in {{\mathcal {O}}}(N)\), these assumptions translate into graph Volterra kernels that represent a sparse graph, i.e., \({{\mathcal {O}}}(N)\) edges and \({{\mathcal {O}}}(N^2)\) triangles. In addition, as shown in the following, the sparsity assumptions make the identification of the system possible when only a reduced number of measurements are available.
Before stating our second result, a definition is required.
Definition 1
The Kruskal rank of a matrix \({\varvec{A}}\in \mathbb {R}^{N\times M}\), denoted \(\textrm{kr}({\varvec{A}})\), is the maximum number k such that any combination of k columns of \({\varvec{A}}\) forms a submatrix with full column rank.
Although the Kruskal rank is, in general, more restrictive than the traditional rank and harder to verify, when the entries of \({\varvec{A}}\) are drawn from a continuous distribution, its Kruskal rank equals its rank [33].
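For small matrices, Definition 1 can be evaluated directly by enumerating column subsets, as in the sketch below. The function name `kruskal_rank` is hypothetical, and the brute-force search is exponential in the number of columns, so it is meant only as an illustration of the definition.

```python
from itertools import combinations
import numpy as np

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every set of k columns of A is linearly independent.

    Brute force over column subsets, so only practical for small matrices.
    """
    A = np.asarray(A, dtype=float)
    n_cols = A.shape[1]
    k_rank = 0
    for k in range(1, min(A.shape) + 1):
        for subset in combinations(range(n_cols), k):
            if np.linalg.matrix_rank(A[:, list(subset)], tol=tol) < k:
                return k_rank
        k_rank = k
    return k_rank

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
print(kruskal_rank(A), kruskal_rank(B))  # 2 1
```

Both matrices have rank 2, but the first two columns of `B` are parallel, so its Kruskal rank drops to 1. This is the sense in which the Kruskal rank is more restrictive than the rank.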
To begin with, we consider a model without exogenous inputs, that is, a purely self-driven system.
Theorem 2
Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) abide by the second-order AGVM
\[{\varvec{X}}^{(1)} = {\varvec{H}}^{(1)}{\varvec{X}}^{(1)} + {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}\]
for sparse matrices \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and \({\varvec{H}}^{(2)}\) satisfying A.1 and A.2, respectively. If \(\textrm{kr}(\mathcal {X}^\top ) \ge 2(K_1 + K_2)\), where \(\mathcal {X} := [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\), then \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) can be uniquely identified.
Proof
See Appendix. \(\square\)
This result shows that it is possible to uniquely identify both graph Volterra kernel matrices when the Kruskal condition is met even in the case that the model is self-driven, i.e., no exogenous inputs.
In the following, we present a result involving the exogenous inputs.
Theorem 3
Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM
\[{\varvec{X}}^{(1)} = {\varvec{H}}^{(1)}{\varvec{X}}^{(1)} + {\varvec{H}}^{(2)}{\varvec{X}}^{(2)} + {\varvec{\Gamma }}{\varvec{Y}}\]
for sparse matrices \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and \({\varvec{H}}^{(2)}\) satisfying A.2; and a diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). Given a matrix \({\varvec{\Pi }}_1\) such that \({\varvec{X}}^{(1)}{\varvec{\Pi }}_1 = {\varvec{0}}\) and \({\varvec{Y}}{\varvec{\Pi }}_1 \ne {\varvec{0}}\), if \(\textrm{kr}({{\varvec{C}}}[{\varvec{X}}^{(1)}*{\varvec{X}}^{(1)}]{\varvec{\Pi }}_1) \ge 2K_2 + 1\), where \({\varvec{C}}\) is a binary selection matrix picking the appropriate rows of the Kronecker product, then the positions of the nonzero entries of \({\varvec{H}}^{(2)}\) are unique.
Proof
See Appendix. \(\square\)
Here, differently from the pure self-driven case, the presence of the exogenous term leads to a different identifiability condition: structural identifiability. That is, given that the condition on the Kruskal rank of the projected data matrix is met, the positions of the nonzero entries of the second-order graph Volterra kernels \({\varvec{H}}^{(2)}\) can be uniquely identified. When the graph Volterra kernels are related to the \((p+1)\)-cliques [cf. (7)] in a network, the above result allows for higher-order link prediction. In addition, the support of \({\varvec{H}}^{(1)}\) can then also be partially estimated from the nonzero entries of \({\varvec{H}}^{(2)}\), as by assumption, in this case, their supports share a relation. More specifically, the existence of a triangle between nodes directly implies edges among the elements in the clique. However, as the non-existence of a clique, e.g., a triangle, does not rule out the existence of an edge, not all edges, i.e., positions of the nonzeros of \({\varvec{H}}^{(1)}\), can be identified.
To present an identifiability result using sparse recovery, we employ the following definition.
Definition 2
(Restricted Isometry Property (RIP)) Matrix \({\varvec{A}} \in \mathbb {R}^{N\times M}\) possesses the restricted isometry property of order s with constant \(\delta _s\in (0,1)\) if for all \({\varvec{h}}\in \mathbb {R}^{M}\) with \(\Vert {\varvec{h}} \Vert _0\le s\) [34]
\[(1-\delta _s)\Vert {\varvec{h}} \Vert _2^2 \le \Vert {\varvec{A}}{\varvec{h}} \Vert _2^2 \le (1+\delta _s)\Vert {\varvec{h}} \Vert _2^2.\]
RIP is a fundamental property for providing identifiability conditions of sparse recovery. It has been shown that given \(\delta _{2s} < \sqrt{2}-1\), the constrained version of the Lasso optimization problem
\[\min _{{\varvec{h}}}\; \Vert {\varvec{h}} \Vert _1 \quad \text {subject to}\quad \Vert {\varvec{y}} - {\varvec{A}}{\varvec{h}} \Vert _2 \le \epsilon \tag{14}\]
yields \(\Vert {{\varvec{h}}} - {\varvec{h}}^* \Vert _2^2 \le c \epsilon ^2\) for some constant c depending on \(\delta _{2s}\) when the linear model \({\varvec{y}} = {\varvec{A}}{\varvec{h}}^* + {\varvec{v}}\), \(\Vert {\varvec{v}} \Vert _2 \le \epsilon\) holds, where \({\varvec{h}}\) is the solution of (14) [34, 35].
In the literature, in particular works on sparse polynomial regression [35] and Volterra series [36], several guarantees have been established for system matrices generated from different alphabets and/or different distributions. For instance, in [24] results for Volterra system identification have been derived for signals drawn from \(\{-1,0,1\}\) and in [35] for signals drawn from \(\mathcal {U}[-1,1]\). However, the bipolar case, e.g., \(\{-1,1\}\), has not been considered and its treatment within the self-driven Volterra expansion is still missing. Therefore, in the following, we present a RIP result for the second-order AGVM, whose technical proof is detailed in the Appendix.
Theorem 4
Let \(\{x_i(t)\}_{i=1}^{N}\) for \(t\in \{1,2,\ldots ,T\}\) be an input sequence of independent random variables drawn from the alphabet \(\{-1,1\}\) with equal probability. Assume that the AGVM regression matrix is defined as
where \(L = N+N(N-1)/2\), \({\varvec{X}}^l := {\varvec{X}}^{(1)}\) and \({\varvec{X}}^b\) is \({\varvec{X}}^{(2)}\) with the quadratic terms removed, i.e., it only contains bilinear terms \(x_i(t)x_j(t),\,i\ne j\). Then, for any \(\delta _s\in (0,1)\) and for any \(\gamma \in (0,1)\), whenever \(T \ge \frac{4C}{(1-\gamma )\delta _s^2}s^2\log N\), the matrix \(\tilde{{\varvec{X}}}^\top\) possesses RIP \(\delta _s\) with probability exceeding \(1 - \exp \bigg (-\frac{\gamma \delta _s^2}{C}\cdot \frac{T}{s^2}\bigg )\), where \(C = 2\).
Notice that the pure quadratic terms in \({\varvec{X}}^{(2)}\) have been removed. This is due to the fact that for the bipolar signal case, these quadratic terms are constant when the alphabet is \(\{-1,1\}\) and equivalent to \({\varvec{X}}^{(1)}\) when the alphabet is \(\{0,1\}\). Hence, in both cases their contribution can be omitted without loss of generality. Furthermore, the data matrices are normalized with respect to the number of available measurements, i.e., T. This is done in order to guarantee that the diagonal entries of the Gramian of \(\tilde{{\varvec{X}}}^\top\) are unity in expectation.
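The effect of the normalization can be seen in a short numerical sketch. It assumes that the regression matrix stacks the linear terms on top of the bilinear terms and scales the result by \(1/\sqrt{T}\); this stacking order and the function name `bipolar_regression_matrix` are assumptions, since only the dimensions and the normalization are fixed by the text.

```python
import numpy as np

def bipolar_regression_matrix(X1):
    """Stack linear and bilinear (i < j) regressors and scale by 1/sqrt(T).

    X1 : (N, T) array with entries in {-1, +1}.
    Returns an (N + N(N-1)/2, T) array; the stacking order is an assumption.
    """
    X1 = np.asarray(X1, dtype=float)
    N, T = X1.shape
    rows, cols = np.triu_indices(N, k=1)     # strict upper triangle: i < j
    Xb = X1[rows, :] * X1[cols, :]           # bilinear terms x_i(t) x_j(t)
    return np.vstack([X1, Xb]) / np.sqrt(T)

rng = np.random.default_rng(3)
N, T = 6, 2000
X1 = rng.choice([-1.0, 1.0], size=(N, T))
X_tilde = bipolar_regression_matrix(X1)
gram = X_tilde @ X_tilde.T                   # Gramian of X_tilde^T, size L x L
off_diag = gram - np.diag(np.diag(gram))
print(X_tilde.shape)                         # (21, 2000)
print(np.allclose(np.diag(gram), 1.0))       # unit diagonal for +/-1 inputs
print(np.abs(off_diag).max())                # small when T is large
```

For bipolar inputs every regressor has squared entries equal to \(1/T\), so the diagonal of the Gramian is one, while the off-diagonal entries are averages of zero-mean products that shrink as T grows.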
This theorem asserts that \(T \in {{\mathcal {O}}}(s^2\log N)\) observations suffice to recover an s-sparse vector of graph Volterra kernels. Since it is considered that the number of unknowns per row of \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) is at most \({{\mathcal {O}}}(N)\) [cf. (12)], the bound on the sampling complexity scales as \({{\mathcal {O}}}(N^2\log N)\), which agrees with bounds obtained for linear filtering setups [24, 37]; however, in this paper, the constant C is relatively small.
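To get a feeling for the numbers, the bound of Theorem 4 can be evaluated directly, as sketched below. The function names are hypothetical, and the logarithm is assumed to be the natural one.

```python
import math

def min_observations(s, N, delta_s, gamma, C=2.0):
    """Smallest integer T with T >= 4C / ((1 - gamma) delta_s^2) * s^2 * log(N)."""
    if not (0 < delta_s < 1 and 0 < gamma < 1):
        raise ValueError("delta_s and gamma must lie in (0, 1)")
    return math.ceil(4.0 * C / ((1.0 - gamma) * delta_s**2) * s**2 * math.log(N))

def rip_probability_bound(T, s, delta_s, gamma, C=2.0):
    """Lower bound 1 - exp(-(gamma delta_s^2 / C) * T / s^2) on the RIP probability."""
    return 1.0 - math.exp(-(gamma * delta_s**2 / C) * T / s**2)

# Hypothetical setting: 5-sparse kernels on a 41-node network.
T = min_observations(s=5, N=41, delta_s=0.5, gamma=0.5)
print(T, rip_probability_bound(T, s=5, delta_s=0.5, gamma=0.5))
```

The quadratic dependence on s dominates: doubling the sparsity level quadruples the required number of observations, whereas the network size only enters logarithmically.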
Given that under the established conditions, the proposed AGVM model is identifiable and is able to leverage sparsity to relieve its sampling complexity, in the following section, we present task-specific constraints for higher-order link inference and methods for estimating the graph Volterra kernels.
5 Real data applications
With the identifiability guarantees at hand, this section investigates how various learning tasks can benefit from the proposed AGVM. Specifically, two tailored AGVM algorithms for different applications namely topology identification in power systems and link prediction in social networks are presented.
5.1 Topology identification in distribution grids
The vertex set \(\mathcal {V}\) of a graph in a distribution grid comprises the indices of the nodal buses, while the edge set \(\mathcal {E}\) collects all the power distribution lines. The distribution grid is supposed to be a radial structure and thus the vertex set \({\mathcal {V}}:= \{0,\mathcal {N}\}\) has a root (substation) bus indexed by \(n = 0\). Every non-root bus \(n \in \mathcal {N}= \{ 1, \ldots , N\}\) has a unique parent bus \(\pi _n\). Naturally, the number of non-root buses is equal to the number of power lines in a radial network, that is, \(|\mathcal {N}| =|\mathcal {E}|= N\)^{Footnote 2}.
In order to reveal both the edge connections as well as their higher-order interactions in power grids, we need to analyze the dependency of the signals on buses. The signal \(x_n(t)\) here is the squared voltage magnitude of bus \(n \in {\mathcal {N}}\) at time t. Based on our previous work [38], the voltage relationship among bus n and its children buses is given by
where \(i \in \{k: k \in \mathcal {N}, \pi _k = n\}\) and \(j\in \{k : k \in \mathcal {N}, \pi _k = n, k \ge i\}\); \(h_n^{(1)}(i)\) and \(h_n^{(2)}(i,j)\) are the first- and second-order expansion coefficients relating bus n with the sets \(\{i\}\) and \(\{i,j\}\), respectively; and \(\epsilon _n\) comprises the modeling error and measurement noise on bus n. We collect the data into \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) and stack the first- and second-order coefficients \(h_n^{(1)}(i)\) and \(h_n^{(2)}(i,j)\) into the vectors \({\varvec{h}}_n^{(1)}\) and \({\varvec{h}}_n^{(2)}\) in lexicographic order. Similar to the derivation from (9) to (12), the voltage relationship (15) on all buses can then be written in a compact form as
where the tth columns of \({\varvec{X}}^{(1)}\) and \({\varvec{X}}^{(2)}\) are \({\varvec{x}}(t)\) and \({\varvec{x}}(t) \boxtimes {\varvec{x}}(t)\) respectively; the nth rows of \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) are \(({\varvec{h}}_n^{(1)})^\top\) and \(({\varvec{h}}_n^{(2)})^\top\), respectively.
To estimate the coefficients of model (16), the following assumptions are needed.

A. 3
There is no self-interaction on a bus; therefore, the matrix \({\varvec{H}}^{(1)}\) is a hollow matrix, i.e., \(h_n^{(1)}(n) = 0, \,\;\forall n \in \mathcal {N}\).

A. 4
There are no second-order interactions involving only two buses; that is, the coefficients for the second-order interactions satisfy \(h_n^{(2)}(j,k) = 0\) if \(n = j\), \(j = k\), or \(n = k\) holds.

A. 5
The second-order interactions only exist between two buses connected through a central bus. Thus, the graph Volterra coefficients obey \(h_n^{(2)}(j,k) = 0\) if \(h_n^{(1)}(l) = 0\) for some \(l\in \{j,k\}\).
Assumptions A. 3 and A. 4 both entail linear constraints, and thus can be easily fit in an optimization problem. To cope with the conditional constraint in A. 5, we call for the auxiliary matrix
where the first column equals \({\varvec{h}}_n^{(1)}\). To guarantee that if \(h_n^{(1)}(i) = 0\), then \(h_n^{(2)}(i,j)= 0\), we enforce row sparsity by using \(\ell _{2,1}\)-regularization on \({\varvec{H}}_n^\top\), \(\forall n \in {\mathcal {N}}\). Based on the sparsity result from Sect. 4, we can estimate the expansion coefficients by the following sparsity-aware \(\ell _{2,1}\)-regularized least-squares problem
where
The set \(\mathcal {X}_{h}\) is convex and signifies the constraints characterized by Assumptions A. 3 and A. 4. The convex optimization problem (18) can be efficiently solved (and distributed) by off-the-shelf convex programming toolboxes.
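The role of the \(\ell _{2,1}\) penalty can be illustrated in isolation. The sketch below computes the penalty as the sum of the Euclidean norms of the rows of a matrix, together with its proximal operator (row-wise soft-thresholding), which is the building block that proximal solvers would apply at each iteration. It is not the solver used for (18); the convention that groups are rows of the argument, and the names `l21_norm` and `prox_l21`, are assumptions.

```python
import numpy as np

def l21_norm(H):
    """Sum of the Euclidean norms of the rows of H."""
    return float(np.sum(np.linalg.norm(H, axis=1)))

def prox_l21(H, tau):
    """Row-wise soft-thresholding: proximal operator of tau * ||H||_{2,1}."""
    H = np.asarray(H, dtype=float)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * H

H = np.array([[3.0, 4.0],
              [0.1, 0.0],
              [0.0, 0.0]])
print(l21_norm(H))
print(prox_l21(H, tau=1.0))  # first row shrinks toward [2.4, 3.2]; the others become zero
```

Rows whose norm falls below the threshold are set to zero as a whole. Applied to the auxiliary matrix, this couples each first-order coefficient with the second-order coefficients in the same group, which is the behavior required by Assumption A. 5.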
To identify the underlying topology, the Southern California Edison (SCE) 47-bus distribution grid [3, 39] using real consumption and solar generation data from the Smart\({^*}\) project [40] was employed. Feeding these data into the AC power flow equations [41, 42], we can obtain the voltage squared magnitude measurements \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) across \(T = 240\) time slots. The voltage magnitude of the substation bus satisfies \(v_0(t) = 1,\forall t\in \{1,\ldots ,T\}\). The radial grid comprises 41 buses where the interactions need to be inferred, after ignoring the root bus and the buses connected to their parent buses with zero-impedance lines. With \(\{{\varvec{x}}(t)\}_{t=1}^{T}\), the graph Volterra kernels were estimated by solving (18) and constructing \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) as in (16). After removing non-significant entries by a pointwise thresholding operation, the grid topology was inferred from the support of \({\varvec{H}}^{(1)}\) and the higher-order interactions were retrieved from \({\varvec{H}}^{(2)}\).
To assess the performance of the proposed AGVM approach (18), we have simulated three edge connectivity recovery methods, namely: i) multikernel based partial correlations (MKPC)scheme [22]; ii) linear PCscheme [43]; and iii) concentration matrixbased scheme [44]. The results were measured by the empirical receiver operating characteristic (ROC) curves and the area under the curve (AUC) values. Figure 1 depicts the ROC curves of all methods, while the AUC for AGVM, MKPC, linear PC, and concentration matrix are 0.9483, 0.9008, 0.8836, and 0.8052, respectively. The results showcase the proposed scheme outperforms all competing alternatives by exploiting the nonlinear interactions.
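The AUC used above can be computed directly from edge scores. The sketch below is a minimal illustration under stated assumptions: edge scores are taken as the symmetrized magnitudes of an estimated \({\varvec{H}}^{(1)}\), and the 4-node adjacency and estimate are made-up values, not data from the experiments. The helper `roc_auc` is a hypothetical name.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a true edge outscores a non-edge (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

# Hypothetical ground-truth adjacency (a 4-node path) and estimated first-order kernel.
A_true = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
H1_est = np.array([[0.0, 0.8, 0.1, 0.0],
                   [0.7, 0.0, 0.6, 0.2],
                   [0.1, 0.5, 0.0, 0.9],
                   [0.0, 0.3, 0.4, 0.0]])

# Assumption: an undirected edge score is the larger of the two directed magnitudes.
S = np.maximum(np.abs(H1_est), np.abs(H1_est).T)
iu = np.triu_indices(A_true.shape[0], k=1)   # each node pair counted once
auc = roc_auc(S[iu], A_true[iu])
print("AUC:", auc)
```

Sweeping the pointwise threshold over all score values traces the ROC curve; the rank-based formula above gives the same area without building the curve explicitly.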
The second experiment entailed the IEEE 123-bus feeder to examine the scalability and performance of the algorithm in topology identification. Squared voltage magnitude measurements \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) across \(T = 400\) time slots were used. The ROC curves of the AGVM, MKPC, linear PC, and concentration matrix methods are shown in Fig. 2. The corresponding AUC values are 0.8935, 0.8024, 0.6952, and 0.4963, respectively. Evidently, these results corroborate the effectiveness of our proposed algorithm.
5.2 Link prediction in social networks
Another important domain that motivates the study of higher-order interactions is link prediction in social and biological networks. The goal of link prediction is to find the most likely subsets of vertices that will interact in the near future based on available observations of the activation of different nodes, e.g., song releases, email exchanges, and paper publications. Specifically, given a set of binary measurements \({\varvec{X}} \in \{0,1\}^{N\times T}\) at time slots \(t \in \{1,\ldots ,T\}\), one needs to predict the most likely set of nodes to be activated together at any \(t' > T\).
In this subsection, we consider the problem of predicting the closure of triangles, i.e., triplets of nodes that activate at the same time. Therefore, AGVMs can be restricted to order \(P=2\). Further, using the binary input data assumption, we can regard the interaction between variables as their joint activation and assume the function g in (7) is the product operation. As a result, a direct instantiation of an AGVM produces a model that constructs real-valued signals. Instead of directly modeling \(x_i(t)\), we borrow an idea from binary regression methods and use a latent variable \(z_i(t)\) to model the probability \(P(x_i(t)=1 \mid z_i(t))\) as \(P(x_i(t)=1 \mid z_i(t)) = \sigma (z_i(t))\), where \(\sigma (\cdot )\) represents the sigmoid function. The latent variable \(z_i(t)\) is then modeled as
Gathering the latent variables for nodes through time slots, the AGVM (12) for the link prediction task becomes
where \([{\varvec{Z}}]_{i,t} = z_i(t)\); \({{\varvec{\Theta }}} = [{\varvec{H}}^{(0)}\,{\varvec{H}}^{(1)}\,{\varvec{H}}^{(2)}]\); and \({\varvec{M}} = [{\varvec{1}}\, ({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\). Different from traditional logistic regression, the binary labels in (20) are the node variables themselves.
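As a small illustration of the stacked regressor \({\varvec{M}}\) and the latent matrix \({\varvec{Z}} = {{\varvec{\Theta }}}{\varvec{M}}\), the sketch below builds \({\varvec{X}}^{(2)}\) from all pairwise products of the binary node signals and maps the latent values through the sigmoid. Using every pair \(j<k\) and a random \({{\varvec{\Theta }}}\) are assumptions for illustration; the sketch also ignores the hollowness and support constraints that the model places on the kernels. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def build_regressors(X):
    """Stack M = [1; X1; X2], with X2 holding x_j(t) * x_k(t) for all j < k."""
    N, T = X.shape
    pairs = list(combinations(range(N), 2))
    X2 = np.array([X[j] * X[k] for j, k in pairs])   # joint activations
    M = np.vstack([np.ones((1, T)), X, X2])
    return M, pairs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary activations: N = 3 nodes, T = 4 time slots.
X = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 1]])
M, pairs = build_regressors(X)

rng = np.random.default_rng(0)
Theta = 0.1 * rng.standard_normal((X.shape[0], M.shape[0]))  # placeholder [H0 H1 H2]
Z = Theta @ M          # latent variables z_i(t)
P = sigmoid(Z)         # modeled P(x_i(t) = 1 | z_i(t))
print(M.shape, P.shape)
```

For binary inputs, each row of \({\varvec{X}}^{(2)}\) is itself binary and equals one exactly when both nodes of the pair are active in that slot.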
The goal of closure prediction is to find the most likely sets of nodes that form an open structure which will become closed. Here, an open structure refers to a set of nodes, \({\mathcal {A}}\), that have interacted with each other, but have not appeared simultaneously on a single simplex set, whose elements are the indexes of the nonzero elements of \({\varvec{x}}(t)\) [45]. Based on the analysis in our previous work [45], we have the following conclusions. Observing the support of the off-diagonal entries of \({\varvec{W}} = {\varvec{X}}^{(1)} ({\varvec{X}}^{(1)})^\top\), we can obtain an initial network connectivity. From this connectivity, and enforcing \(h_i^{(2)}(i,i) = 0, \forall i \in \mathcal {V}\), the set of open triangles \(\mathcal {T}_O\), closed triangles \(\mathcal {T}\), and the candidates for the nonzero graph Volterra coefficients \(\mathcal {S}^{(2)} : =\{\mathcal {S}_{i}^{(2)}\}_{i=1}^{N}\) (cf. (3)) can be obtained. Given \(\mathcal {S}^{(2)}\), it holds that \(\textrm{vec}({{\varvec{\Theta }}}) = {\varvec{B}} \bar{{\varvec{\theta }}}\), where \(\bar{{\varvec{\theta }}}\) captures the nonzero kernels, and \({\varvec{B}}\) is an expansive binary matrix that maps \(\bar{{\varvec{\theta }}}\) to the nonzero entries of the graph Volterra kernels in \(\textrm{vec}({{\varvec{\Theta }}})\). These nonzero Volterra kernels are defined by the support obtained from \({\varvec{W}}\). Since we know the nonzero positions of \(\textrm{vec}({{\varvec{\Theta }}})\), and the dimensions of \(\textrm{vec}({{\varvec{\Theta }}})\) and \(\bar{{\varvec{\theta }}}\), we can build this \({\varvec{B}}\) matrix based on the initial network connectivity. To estimate the parameter \(\bar{{\varvec{\theta }}}\), we propose a proximal gradient ascent algorithm with sparsity regularization, which is summarized in Alg. 1.
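The extraction of open and closed triangles from the support of \({\varvec{W}}\) can be sketched as follows. This is a minimal illustration, assuming that an open triangle is a triplet whose three pairs have each been co-active in some slot while the three nodes were never active in the same slot; the function name `find_triangles` and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def find_triangles(X):
    """Return (open_triangles, closed_triangles) for a binary N x T activation matrix."""
    X = np.asarray(X, dtype=int)
    W = X @ X.T                    # W[i, j] = number of slots where i and j are both active
    A = W > 0
    np.fill_diagonal(A, False)     # keep only the off-diagonal support
    open_tri, closed_tri = [], []
    for i, j, k in combinations(range(X.shape[0]), 3):
        if A[i, j] and A[i, k] and A[j, k]:
            together = np.any(X[i] & X[j] & X[k])   # all three active in one slot
            (closed_tri if together else open_tri).append((i, j, k))
    return open_tri, closed_tri

# Hypothetical data: slots activate {0,1}, {1,2}, {0,2}, and {1,2,3}.
X_demo = np.array([[1, 0, 1, 0],
                   [1, 1, 0, 1],
                   [0, 1, 1, 1],
                   [0, 0, 0, 1]])
open_tri, closed_tri = find_triangles(X_demo)
print("open:", open_tri, "closed:", closed_tri)
```

The brute-force loop over triplets costs \({\mathcal {O}}(N^3)\) and is only meant for small examples; on sparse networks one would enumerate triangles through neighbor lists instead.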
Notice that get_mtx_row takes the row of the input which is related to the latent variable \(z_i(t)\); soft_thr(\(\cdot ,\eta \lambda\)) denotes soft thresholding with threshold \(\eta \lambda\). We assume that open triangles with large coefficients, which indicate a high level of interaction, are the most likely triangles to become closed. After obtaining \(\bar{{\varvec{\theta }}}\), we sort the entries related to \(\mathcal {T}_O\) by their absolute value, and the top K entries then give the K open triangles most likely to become closed.
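The structure of a proximal gradient ascent iteration with soft thresholding can be sketched as below. This is a generic full-batch sparse logistic regression, not a transcription of Alg. 1: the design matrix `A`, the synthetic data, the helper `top_k`, and the values of `lam`, `eta`, and `k_max` are all assumptions chosen so that the toy problem converges.

```python
import numpy as np

def soft_thr(v, tau):
    """Entrywise soft thresholding with threshold tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_ascent(A, y, lam, eta, k_max):
    """Maximize the Bernoulli log-likelihood minus lam * ||theta||_1."""
    theta = np.zeros(A.shape[1])
    for _ in range(k_max):
        p = 1.0 / (1.0 + np.exp(-(A @ theta)))   # sigmoid of the latent variables
        grad = A.T @ (y - p)                     # gradient of the log-likelihood
        theta = soft_thr(theta + eta * grad, eta * lam)
    return theta

def top_k(theta, candidate_idx, K):
    """Indices (among the candidates) of the K largest-magnitude coefficients."""
    candidate_idx = np.asarray(candidate_idx)
    order = np.argsort(-np.abs(theta[candidate_idx]))
    return candidate_idx[order[:K]]

# Synthetic check: only coefficients 0 and 2 are truly nonzero.
rng = np.random.default_rng(0)
A = rng.standard_normal((400, 5))
theta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-(A @ theta_true)))).astype(float)

theta_hat = prox_grad_ascent(A, y, lam=5.0, eta=1e-3, k_max=2000)
print("top-2 coefficients:", top_k(theta_hat, np.arange(5), K=2))
```

In the link prediction setting, `candidate_idx` would play the role of the entries of \(\bar{{\varvec{\theta }}}\) associated with \(\mathcal {T}_O\), and the returned indices would identify the K open triangles ranked most likely to close.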
Remark 2
The most expensive step in Alg. 1 is updating the gradient, which takes an effort of \({{\mathcal {O}}}(NTd)\), where d is the dimension of \({ \bar{{\varvec{\theta }}}}_k\). Thus, the complexity per iteration is \({{\mathcal {O}}}(NTd)\), as the rest of the operations incur at most an \({{\mathcal {O}}}(d)\) complexity. If we consider the worst case, that is, the algorithm runs until it hits its maximum number of iterations, \(k_{\textrm{max}}\), the overall worst-case complexity of the algorithm is then \({{\mathcal {O}}}(NTdk_{\textrm{max}})\).
To examine the effectiveness of Alg. 1, the third experiment entailed the “Enron_email” [46] and “primary_school_contact” [47] datasets as well as several alternatives in [13] in terms of open triangle closure prediction. In the “Enron_email” dataset, the nodes denote the email addresses of different Enron employees, while in the “primary_school_contact” dataset, nodes are proximity-based contacts recorded by wearable sensors in a primary school. The training set contains the first \(10\%\) and \(1\%\) of timestamped events in the “Enron_email” and “primary_school_contact” datasets, respectively, while the testing set includes the remaining data. The proposed algorithm employs \(\lambda =10^{3}\), \(\eta =10^{4}\) and \(k_{\text {max}}=500\) for both experiments. The AUC metric on the first 100 nodes of the datasets for all methods is shown in Fig. 3. The curves showcase the effectiveness of the proposed model along with the logistic regression of Alg. 1 compared with recently proposed methods based on generalizing the link prediction scores for the task of triangle closure prediction.
6 Conclusions
This paper proposes a principled approach to identifying and predicting higher-order interactions in networked data. Borrowing ideas from SEMs and Volterra models, a node signal in the network is modeled as a combination of its neighbor signals and a nonlinear combination of the signals in the groups (higher-order links) it belongs to. Identifiability guarantees for the proposed second-order AGVM are then provided under three conditions on the input/output data. Our model provides expressibility for higher-order interactions as well as interpretability for further understanding of the underlying network dynamics. Moreover, the proposed AGVM is particularized to two different applications, namely topology identification in power systems and link prediction in social networks. The merits of the proposed algorithms relative to existing methods are corroborated through numerical tests using real data. This work also opens up interesting directions for future research, including reducing the computational burden for large-scale networks as well as generalizing the higher-order model to other challenging applications.
Availability of data and materials
All data generated or analyzed during this study are included in this paper.
Notes
With a slight abuse of notation, the notation after h enclosed by parentheses indicates the node index.
With a slight abuse of notation, we use N here to denote the number of nonroot buses and hence the number of nodes in the network is \(N+1\).
Abbreviations
AGVM: Autoregressive graph Volterra model
SEM: Structural equation model
RIP: Restricted isometry property
PGA: Proximal gradient ascent
MKPC: Multikernel-based partial correlations
ROC: Receiver operating characteristic
AUC: Area under the curve
PPR: Personalized PageRank similarity
References
[1] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Tec. 58(7), 1019–1031 (2007)
[2] S. Golshannavaz, S. Afsharnia, F. Aminifar, Smart distribution grid: optimal day-ahead scheduling with reconfigurable topology. IEEE Trans. Smart Grid 5(5), 2402–2411 (2014)
[3] Q. Yang, G. Wang, A. Sadeghi, G.B. Giannakis, J. Sun, Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2313–2323 (2020)
[4] S. Sulaimany, M. Khansari, A. Masoudi-Nejad, Link prediction potentials for biological networks. Int. J. Data Min. Bioinf. 20(2), 161–184 (2018)
[5] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)
[6] S. Barbarossa, S. Sardellitti, Topological signal processing over simplicial complexes. IEEE Trans. Signal Process. 68, 2992–3007 (2020)
[7] P. Frankl, V. Rödl, Extremal problems on set systems. Random Struct. Algorithm 20(2), 131–164 (2002)
[8] S. Sardellitti, S. Barbarossa, L. Testa, Topological signal processing over cell complexes, in IEEE Asilomar Conf. Signals, Systems, Computers, pp. 1558–1562 (2021)
[9] M.T. Schaub, A.R. Benson, P. Horn, G. Lippner, A. Jadbabaie, Random walks on simplicial complexes and the normalized Hodge Laplacian. arXiv:1807.05044 (2018)
[10] S. Zhang, Z. Ding, S. Cui, Introducing hypergraph signal processing: theoretical foundation and practical applications. IEEE Internet Things J. 7(1), 639–660 (2020)
[11] G.B. Giannakis, Y. Shen, G.V. Karanikolas, Topology identification and learning over graphs: accounting for nonlinearities and dynamics. Proc. IEEE 106(5), 787–807 (2018)
[12] M. Coutino, E. Isufi, T. Maehara, G. Leus, State-space network topology identification from partial observations. arXiv:1906.10471 (2019)
[13] A.R. Benson, R. Abebe, M.T. Schaub, A. Jadbabaie, J. Kleinberg, Simplicial closure and higher-order link prediction. Proc. Natl. Acad. Sci. 115(48), 11221–11230 (2018)
[14] L. Lim, Hodge Laplacians on graphs. arXiv:1507.05379 (2015)
[15] S. Ebli, M. Defferrard, G. Spreemann, Simplicial neural networks. arXiv:2010.03633 (2020)
[16] L. Giusti, C. Battiloro, P. Di Lorenzo, S. Sardellitti, S. Barbarossa, Simplicial attention networks. arXiv:2203.07485 (2022)
[17] M. Yang, E. Isufi, G. Leus, Simplicial convolutional neural networks, in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2022), pp. 8847–8851
[18] J.J. Hox, T.M. Bechger, An introduction to structural equation modeling. Family Sci. Rev. 11, 354–373 (1998)
[19] H. Lütkepohl, Vector Autoregressive Models (Springer, Berlin, 2011)
[20] G.V. Karanikolas, O. Sporns, G.B. Giannakis, Multikernel change detection for dynamic functional connectivity graphs, in Asilomar Conf. on Signals, Syst., and Comput., Pacific Grove, CA, USA, pp. 1555–1559 (2017)
[21] E. Isufi, A. Loukas, N. Perraudin, G. Leus, Forecasting time series with VARMA recursions on graphs. IEEE Trans. Signal Process. 67(18), 4870–4885 (2019)
[22] L. Zhang, G. Wang, G.B. Giannakis, Going beyond linear dependencies to unveil connectivity of meshed grids, in Proc. of CAMSAP, Curacao, AN, pp. 1–5 (2017)
[23] D. Song, R.H. Chan, V.Z. Marmarelis, R.E. Hampson, S.A. Deadwyler, T.W. Berger, Nonlinear dynamic modeling of spike train transformations for hippocampal-cortical prostheses. IEEE Trans. Biomed. Eng. 54(6), 1053–1066 (2007)
[24] V. Kekatos, G.B. Giannakis, Sparse Volterra and polynomial regression models: recoverability and estimation. IEEE Trans. Signal Process. 59(12), 5907–5920 (2011)
[25] C. Krall, K. Witrisal, G. Leus, H. Koeppl, Minimum mean-square error equalization for second-order Volterra systems. IEEE Trans. Signal Process. 56(10), 4729–4737 (2008)
[26] V.Z. Marmarelis, Identification of nonlinear biological systems using Laguerre expansions of kernels. Ann. Biomed. Eng. 21(6), 573–589 (1993)
[27] H. Huang, J. Tang, L. Liu, J. Luo, X. Fu, Triadic closure pattern analysis and prediction in social networks. IEEE Trans. Knowl. Data Eng. 27(12), 3374–3389 (2015)
[28] M. Kivelä, A. Arenas, M. Barthelemy, J.P. Gleeson, Y. Moreno, M.A. Porter, Multilayer networks. J. Complex Netw. 2(3), 203–271 (2014)
[29] P. Prenter, A Weierstrass theorem for real, separable Hilbert spaces. J. Approxim. Theory 3(4), 341–351 (1970)
[30] S. Boyd, L. Chua, Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. 32(11), 1150–1161 (1985)
[31] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems (1980)
[32] J.A. Bazerque, B. Baingana, G.B. Giannakis, Identifiability of sparse structural equation models for directed and cyclic networks, in IEEE Glob. Conf. Signal Inf. Process, pp. 839–842 (2013)
[33] J.B. Kruskal, Rank, decomposition, and uniqueness for 3-way and N-way arrays. Multiway Data Analysis, 7–18 (1989)
[34] E.J. Candes, The restricted isometry property and its implications for compressed sensing. C. R. Acad. Sci. Paris, Ser. I 346(9–10), 589–592 (2008)
[35] B. Nazer, R.D. Nowak, Sparse interactions: identifying high-dimensional multilinear systems via compressed sensing, in Annu. Allert. Conf. Commun. Control Comput. Allert, pp. 1589–1596 (2010)
[36] V. Kekatos, D. Angelosante, G.B. Giannakis, Sparsity-aware estimation of nonlinear Volterra kernels, in IEEE CAMSAP, pp. 129–132 (2009)
[37] J. Haupt, W.U. Bajwa, G. Raz, R. Nowak, Toeplitz compressed sensing matrices with applications to sparse channel estimation. IEEE Trans. Inf. Theory 56(11), 5862–5875 (2010)
[38] Q. Yang, M. Coutino, G. Wang, G.B. Giannakis, G. Leus, Learning connectivity and higher-order interactions in radial distribution grids, in IEEE Int. Conf. Acoust. Speech Signal Process, pp. 5555–5559 (2020)
[39] M. Farivar, C.R. Clarke, S.H. Low, K.M. Chandy, Inverter VAR control for distribution systems with renewables, in Proc. of IEEE SmartGridComm., Brussels, Belgium, pp. 457–462 (2011)
[40] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, J. Albrecht, Smart*: an open data set and tools for enabling research in sustainable homes. SustKDD 111(112), 108 (2012)
[41] M. Baran, F.F. Wu, Optimal sizing of capacitors placed on a radial distribution system. IEEE Trans. Power Del. 4(1), 735–743 (1989)
[42] S.H. Low, Convex relaxation of optimal power flow–Part II: exactness. IEEE Trans. Control Netw. Syst. 1(2), 177–189 (2014)
[43] S. Bolognani, N. Bof, D. Michelotti, R. Muraro, L. Schenato, Identification of power distribution network topology via voltage correlation analysis, in Proc. of CDC, Florence, ITL, pp. 1659–1664 (2013)
[44] D. Deka, S. Talukdar, M. Chertkov, M. Salapaka, Topology estimation in bulk power grids: guarantees on exact recovery. arXiv:1707.01596 (2017)
[45] M. Coutino, G.V. Karanikolas, G. Leus, G.B. Giannakis, Self-driven graph Volterra models for higher-order link prediction, in IEEE Int. Conf. Acoust. Speech Signal Process, pp. 3887–3891 (2020)
[46] B. Klimt, Y. Yang, The Enron corpus: a new dataset for email classification research, in ECML (Springer, 2004), pp. 217–226
[47] J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina et al., High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6(8), e23176 (2011)
[48] R. Marsli, F.J. Hall, Geometric multiplicities and Geršgorin discs. Am. Math. Mon. 120(5), 452–455 (2013)
Acknowledgements
Not applicable.
Funding
This work was supported in part by ASPIRE project 14926 (within the STW OTP program) financed by the Netherlands Organization for Scientific Research (NWO), NSF Grants 1509040, 1711471, and 1901134.
Author information
Contributions
All authors contributed to this research, including the design of the simulations and analyses of the results. GL proposed the research direction. QLY and MC conceived the system model. MC completed the identifiability analysis of the paper. QLY analyzed the data and wrote this manuscript. GBG proofread the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
All procedures performed in this paper were in accordance with the ethical standards of the research community.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Theorem 1
Let us consider the expansion
Rewriting the linear terms, i.e., terms related with \({\varvec{X}}^{(1)}\), we obtain the system
where \({\mathcal {X}}:= [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\) and \(\mathcal {H} := [({\varvec{I}} - {\varvec{H}}^{(1)})\; \; {\varvec{H}}^{(2)}]\).
Due to the hypothesis that \({\mathcal {X}}\) is full row rank, the unique least-squares solution for the kernel matrices is obtained by \(\mathcal {H}_{\textrm{LS}} := {\varvec{\Gamma }}{\varvec{Y}}{\mathcal {X}}^{\dagger }\). Defining \({\varvec{Q}} := {\varvec{Y}}{\mathcal {X}}^{\dagger }\) and partitioning this matrix appropriately, i.e., \({\varvec{Q}} = [{\varvec{Q}}_1 \in \mathbb {R}^{N\times N}\; \; {\varvec{Q}}_2 \in \mathbb {R}^{N \times M}]\),
we obtain the following relations
Now, let us recall that \({\varvec{\Gamma }}\) is a diagonal matrix and that \({\varvec{H}}^{(1)}\) is a hollow matrix, i.e., its diagonal is filled with zeros. Thus, it holds \(\textrm{diag}({\varvec{\Gamma }}{\varvec{Q}}_1) = {\varvec{1}}\),
which implies \(\textrm{diag} ({\varvec{\Gamma }}_{\textrm{LS}}) := \textrm{diag}({\varvec{Q}}_1)^{-1}\).
Finally, the estimates for the kernel matrices are given as
The proof is completed. \(\square\)
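The recovery steps of the proof can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the system to be \({\varvec{\Gamma }}{\varvec{Y}} = \mathcal {H}{\mathcal {X}}\) with \(\mathcal {H} = [({\varvec{I}} - {\varvec{H}}^{(1)})\; {\varvec{H}}^{(2)}]\), uses Gaussian node signals with all pairwise products as \({\varvec{X}}^{(2)}\), and picks arbitrary ground-truth kernels. If a different sign convention is attached to \({\varvec{H}}^{(2)}\), only the sign in the last recovery line changes.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N, T = 4, 60
pairs = list(combinations(range(N), 2))
M = len(pairs)

# Hypothetical ground truth: hollow H1, arbitrary H2, diagonal Gamma with nonzero entries.
H1 = 0.3 * rng.standard_normal((N, N))
np.fill_diagonal(H1, 0.0)
H2 = 0.3 * rng.standard_normal((N, M))
Gamma = np.diag(rng.uniform(0.5, 2.0, N))

# Regressors: X1 is drawn at random, X2 collects pairwise products.
X1 = rng.standard_normal((N, T))
X2 = np.array([X1[j] * X1[k] for j, k in pairs])
calX = np.vstack([X1, X2])            # (N + M) x T, full row rank here

# Choose Y so that Gamma Y = [(I - H1)  H2] calX holds exactly (assumed system).
calH = np.hstack([np.eye(N) - H1, H2])
Y = np.linalg.solve(Gamma, calH @ calX)

# Recovery as in the proof: Q = Y calX^+, then use the hollowness of H1.
Q = Y @ np.linalg.pinv(calX)
Q1, Q2 = Q[:, :N], Q[:, N:]
Gamma_ls = np.diag(1.0 / np.diag(Q1))
H1_ls = np.eye(N) - Gamma_ls @ Q1
H2_ls = Gamma_ls @ Q2

print(np.allclose(Gamma_ls, Gamma), np.allclose(H1_ls, H1), np.allclose(H2_ls, H2))
```

The diagonal of \({\varvec{Q}}_1\) equals the entrywise inverse of the diagonal of \({\varvec{\Gamma }}\) precisely because \({\varvec{H}}^{(1)}\) has a zero diagonal, which is what resolves the scaling ambiguity.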
Proof of Theorem 2
First, let us rewrite the expansion as
Now, consider the ith column of both sides of the expression above, i.e., \([({\varvec{X}}^{(1)})^\top ]_i = \mathcal {X}^\top [\mathcal {H}^\top ]_i\).
Suppose there exists a vector \({\varvec{h}}_i\), \({\varvec{h}}_i \ne [\mathcal {H}^\top ]_i\), satisfying the same relation and with \(K = K_1 + K_2\) nonzero entries. This implies that \({\varvec{0}} = \mathcal {X}^\top ([\mathcal {H}^\top ]_i - {\varvec{h}}_i)\)
must hold. As \([\mathcal {H}^\top ]_i\) and \({\varvec{h}}_i\) have at most K nonzero entries each, their difference has at most 2K nonzero entries. Hence, if \(\textrm{kr}(\mathcal {X}^\top ) \ge 2K\), any subset of at most 2K columns of \(\mathcal {X}^\top\) is linearly independent. Thus, \(([\mathcal {H}^\top ]_i - {\varvec{h}}_i) = {\varvec{0}}\) holds. This contradicts the assumption; hence, the result of the theorem holds. \(\square\)
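The Kruskal rank \(\textrm{kr}(\cdot )\) used in this argument is the largest k such that every set of k columns is linearly independent. A brute-force sketch is given below for small matrices; the function name `kruskal_rank` and the two toy matrices are illustrative assumptions, and the exhaustive search is exponential in the number of columns.

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every subset of k columns of A is linearly independent."""
    n_cols = A.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < size:
                return k          # found a dependent subset of this size
        k = size
    return k

# Columns e1, e2, e1 + e2: any two are independent, all three are not.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

# Columns 0 and 1 are parallel, so the Kruskal rank is 1 although the rank is 2.
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

print(kruskal_rank(A), np.linalg.matrix_rank(A))
print(kruskal_rank(B), np.linalg.matrix_rank(B))
```

The second example shows why the theorem is stated in terms of the Kruskal rank rather than the ordinary rank: a single pair of dependent columns is enough to allow two distinct sparse solutions.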
Proof of Theorem 3
Let us consider the expansion
Notice that the jth column of \({\varvec{X}}^{(2)}\) is given by \({{\varvec{C}}}\big ([{\varvec{X}}^{(1)}]_j\otimes [{\varvec{X}}^{(1)}]_j\big )\), where \([{\varvec{X}}^{(1)}]_j\) is the jth column of \({\varvec{X}}^{(1)}\), \(\otimes\) is the Kronecker product, and \({\varvec{C}}\) is a binary selection matrix picking the appropriate rows of the Kronecker product. Hence, we can express \({\varvec{X}}^{(2)}\) using the Khatri-Rao product \(*\) and \({\varvec{X}}^{(1)}\) as
By hypothesis, we have that \({\varvec{X}}^{(1)}{\varvec{\Pi }}_1 = {\varvec{0}}\). Thus, by right multiplying the expansion by \({\varvec{\Pi }}_1\) and reorganizing terms, we obtain
where \(\tilde{{\varvec{Y}}}:={\varvec{Y}}{\varvec{\Pi }}_1 \ne {\varvec{0}}\) (by assumption) and the identity \({\varvec{X}}^{(2)}= {{\varvec{C}}}({\varvec{X}}^{(1)}*{\varvec{X}}^{(1)})\) has been used. Now, let us consider the ith equation of the above relation, i.e.,
where \({\tilde{{\varvec{X}}}}^{(2)} := {\varvec{X}}^{(2)} {\varvec{\Pi }}_1\); \({\varvec{h}}_i^{(2)}\) and \(\tilde{{\varvec{y}}}_i\) are the ith rows of \({\varvec{H}}^{(2)}\) and \(\tilde{{\varvec{Y}}}\) in column-vector form, respectively. As \(\textrm{kr}({{\tilde{{\varvec{X}}}}^{(2)}}) \ge 2K_2 + 1\) and the number of nonzero elements per row in \({\varvec{H}}^{(2)}\) is bounded above by \(K_2\) by hypothesis, the above system has a unique solution (up to a scalar ambiguity that does not affect the support) with such a sparsity level, i.e., sparsest solution, \(\forall \; i\). \(\square\)
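The identity \({\varvec{X}}^{(2)}= {{\varvec{C}}}({\varvec{X}}^{(1)}*{\varvec{X}}^{(1)})\) used in this proof can be verified on a small example. In the sketch below, the column-wise Khatri-Rao product is built with `einsum`, and the selection matrix is assumed to pick the products \(x_j x_k\) with \(j<k\); that choice of rows, as well as the helper names, is an assumption for illustration.

```python
import numpy as np
from itertools import combinations

def khatri_rao_cols(A, B):
    """Column-wise Khatri-Rao product: column t equals kron(A[:, t], B[:, t])."""
    return np.einsum('it,jt->ijt', A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

def selection_matrix(N):
    """Binary C selecting the entries x_j * x_k (j < k) of kron(x, x)."""
    pairs = list(combinations(range(N), 2))
    C = np.zeros((len(pairs), N * N))
    for r, (j, k) in enumerate(pairs):
        C[r, j * N + k] = 1.0      # kron(x, x)[j * N + k] = x_j * x_k
    return C, pairs

rng = np.random.default_rng(0)
X1 = rng.standard_normal((3, 5))
C, pairs = selection_matrix(3)

KR = khatri_rao_cols(X1, X1)
X2 = C @ KR                                            # via the Khatri-Rao identity
X2_direct = np.array([X1[j] * X1[k] for j, k in pairs])  # direct pairwise products

print(np.allclose(X2, X2_direct))
```

Each row of `C` contains a single one, so the selection only discards the redundant and squared entries of the Kronecker product and never mixes them.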
Proof of Theorem 4
Let us consider a particular realization of \(\tilde{{\varvec{X}}}^\top := [\frac{1}{\sqrt{T}}{\varvec{X}}^{l}\,\frac{1}{\sqrt{T}}{\varvec{X}}^b]^\top = [\tilde{{\varvec{X}}}^l\, \tilde{{\varvec{X}}}^b]\in \{-1,1\}^{T\times L}\) and its Grammian \(\tilde{{\varvec{G}}}:= \tilde{{\varvec{X}}}\tilde{{\varvec{X}}}^\top\). By the Geršgorin disc theorem [48], if \(|[\tilde{{\varvec{G}}}]_{ii} - 1| < \delta _d\) and \(|[\tilde{{\varvec{G}}}]_{ij}| \le \frac{\delta _o}{s}\) for every i, j with \(j\ne i\) and \(\delta _d + \delta _o = \delta\) for some \(\delta \in (0,1)\), then \(\tilde{{\varvec{X}}}\) possesses RIP \(\delta _s \le \delta\). Therefore, we can upper bound the probability of \(\tilde{{\varvec{X}}}\) not satisfying RIP of value \(\delta\), \(\textrm{Pr}(\delta _s > \delta )\), as
As \({{\varvec{\tilde{G}}}}\) is symmetric, we can use the union bound only for its unique entries to upper bound \(\textrm{Pr}(\delta _s > \delta )\) as
To show the result of the theorem, we proceed next to bound the probabilities above. The analysis of these probabilities is similar to the one in [24]. However, here we obtain results for a different distribution and for linear and bilinear components. To simplify the notation, we introduce the following partition for the Grammian matrix, i.e.,
where \({{\varvec{\tilde{G}}}}^{ll}:= \tilde{{\varvec{X}}}^{l}(\tilde{{\varvec{X}}}^l)^\top\), \(\tilde{{\varvec{G}}}^{lb}:= \tilde{{\varvec{X}}}^l(\tilde{{\varvec{X}}}^b)^\top\) and \(\tilde{{\varvec{G}}}^{bb}:= \tilde{{\varvec{X}}}^b(\tilde{{\varvec{X}}}^b)^\top\).
Recalling that the raw moments for the inputs are given by
we obtain the following relations
and \(\mathbb {E}\{[\tilde{{\varvec{G}}}^{lb}]_{ij}\} = 0\; \forall \; i,j.\) By a quick inspection, we notice that the terms of the first part of \(\textrm{Pr}(\delta _s > \delta )\) are identically zero, hence \(\delta _d = 0\).
To bound the required probabilities, we make use of the following Hoeffding’s inequality.
Lemma 1
(Hoeffding’s Inequality) Given \(t > 0\) and independent random variables \(\{x_i\}_{i=1}^N\) bounded as \(a_i \le x_i \le b_i\) almost surely, the sum \(s_N := \sum \nolimits _{i=1}^{N} x_i\) satisfies
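For reference, the classical two-sided form of Hoeffding's inequality is written out below; it is assumed, not confirmed from the source, to be the version this lemma refers to. The second expression specializes it to an off-diagonal Gram entry viewed as a sum of \(T\) independent zero-mean terms taking values in \(\{-1/T, 1/T\}\), with \(t = \delta _o/s\); this scaling is an assumption of the sketch.

```latex
\Pr\big( |s_N - \mathbb{E}\{s_N\}| \ge t \big)
  \le 2\exp\!\left( -\frac{2t^2}{\sum_{i=1}^{N}(b_i - a_i)^2} \right),
\qquad
\Pr\!\left( \big|[\tilde{\boldsymbol{G}}]_{ij}\big| \ge \frac{\delta_o}{s} \right)
  \le 2\exp\!\left( -\frac{T\delta_o^2}{2s^2} \right), \quad i \ne j.
```

The specialization follows because each of the \(T\) terms has range \(b_i - a_i = 2/T\), so that \(\sum _i (b_i - a_i)^2 = 4/T\).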
Let us consider the offdiagonal elements of \({{\varvec{\tilde{G}}}}^{ll}\). The related probability can be bounded as
as we consider that these entries are the result of a sum of T independent variables contained in \(\{-1,1\}\). Similar bounds can be found for the other entries of \({{\varvec{\tilde{G}}}}\), i.e.,
Recollecting the probabilities for all the entries, we obtain
for \(N > 2\). Considering \(\delta = \delta _o\) (as \(\delta _d = 0\)) and setting \(C=2\), for a \(\gamma \in (0,1)\) and \(T \ge \frac{4C}{(1-\gamma )\delta ^2}s^2\log N,\) we can simplify the above bound as
\(\square\)
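The two ingredients of this proof, a unit diagonal and small off-diagonal Gram entries, can be observed empirically. The sketch below is a simulation under assumptions: the linear regressors are i.i.d. symmetric \(\pm 1\) variables, the bilinear regressors are all pairwise products, and the sizes `N` and `T` are arbitrary illustrative choices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, T = 6, 4000

# Linear regressors: i.i.d. entries in {-1, +1}; bilinear regressors: pairwise products.
Xl = rng.choice([-1.0, 1.0], size=(N, T))
Xb = np.array([Xl[j] * Xl[k] for j, k in combinations(range(N), 2)])

# Normalize by 1/sqrt(T) and form the Gram matrix of all regressors.
Xt = np.vstack([Xl, Xb]) / np.sqrt(T)
G = Xt @ Xt.T

diag_dev = np.max(np.abs(np.diag(G) - 1.0))        # zero up to rounding
off_max = np.max(np.abs(G - np.diag(np.diag(G))))  # shrinks roughly like 1/sqrt(T)

print("max |G_ii - 1| =", diag_dev)
print("max |G_ij|     =", off_max)
```

The diagonal entries equal one for every realization because each squared entry is \(1/T\), which matches the observation that \(\delta _d = 0\); only the off-diagonal entries require a concentration argument.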
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, Q., Coutino, M., Leus, G. et al. Autoregressive graph Volterra models and applications. EURASIP J. Adv. Signal Process. 2023, 4 (2023). https://doi.org/10.1186/s13634-022-00960-6
DOI: https://doi.org/10.1186/s13634-022-00960-6
Keywords
 Higher-order interactions
 Volterra series
 Graph inference
 Link prediction