
Autoregressive graph Volterra models and applications


Graph-based learning and estimation are fundamental problems in various applications involving power, social, and brain networks, to name a few. While learning pair-wise interactions in network data is a well-studied problem, discovering higher-order interactions among subsets of nodes is not yet fully explored. To this end, encompassing and leveraging (non)linear structural equation models as well as vector autoregressions, this paper proposes autoregressive graph Volterra models (AGVMs) that can capture not only the connectivity between nodes but also higher-order interactions present in the networked data. The proposed overarching model inherits the identifiability and expressibility of the Volterra series. Furthermore, two tailored algorithms based on the proposed AGVM are put forth for topology identification and link prediction in distribution grids and social networks, respectively. Real-data experiments on different real-world collaboration networks highlight the impact of higher-order interactions in our approach, yielding discernible differences relative to existing methods.

1 Introduction

Full awareness of networks and networked interactions is required for understanding the behavior of complex systems. These systems are typically modeled as graphs in many applications such as financial markets, social networks, power systems, and transportation systems [1,2,3], to name a few. Graph structure identification (prediction) is to identify (predict) if an edge exists (will exist) between a pair of nodes of a graph, given a set of network observations in the form of node attributes at different time instances. Applications include studying the growth of social networks [1] and their dynamics in social sciences, predicting what are the most likely links between users in recommender systems, unveiling pair-wise interactions between elements of different ecological niches or predicting interactions that were not studied due to time or cost restrictions in biology [4].

Although pair-wise interactions can capture the dynamics of the underlying graph, much of the interplay among networked data occurs beyond just two nodes [5]. For instance, human interaction over social media often takes place within a team rather than between two individuals. Furthermore, molecules tend to show more interactions among different groups. Moreover, in smart grids, the dependency among power variables such as voltage and current occurs in a region instead of single pairs. Early approaches to capture higher-order relations in the underlying network have mainly relied on set systems, hypergraphs, and simplicial complexes [6,7,8,9,10], to name a few. Furthermore, to address the existence of nonlinear connectivity, topology identification approaches leveraging partial correlations as well as kernels have recently been developed [11, 12]. A link prediction approach for evaluating higher-order data models of complex systems is proposed in [13]. While offering mathematical frameworks to study higher-order relations, these works either directly leverage the network structure, e.g., connections in a hypergraph, or make use of concepts inherited from physical phenomena, e.g., the analysis based on simplicial complexes, using cohomology [14], which might not exist for all kinds of datasets.

Different attempts have been made to model dynamical processes over a network. Specifically, data-driven neural network solutions have recently attracted growing attention for learning the nonlinear connectivity [15,16,17]. Furthermore, rooted in structural equation models [18] (SEMs), and in particular combinatorial vector autoregressive models [19], several efforts leveraging either kernels or partial correlations [11, 12] have been devoted to capturing the dynamics; see [20,21,22], and references therein. Although these approaches manage to capture complex dynamics existing in networked data, they lack interpretability beyond pair-wise interactions. Another issue of the mentioned models is their poor scalability; in other words, the complexity grows exponentially along with the model order.

Volterra series and kernels have emerged as promising tools for data analysis in different applications, e.g., brain networks [23], gene data [24] and communications [25]. Leveraging the sparsity of the Volterra kernels as well as a parsimonious model description, the computational complexity can be reduced [24], especially when using an appropriate basis expansion model of the kernels. Moreover, one can retrieve the original Volterra kernels from the considered basis expansion without losing model expressibility [26]. Although the Volterra series is powerful in modeling nonlinear-temporal interactions, using it to capture the higher-order relations in the networked data is not well-explored. Furthermore, in Volterra series models, the autoregressive property is not fully considered as in SEM when interpreting the dynamics in networked data.

Building upon SEMs and Volterra series models, this work advocates an autoregressive graph Volterra model (AGVM) to capture higher-order interactions present in networked data. Different identifiability conditions for the proposed model are derived including the identifiability of the network connection based on exogenous data, sparsity (relieving sampling complexity) and the restricted isometry property in the bipolar case. The proposed model uses graph Volterra kernels to identify interactions between nodes or groups of nodes, providing a principled way to tackle higher-order interactions in networks. Furthermore, to estimate the graph Volterra kernels, two tailored AGVM algorithms for topology identification in power systems and link prediction in social networks are introduced. The proposed approaches differ from existing higher-order interaction methods [13, 27], which solely focus on extending metrics commonly used in informal scoring for classical link prediction and identification.

Paper outline. The rest of the paper is organized as follows: Sect. 2 briefly reviews the mathematical model that is used throughout the paper. Section 3 introduces the higher-order interactions model based on Volterra series. Section 4 provides identifiability guarantees for the proposed model. Section 5 presents two tailored algorithms for the application of topology identification in power systems as well as link prediction in social networks. Concluding remarks are drawn in Sect. 6.

Notation. Lower- (upper-) case boldface letters denote column vectors (matrices), and normal letters represent scalars. Calligraphic letters are reserved for sets with the exception of matrices \(\mathcal {X}\) and \(\mathcal {H}\). Symbol \(\top\) stands for transposition. The (i, j)th entry of matrix \({\varvec{X}}\) is denoted as \(x_{i,j}\) or \([{\varvec{X}}]_{i,j}\). The operator \(\boxtimes\) denotes the reduced Kronecker product of two equal-size vectors, that is, \({\varvec{x}}\boxtimes {\varvec{y}}:= [x_1y_1\; x_1y_2\;\cdots \;x_i y_j\;\cdots \; x_{N-1}y_N\;x_N y_N]^\top\), where \(i \le j\). The operation \(*\) denotes the Khatri-Rao product.
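As a minimal illustrative sketch of the reduced Kronecker product defined above, the following NumPy snippet keeps only the products \(x_i y_j\) with \(i \le j\), in row-major order. The function name `reduced_kron` is a hypothetical helper introduced here for illustration and is not part of the original notation.

```python
import numpy as np

def reduced_kron(x, y):
    """Reduced Kronecker product: entries x_i * y_j for i <= j, in row-major order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D vectors of equal size")
    # Index pairs (i, j) with i <= j: (0,0), (0,1), ..., (0,N-1), (1,1), ...
    rows, cols = np.triu_indices(x.size)
    return x[rows] * y[cols]

x = np.array([1.0, 2.0, 3.0])
z = reduced_kron(x, x)  # length N(N+1)/2 = 6 for N = 3
```

For a vector of size \(N\), the result has \(N(N+1)/2\) entries, which is the dimension \(M\) used later for the second-order terms.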

2 Preliminaries

Consider a graph \(\mathcal {G}= (\mathcal {V},\mathcal {E})\), where \(\mathcal {V}\) is the vertex (node) set with cardinality \(|\mathcal {V}| = N\), and \(\mathcal {E}\) is the edge set with cardinality \(|\mathcal {E}| = E\), respectively. A time-series of graph signals \(\{{\varvec{x}}(t)\in \mathbb {R}^N\}_{t=1}^{T}\) is collected at the nodes \(\mathcal {V}\). In addition, external (exogenous) observables \(\{{\varvec{\zeta }}(t)\in \mathbb {R}^{N}\}_{t=1}^{T}\) are available such as features of the nodes, inputs from different networks, and network-level snapshots or layers [28].

The classical structural equation model (SEM) [18] considering the signal \({\varvec{x}}(t)\) over the graph and the exogenous variables \({\varvec{\zeta }}(t)\) can be described as follows

$$\begin{aligned} {\varvec{x}}(t) = {\varvec{A}}{\varvec{x}}(t) + {\varvec{\Gamma }}{\varvec{\zeta }}(t) \in \mathbb {R}^{N} \end{aligned}$$

where \({\varvec{\Gamma }}\in \mathbb {R}^{N\times N}\) is a diagonal matrix representing the mapping of the exogenous input on the node variables \({\varvec{x}}(t)\), and \({\varvec{A}}\in \mathbb {R}^{N\times N}\) represents the inter-relations among those variables. Let \(x_i(t)\) and \(\zeta _i(t)\) denote the ith entry of \({\varvec{x}}(t)\) and \({\varvec{\zeta }}(t)\), respectively. The signal at the ith node \(x_i(t)\) can then be obtained by a weighted combination of the signal of all other nodes and the corresponding exogenous variables as

$$\begin{aligned} x_i(t) = \sum _{j\in {\mathcal {V}}} a_{i,j} x_j(t) + \sum _{j\in {\mathcal {V}}}\gamma _{i,j}\zeta _j(t) \end{aligned}$$

where \(a_{i,j}\) and \(\gamma _{i,j}\) are the (i, j)th entries of \({\varvec{A}}\) and \({\varvec{\Gamma }}\), respectively.
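To make the implicit nature of the SEM concrete, the sketch below generates signals that satisfy (1). It assumes that \({\varvec{I}}-{\varvec{A}}\) is invertible, so that \({\varvec{x}}(t) = ({\varvec{I}}-{\varvec{A}})^{-1}{\varvec{\Gamma }}{\varvec{\zeta }}(t)\); the matrices below are random placeholders chosen only for illustration, not data from this work.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 4

# Hypothetical hollow matrix A (zero diagonal), rescaled to spectral norm 0.5
# so that I - A is guaranteed to be invertible.
A = rng.standard_normal((N, N))
np.fill_diagonal(A, 0.0)
A *= 0.5 / np.linalg.norm(A, 2)

# Diagonal exogenous mapping with nonzero diagonal entries.
Gamma = np.diag(rng.uniform(0.5, 1.5, size=N))
Zeta = rng.standard_normal((N, T))  # columns are zeta(1), ..., zeta(T)

# Solve (I - A) X = Gamma Zeta for all time instants at once.
X = np.linalg.solve(np.eye(N) - A, Gamma @ Zeta)

# Residual of x(t) = A x(t) + Gamma zeta(t); it should vanish up to round-off.
residual = X - (A @ X + Gamma @ Zeta)
```

The zero diagonal of `A` reflects that a node does not explain itself, which matches the hollow structure discussed next.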

The SEM in (2) is able to express the relation between different node variables through the nonzero entries \(a_{i,j}\) of \({\varvec{A}}\), where \({\varvec{A}}\) is a hollow matrix and shares the support with the adjacency matrix of the graph. However, it only accounts for the first-order dependencies (i.e., pair-wise relations) in a linear fashion. Several efforts have focused on expanding the expressive power of linear SEMs via nonlinear kernels of nodal variables; see, e.g., [11] and references therein. Although meaningful in many applications, they neglect the so-called higher-order interactions that are present in networked data through higher-order graph structures [5], such as subgraphs and k-cliques, which are subsets of vertices of an undirected graph where every two distinct vertices in the subset are adjacent.

In the following section, we introduce a Volterra model to capture such higher-order interactions and their descriptors.

3 Higher-order interactions in graphs

Before modeling the higher-order interactions in graphs, let us give a description of the ith node signal, \(x_i(t)\), in terms of a set of subsets of nodes, \(\mathcal {S}_i\). To model first-order interactions, these subsets of nodes are simply single nodes and the set \(\mathcal {S}_i\) is nothing more than the set of neighbors of the ith node, i.e., the nodes j for which \(a_{i,j}\) is nonzero in a SEM. Modeling higher-order interactions though requires the subsets to consist of more than one node and hence \(\mathcal {S}_i\) will yield a set of subsets of nodes.

Mathematically, the set \(\mathcal {S}_i^{(P)}\) which contains the subsets for defining interactions up to order P is defined as follows

$$\begin{aligned} \mathcal {S}_i^{(P)} := \bigcup _{p=1}^{P}\mathcal {S}_{i}^{(*,p)},\;\text {with}\;\mathcal {S}_{i}^{(*,p)} := \bigcup _{l=1}^{L_p}\mathcal {S}_i^{(l,p)} \end{aligned}$$

where p denotes the order of the set, \(L_p\) the number of subsets of order p that exist, and \(\mathcal {S}_i^{(l,p)}\subset \mathcal {V}\) the lth set of p nodes related to the ith node in the graph \(\mathcal {G}\). For simplicity, the exogenous variable has been omitted. With (3), we can put forth the following signal model

$$\begin{aligned} x_i(t) = f({\varvec{x}}(t),\mathcal {S}_i^{(P)})+\epsilon _i(t)\;\forall \; i\in \{1,\ldots ,N\} \end{aligned}$$

where f maps the signals \({\varvec{x}}(t)\) from \(\mathcal {S}_i^{(P)}\) to \(x_i(t)\). For example, considering \(P=1\) and \(\mathcal {S}_{i}^{(l, p)}\) only containing the lth neighbor of the ith node, i.e., \(\mathcal {S}_i^{(1)} = \cup _{l=1}^{L_1}\mathcal {S}_{i}^{(l,1)}\), and assuming a linear map for f, we retrieve the SEM without considering exogenous variables [cf. (2)].

The subsets in \(\mathcal {S}_i^{(P)}\) capture the gregarious behavior of the nodes. For example, the subsets \(\{\mathcal {S}_{i}^{(l, 2)}\}_{l=1}^{L_2}\) can be viewed as the pairs of nodes that form a triad with the ith node. Similarly, the subsets \(\{\mathcal {S}_{i}^{(l, p)}\}_{l=1}^{L_p}\) can be defined as the nodes that complete a \((p+1)\)-clique when the ith node is added. In fact, this subset assignment can be done for any other graph motif [cf. [5]] that seems adequate for the data under analysis. We can now approximate the nonlinear relationship in (4) by a Volterra expansion as follows

$$\begin{aligned} x_i(t) = h_i^{(0)} + \sum \limits _{p=1}^{P}H^{(p)}_i[{\varvec{x}}(t)] + \epsilon _i(t) \end{aligned}$$

where \(h_i^{(0)}\) is a constant term, and \(H^{(p)}_i[{\varvec{x}}(t)]\) denotes the pth order Volterra module given by

$$\begin{aligned} H_{i}^{(p)}[{\varvec{x}}(t)]:=\sum _{l=1}^{L_{p}} h_{i}^{(l,p)} g(\{x^{(q)}(t): q \in \mathcal {S}_{i}^{(l,p)}\}) \end{aligned}$$

with \(h_{i}^{(l,p)}\) the lth expansion coefficient of order p for the ith variable, g a permutation-invariant nonlinear function describing the type of interaction among the variables, and q means the nodes that complete a \((q+1)\)-clique when the ith node is added. As the set \(\mathcal {S}_{i}^{(P)}\) is generally unknown, meaning the interactions at all orders are unknown, the module (6) can be equivalently rewritten using the set of all index combinations of size p, that is

$$\begin{aligned} H_{i}^{(p)}[\varvec{x}(t)]=\sum _{k_{1}=1}^{N} \cdots \sum _{k_{p}=k_{p-1}}^{N} h_{i}^{(p)}(k_{1}, \ldots , k_{p}) g(\{x^{(k_{q})}(t)\}_{q=1}^{p}) \end{aligned}$$

where the Volterra kernel \(h_i^{(p)}(k_1,\ldots ,k_p)\) denotes a \((p+1)\)-clique for the index combination \(\left\{ k_{1}, \ldots , k_{p}\right\}\). The nonzero coefficients of (7) can be uniquely mapped to the coefficients of (6). A fundamental result of Volterra expansion is that any continuous nonlinear system can be uniformly approximated (i.e., in the \(L^\infty\)-norm) to arbitrary accuracy by a Volterra series operator of sufficient but finite order if the input signals form a compact subset of the input function space [29, 30].

Notice that in the absence of exogenous variables, the Volterra expansion (5) for the ith signal can be directly related to the SEM expression (2) by considering \(h_i^{(0)} = 0\), \(h_i^{(1)}(j) = a_{i,j}\) (see Footnote 1) and \(h^{(p)}_i(k_1,\ldots ,k_p) = 0, \;\forall \; p > 1\). Thus, a SEM can be seen as a special case of the Volterra expansion where the Volterra kernels are constrained and the inputs are assumed to be the signals on the graph.

Now, we are ready to postulate our AGVM that considers higher-order interactions as follows

$$\begin{aligned} x_i(t) = h_i^{(0)} + \sum _{p=1}^P \sum _{k_{1}=1}^{N} \cdots \sum _{k_{p}=k_{p-1}}^{N} h_{i}^{(p)}(k_{1}, \ldots , k_{p}) g(\{x^{(k_{q})}(t)\}_{q=1}^{p})+ \epsilon _i(t). \end{aligned}$$

The proposed expansion (8) captures both the autoregressive nature of SEMs and the identifiability and expressibility of Volterra series models. This aspect distinguishes the model from existing nonlinear extensions of SEMs that only consider nonlinear functions of pair-wise interactions. Therefore, higher-order structures in the graph are not seen as fundamental atoms to establish the behavior of the node signal. On the other hand, the expansion (8) allows identifying the existence of higher-order interactions such as triads or p-cliques by observing its nonzero coefficients.

Remark 1

For simplicity, the present work only focuses on interactions up to the second order and uses a product for g. A generalization to higher-order interactions and other permutation-invariant functions is straightforward. For other nonlinear functions, a more careful analysis should be carried out.
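As an illustration of the second-order case with a product for g, the sketch below evaluates the noise-free right-hand side of (8) for one node in two ways: with explicit nested sums over \(k_1 \le k_2\), and in vectorized form. The function names and the random coefficients are hypothetical and serve only to show that both forms agree.

```python
import numpy as np

def agvm_node_loop(x, h0, h1, h2):
    """Second-order expansion for one node, written as explicit sums.

    h2 is an (N, N) array; only its entries with k1 <= k2 are used.
    """
    N = x.size
    value = h0
    for k in range(N):
        value += h1[k] * x[k]
    for k1 in range(N):
        for k2 in range(k1, N):
            value += h2[k1, k2] * x[k1] * x[k2]
    return value

def agvm_node_vectorized(x, h0, h1, h2):
    """Same quantity using the index pairs (k1, k2) with k1 <= k2."""
    rows, cols = np.triu_indices(x.size)
    return h0 + h1 @ x + h2[rows, cols] @ (x[rows] * x[cols])

rng = np.random.default_rng(1)
N = 4
x = rng.standard_normal(N)          # signals on all nodes at one time instant
h0 = 0.3                            # constant term
h1 = rng.standard_normal(N)         # first-order kernel of the node
h2 = rng.standard_normal((N, N))    # second-order kernel of the node

y_loop = agvm_node_loop(x, h0, h1, h2)
y_vec = agvm_node_vectorized(x, h0, h1, h2)
```

The vectorized form is the one that leads naturally to the matrix-vector notation used in the sequel.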

By stacking the signal on node i over time steps \(t=(1, \ldots , T)\) in \({\varvec{x}}_i\), i.e., \({\varvec{x}}_i := [x_i(1),x_i(2),\ldots ,x_i(T)]^\top\) (similarly stacking the modeling errors through time in \({\varvec{\epsilon }}_i\)), stacking the signals on all nodes at time step t in \({\varvec{x}}(t)\), i.e., \({\varvec{x}}(t):=~ [x_1(t),x_2(t),\ldots ,x_N(t)]^\top \in \mathbb {R}^{N}\), and restricting ourselves to a second-order model and a product for g, we can rewrite (8) in a matrix-vector form as

$$\begin{aligned} {{\varvec{x}}_i^\top = h_i^{(0)}{\varvec{1}}^\top + ({\varvec{h}}_i^{(1)})^\top {\varvec{X}}^{(1)}+ ({\varvec{h}}_i^{(2)})^\top {\varvec{X}}^{(2)}+ {\varvec{\epsilon }}_i^\top \in \mathbb {R}^{T}.} \end{aligned}$$

Here, we have made the following definitions: \({\varvec{X}}^{(1)}:= [{\varvec{x}}(1),\ldots , {\varvec{x}}(T)]\in \mathbb {R}^{N\times T}\), \({\varvec{X}}^{(2)}:=[{\varvec{x}}(1)\boxtimes {\varvec{x}}(1),\ldots ,{\varvec{x}}(T)\boxtimes {\varvec{x}}(T)] \in \mathbb {R}^{M \times T}\), \([{\varvec{h}}_i^{(1)}]_j = h_i^{(1)}(j)\), and

$$\begin{aligned} {\varvec{h}}_i^{(2)} = \textrm{vech}\bigg (\begin{bmatrix} h_i^{(2)}(1,1) &{} h_i^{(2)}(1,2) &{} \cdots &{} h_i^{(2)}(1,N) \\ h_i^{(2)}(2,1) &{} h_i^{(2)}(2,2) &{} \cdots &{} h_i^{(2)}(2,N) \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ h_i^{(2)}(N,1) &{} h_i^{(2)}(N,2) &{}\cdots &{} h_i^{(2)}(N,N) \end{bmatrix}\bigg ) \in \mathbb {R}^{M} \end{aligned}$$

where \(\textrm{vech}(\cdot )\) is the half-vectorization operation that retrieves the upper-triangular part of its matrix argument, and \(M:=N(N+1)/2\).

The second-order expansion of \({\varvec{x}}_i\) can be further expressed as

$$\begin{aligned} {\varvec{x}}_i^\top&= \begin{bmatrix} h_i^{(0)}, ({\varvec{h}}_i^{(1)})^\top , ({\varvec{h}}_i^{(2)})^\top \end{bmatrix} \begin{bmatrix} {\varvec{1}}^\top \\ {\varvec{X}}^{(1)}\\ {\varvec{X}}^{(2)}\end{bmatrix} + {\varvec{\epsilon }}_i^\top \nonumber \\&= {\varvec{\theta }}_i^\top {\varvec{M}} + {\varvec{\epsilon }}_i^\top \end{aligned}$$

where the unknown parameter \({\varvec{\theta }}_i\in \mathbb {R}^{1+N+M}\) contains the equivalent graph Volterra kernels for the ith variable and \({\varvec{M}} \in \mathbb {R}^{(1+N+M)\times T}\) is the (known) system matrix.
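A possible way to assemble the system matrix of (10) from data is sketched below. The helper name `build_system_matrix` and the random data are assumptions made for illustration; the row ordering follows the definitions of \({\varvec{X}}^{(1)}\) and \({\varvec{X}}^{(2)}\) given above.

```python
import numpy as np

def build_system_matrix(X1):
    """Stack [1^T; X^(1); X^(2)] for a data matrix X1 of shape (N, T)."""
    N, T = X1.shape
    rows, cols = np.triu_indices(N)          # pairs (i, j) with i <= j
    X2 = X1[rows, :] * X1[cols, :]           # column t is the reduced product of x(t) with itself
    return np.vstack([np.ones((1, T)), X1, X2])

rng = np.random.default_rng(2)
N, T = 4, 7
X1 = rng.standard_normal((N, T))
M_sys = build_system_matrix(X1)

M_dim = N * (N + 1) // 2                     # number of second-order terms
theta_i = rng.standard_normal(1 + N + M_dim) # hypothetical kernel vector for one node
x_i = theta_i @ M_sys                        # noise-free node signal over time, length T
```

Each node contributes \(1+N+M\) unknowns, which is what drives the dimensionality issue discussed at the end of this section.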

For all involved node signals \(i \in [N]\), we finally obtain

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{\Theta }}^\top {\varvec{M}} + {\varvec{E}}\in \mathbb {R}^{N \times T} \end{aligned}$$

where \({\varvec{\Theta }}\in \mathbb {R}^{(1+N+M) \times N}\) collects all the unknown parameters of the system, and \({\varvec{E}}\) is the corresponding modeling error matrix. For interpretability of (11), one can also rewrite it as

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(0)}{\varvec{1}}^\top + {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}+ {\varvec{E}} \end{aligned}$$

where \({\varvec{H}}^{(i)}\) is the ith-order graph Volterra kernel matrix whose entries are in lexicographic order, i.e., as defined through \({\varvec{X}}^{(i)}\). In particular, \({\varvec{H}}^{(0)} = {\varvec{h}}^{(0)}\in \mathbb {R}^{N}\) is a column vector with all constant terms stacked. Here, we present the general form of the AGVM model also accounting for the exogenous variables \({\varvec{Y}}\).

One of the challenges in finding the unknown parameters \({\varvec{\Theta }}\) is its large dimensionality. Although symmetrized Volterra kernels can be uniquely identified [31], the number of unknown parameters in (11) is of order \({{\mathcal {O}}}(N^3)\), which leads to high computational costs and poor sampling efficiency. Fortunately, as shown later, judicious modeling of the graph Volterra kernels leads to efficient higher-order interaction identification. But before presenting methods for estimating the graph Volterra kernels, we need to provide identifiability guarantees for the proposed model.

4 Identifiability of AGVM

This section focuses on the conditions that the input/output data should exhibit in order to uniquely identify the second-order AGVM model (12). Although asymptotic results have been obtained for sparse regression, i.e., the Lasso estimator, here we are more interested in the finite-sample regime. Therefore, borrowing tools from the compressed sensing literature and linear algebra, we are able to provide recovery guarantees in both deterministic and probabilistic settings.

Without loss of generality, we present our following results considering that the zeroth-order term \({\varvec{H}}^{(0)}\) is projected out and the noise term \({\varvec{E}}\) is not present or has been removed.

Our first result asserts identifiability of the network connections, represented through the graph Volterra kernels, highlighting the role of the exogenous data \({\varvec{Y}}\). Notice that this result is a generalization of the result in [32] obtained for resolving direction ambiguities in structural equation models applied for directed graphs.

Theorem 1

Suppose that data \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM model

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}\end{aligned}$$

for a matrix \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). If \({\mathcal {X}}:= [({\varvec{X}}^{(1)})^\top \;({\varvec{X}}^{(2)})^\top ]^\top\) is full row rank, then \({\varvec{H}}^{(1)}\), \({\varvec{H}}^{(2)}\) and \({\varvec{\Gamma }}\) are uniquely expressible in terms of \({\mathcal {X}}\) and \({\varvec{Y}}\) as

$$\begin{aligned} \textrm{diag}({\varvec{\Gamma }})&= \textrm{diag}({\varvec{Q}}_1)^{-1} \\ {\varvec{H}}^{(1)}&= {\varvec{I}}- {\varvec{\Gamma }}{\varvec{Q}}_1 \\ {\varvec{H}}^{(2)}&= -{\varvec{\Gamma }}{\varvec{Q}}_2 \end{aligned}$$

where \({\varvec{Y}}{\mathcal {X}}^{\dagger } = [{\varvec{Q}}_1\in \mathbb {R}^{N\times N}\,{\varvec{Q}}_2\in \mathbb {R}^{N \times M}]\).


Proof. See Appendix. \(\square\)
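The closed-form expressions of Theorem 1 can be checked numerically on synthetic data. The sketch below is an illustration under stated assumptions, not an experiment from this work: the kernels and \({\varvec{\Gamma }}\) are random, \({\varvec{X}}^{(1)}\) is Gaussian, and \({\varvec{Y}}\) is then chosen so that the noise-free model holds exactly. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 3, 30
rows, cols = np.triu_indices(N)
M = rows.size                                   # M = N(N+1)/2

# Data matrices; T >= N + M so that calX can have full row rank.
X1 = rng.standard_normal((N, T))
X2 = X1[rows, :] * X1[cols, :]
calX = np.vstack([X1, X2])
assert np.linalg.matrix_rank(calX) == N + M     # full-row-rank condition

# Ground-truth model: hollow H1, arbitrary H2, diagonal Gamma with nonzero diagonal.
H1 = rng.standard_normal((N, N))
np.fill_diagonal(H1, 0.0)
H2 = rng.standard_normal((N, M))
Gamma = np.diag(rng.uniform(0.5, 2.0, size=N))

# Exogenous data consistent with X1 = H1 X1 + H2 X2 + Gamma Y.
Y = np.linalg.solve(Gamma, (np.eye(N) - H1) @ X1 - H2 @ X2)

# Recovery via Y calX^dagger = [Q1 Q2].
Q = Y @ np.linalg.pinv(calX)
Q1, Q2 = Q[:, :N], Q[:, N:]
Gamma_hat = np.diag(1.0 / np.diag(Q1))
H1_hat = np.eye(N) - Gamma_hat @ Q1
H2_hat = -Gamma_hat @ Q2
```

In this noise-free setting the recovered matrices coincide with the ground truth up to numerical precision, because \({\mathcal {X}}{\mathcal {X}}^{\dagger } = {\varvec{I}}\) when \({\mathcal {X}}\) has full row rank.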

This result exhibits the importance of the exogenous data (perturbation), \({\varvec{Y}}\), to uniquely identify the AGVM model. This shows, as in the classical SEM, that given a sufficiently rich perturbation, the directionality, as well as the higher-order interactions (triplets), can be uniquely determined from the measured data.

Although the result of Theorem 1 establishes the identifiability of the AGVM model, it requires a full row rank data matrix \(\mathcal {X}\), which in many cases might not be attainable because the number of samples must be at least \({{\mathcal {O}}}(N^2)\). Thus, in order to improve the sampling complexity for the problem, prior information is required to constrain the model. A natural assumption, arising in many networked-data applications, is the sparse interaction among the nodes. That is, the number of connections (edges) among nodes is much smaller than the size of the network, and therefore, the number of triads in which they participate is restricted. Before proceeding, we need the following sparsity assumptions.

  1. A.1 Each row of matrix \({\varvec{H}}^{(1)}\) has at most \(K_1\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(1)} \Vert _0 \le K_1\; \forall \; i\).

  2. A.2 Each row of matrix \({\varvec{H}}^{(2)}\) has at most \(K_2\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(2)} \Vert _0 \le K_2\; \forall \; i\).

Assumptions A.1 and A.2 on the graph Volterra kernels can be seen as restrictions on the number of edges and triangles existing in the graph. By letting \(K_1 \in {{\mathcal {O}}}(1)\) and \(K_2 \in {{\mathcal {O}}}(N)\), these assumptions translate into graph Volterra kernels that represent a sparse graph, i.e., \({{\mathcal {O}}}(N)\) edges and \({{\mathcal {O}}}(N^2)\) triangles. In addition, as shown in the following, the sparsity assumptions make the identification of the system possible when only a reduced number of measurements are available.

Before stating our second result, a definition is required.

Definition 1

The Kruskal rank of a matrix \({\varvec{A}}\in \mathbb {R}^{N\times M}\), denoted \(\textrm{kr}({\varvec{A}})\), is the maximum number k such that any combination of k columns of \({\varvec{A}}\) forms a sub-matrix with full column rank.

Although the Kruskal rank is, in general, more restrictive than the traditional rank and harder to verify, when the entries of \({\varvec{A}}\) are drawn from a continuous distribution, its Kruskal rank equals its rank [33].
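For small matrices, the Kruskal rank of Definition 1 can be computed by brute force, as sketched below. The function `kruskal_rank` is a hypothetical helper written for illustration; its cost grows combinatorially with the number of columns, so it is only meant to clarify the definition.

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every set of k columns of A is linearly independent."""
    A = np.asarray(A, dtype=float)
    n_cols = A.shape[1]
    k_rank = 0
    for k in range(1, min(A.shape) + 1):
        for idx in combinations(range(n_cols), k):
            # A single rank-deficient subset of k columns stops the search.
            if np.linalg.matrix_rank(A[:, list(idx)], tol=tol) < k:
                return k_rank
        k_rank = k
    return k_rank

# Every pair of columns is independent: Kruskal rank equals the rank (2).
A_good = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])

# Columns 0 and 1 are parallel: rank is 2 but the Kruskal rank is only 1.
A_bad = np.array([[1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

kr_good = kruskal_rank(A_good)
kr_bad = kruskal_rank(A_bad)
```

The second example shows why the Kruskal rank is the more restrictive notion: two parallel columns lower it to 1 even though the ordinary rank is unaffected.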

To begin with, we consider a model without exogenous inputs, that is, a purely self-driven system.

Theorem 2

Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) abide by the second-order AGVM

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}\end{aligned}$$

for sparse matrices \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and \({\varvec{H}}^{(2)}\) satisfying A.1 and A.2, respectively. If \(\textrm{kr}(\mathcal {X}^\top ) \ge 2(K_1 + K_2)\), where \(\mathcal {X} := [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\), then \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) can be uniquely identified.


Proof. See Appendix. \(\square\)

This result shows that it is possible to uniquely identify both graph Volterra kernel matrices when the Kruskal condition is met even in the case that the model is self-driven, i.e., no exogenous inputs.

In the following, we present a result involving the exogenous inputs.

Theorem 3

Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}, \end{aligned}$$

for sparse matrices \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and \({\varvec{H}}^{(2)}\) satisfying A.2; and a diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). Given a matrix \({\varvec{\Pi }}_1\) such that \({\varvec{X}}^{(1)}{\varvec{\Pi }}_1 = {\varvec{0}}\) and \({\varvec{Y}}{\varvec{\Pi }}_1 \ne {\varvec{0}}\), if \(\textrm{kr}({{\varvec{C}}}[{\varvec{X}}^{(1)}*{\varvec{X}}^{(1)}]{\varvec{\Pi }}_1) \ge 2K_2 + 1\), where \({\varvec{C}}\) is a binary selection matrix picking the appropriate rows of the Kronecker product, then the positions of the nonzero entries of \({\varvec{H}}^{(2)}\) are unique.


Proof. See Appendix. \(\square\)

Here, differently from the pure self-driven case, the presence of the exogenous term leads to a different identifiability condition: structural identifiability. That is, given that the condition on the Kruskal rank of the projected data matrix is met, the positions of the nonzero entries of the second-order graph Volterra kernels \({\varvec{H}}^{(2)}\) can be uniquely identified. When the graph Volterra kernels are related to the \((p+1)\)-cliques [cf. (7)] in a network, the above result allows for higher-order link prediction. In addition, the support of \({\varvec{H}}^{(1)}\) can then also be partially estimated from the nonzero entries of \({\varvec{H}}^{(2)}\), as by assumption, in this case, their supports share a relation. More specifically, the existence of a triangle between nodes directly implies edges among the elements in the clique. However, as the nonexistence of a clique, e.g., a triangle, does not rule out the existence of an edge, not all edges, i.e., positions of the nonzeros of \({\varvec{H}}^{(1)}\), can be identified.

To present an identifiability result using sparse recovery, we employ the following definition.

Definition 2

(Restricted Isometry Property (RIP)) Matrix \({\varvec{A}} \in \mathbb {R}^{N\times M}\) possesses the restricted isometry property of order s with constant \(\delta _s\in (0,1)\) if, for all \({\varvec{h}}\in \mathbb {R}^{M}\) with \(\Vert {\varvec{h}} \Vert _0\le s\) [34],

$$\begin{aligned} (1-\delta _s)\Vert {\varvec{h}}\Vert _2^2 \le \Vert {\varvec{A}} {\varvec{h}}\Vert _2^2 \le (1+\delta _s)\Vert {\varvec{h}}\Vert _2^2. \end{aligned}$$

RIP is a fundamental property for providing identifiability conditions of sparse recovery. It has been shown that given \(\delta _{2s} < \sqrt{2}-1\), the constrained version of the Lasso optimization problem

$$\begin{aligned} \begin{array}{ll} \underset{{\varvec{h}} \in \mathbb {R}^{M}}{\min }{\Vert {\varvec{h}}\Vert _1 } \quad \textit{subject to} \quad \Vert {\varvec{y}} - {\varvec{A}} {\varvec{h}}\Vert _2 \le \epsilon \end{array} \end{aligned}$$

yields \(\Vert {{\varvec{h}}} - {\varvec{h}}^* \Vert _2^2 \le c \epsilon ^2\) for some constant c depending on \(\delta _{2s}\) when the linear model \({\varvec{y}} = {\varvec{A}}{\varvec{h}}^* + {\varvec{v}}\), \(\Vert {\varvec{v}} \Vert _2 \le \epsilon\) holds, where \({\varvec{h}}^*\) is the solution of (14) [34, 35].

In the literature, in particular works on sparse polynomial regression [35] and Volterra series [36], several guarantees have been established for system matrices arising from different alphabets and/or different distributions. For instance, in [24] results for Volterra system identification have been derived for signals drawn from \(\{-1,0,1\}\) and in [35] for signals drawn from \(\mathcal {U}[-1,1]\). However, the bipolar case, e.g., \(\{-1,1\}\), has not been considered and its treatment within the self-driven Volterra expansion is still missing. Therefore, in the following, we present a RIP result for the second-order AGVM, whose technical proof is detailed in the Appendix.

Theorem 4

Let \(\{x_i(t)\}_{i=1}^{N}\) for \(t\in [1,2,\ldots ,T]\) be an input sequence of independent random variables drawn from the alphabet \(\{-1,1\}\) with equal probability. Assume that the AGVM regression matrix is defined as

$$\begin{aligned} \tilde{{\varvec{X}}}^\top = \frac{1}{\sqrt{T}}[{\varvec{X}}^l\,{\varvec{X}}^b]^\top \in \mathbb {R}^{T\times L}, \end{aligned}$$

where \(L = N+N(N-1)/2\), \({\varvec{X}}^l := {\varvec{X}}^{(1)}\) and \({\varvec{X}}^b\) is \({\varvec{X}}^{(2)}\) with the quadratic terms removed, i.e., it only contains bilinear terms \(x_i(t)x_j(t),\,i\ne j\). Then, for any \(\delta _s\in (0,1)\) and for any \(\gamma \in (0,1)\), whenever \(T \ge \frac{4C}{(1-\gamma )\delta _s^2}s^2\log N\), the matrix \(\tilde{{\varvec{X}}}^\top\) possesses RIP \(\delta _s\) with probability exceeding \(1 - \exp \bigg (-\frac{\gamma \delta _s^2}{C}\cdot \frac{T}{s^2}\bigg )\), where \(C = 2\).

Notice that the pure quadratic terms in \({\varvec{X}}^{(2)}\) have been removed. This is due to the fact that for the bipolar signal case, these quadratic terms are constant when the alphabet is \(\{-1,1\}\) and equivalent to \({\varvec{X}}^{(1)}\) when the alphabet is \(\{0,1\}\). Hence, in both cases their contribution can be omitted without loss of generality. Furthermore, the data matrices are normalized with respect to the number of available measurements, i.e., T. This is done in order to guarantee that the diagonal entries of the Gramian of \(\tilde{{\varvec{X}}}^\top\) are unity in expectation.
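The normalization and the removal of the pure quadratic terms can be illustrated with a small simulation. The sketch below is an empirical illustration rather than a verification of the RIP itself: it builds the regression matrix of Theorem 4 from equiprobable \(\{-1,1\}\) samples and inspects its Gramian. The sizes and variable names are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 6, 4000

# Independent equiprobable bipolar inputs, one row per node.
X1 = rng.choice([-1.0, 1.0], size=(N, T))

# Bilinear terms x_i(t) x_j(t) with i < j only (pure quadratic terms removed).
rows, cols = np.triu_indices(N, k=1)
Xb = X1[rows, :] * X1[cols, :]

# Normalized regression matrix of shape (T, L), with L = N + N(N-1)/2.
X_tilde_T = np.vstack([X1, Xb]).T / np.sqrt(T)
L = N + N * (N - 1) // 2

# Gramian: unit diagonal, small off-diagonal entries for large T.
G = X_tilde_T.T @ X_tilde_T
off_diag = G - np.diag(np.diag(G))
coherence = np.max(np.abs(off_diag))

# The discarded quadratic terms carry no information for this alphabet.
squares_are_constant = bool(np.all(X1 ** 2 == 1.0))
```

For this alphabet every column of the normalized matrix has unit norm, and the largest off-diagonal Gramian entry shrinks as T grows, which is the behavior the probabilistic guarantee builds on.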

This theorem asserts that \(T \in {{\mathcal {O}}}(s^2\log N)\) observations suffice to recover an s-sparse vector with graph Volterra kernels. Since it is considered that the number of unknowns per row of \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) is at most \({{\mathcal {O}}}(N)\) [cf. A.1 and A.2], the bound on the sampling complexity scales as \({{\mathcal {O}}}(N^2\log N)\), which agrees with bounds obtained for linear filtering setups [24, 37]; however, in this paper, the constant C is relatively small.
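To get a feeling for the numbers in Theorem 4, the bound on T and the associated probability can be evaluated directly. The sketch below assumes that \(\log\) denotes the natural logarithm, which is not stated explicitly in the theorem; the function names and the example values of s, N, \(\delta _s\), and \(\gamma\) are hypothetical.

```python
import math

def agvm_rip_sample_bound(s, N, delta_s, gamma, C=2.0):
    """Smallest integer T with T >= 4C / ((1 - gamma) delta_s^2) * s^2 * log N."""
    if not (0.0 < delta_s < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("delta_s and gamma must lie in (0, 1)")
    # Assumption: natural logarithm.
    return math.ceil(4.0 * C / ((1.0 - gamma) * delta_s ** 2) * s ** 2 * math.log(N))

def agvm_rip_success_probability(T, s, delta_s, gamma, C=2.0):
    """Lower bound 1 - exp(-(gamma delta_s^2 / C) * T / s^2) on the RIP probability."""
    return 1.0 - math.exp(-(gamma * delta_s ** 2 / C) * T / s ** 2)

# Example values chosen only for illustration.
s, N, delta_s, gamma = 5, 100, 0.5, 0.5
T_min = agvm_rip_sample_bound(s, N, delta_s, gamma)
p_min = agvm_rip_success_probability(T_min, s, delta_s, gamma)
```

Substituting the bound on T into the probability expression shows that, at the minimum admissible T, the failure probability is at most \(N^{-4\gamma /(1-\gamma )}\), so a larger \(\gamma\) trades more samples for a higher success probability.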

Given that under the established conditions, the proposed AGVM model is identifiable and is able to leverage sparsity to relieve its sampling complexity, in the following section, we present task-specific constraints for higher-order link inference and methods for estimating the graph Volterra kernels.

5 Real data applications

With the identifiability guarantees at hand, this section investigates how various learning tasks can benefit from the proposed AGVM. Specifically, two tailored AGVM algorithms for different applications namely topology identification in power systems and link prediction in social networks are presented.

5.1 Topology identification in distribution grids

The vertex set \(\mathcal {V}\) of a graph in a distribution grid comprises the indices of the nodal buses, while the edge set \(\mathcal {E}\) collects all the power distribution lines. The distribution grid is assumed to be radial, and thus the vertex set \({\mathcal {V}}:= \{0,\mathcal {N}\}\) has a root (substation) bus indexed by \(n = 0\). Every non-root bus \(n \in \mathcal {N}= \{ 1, \ldots , N\}\) has a unique parent bus \(\pi _n\). Naturally, the number of non-root buses is equal to the number of power lines in a radial network, that is, \(|\mathcal {N}| =|\mathcal {E}|= N\) (see footnote 2).

In order to reveal both the edge connections as well as their higher-order interactions in power grids, we need to analyze the dependency of the signals on buses. The signal \(x_n(t)\) here is the squared voltage magnitude of bus \(n \in {\mathcal {N}}\) at time t. Based on our previous work [38], the voltage relationship among bus n and its children buses is given by

$$\begin{aligned} x_n(t) =&\sum _{i } h_n^{(1)}(i) x_{i} (t)+ \sum _{i}\;\sum _{j} h_n^{(2)}(i,j) x_i(t) x_j(t)+ \epsilon _n(t), \, n \in {\mathcal {N}} \end{aligned}$$

where \(i \in \{k: k \in \mathcal {N}, \pi _k = n\}\) and \(j\in \{k : k \in \mathcal {N}, \pi _k = n, k \ge i\}\); \(h_n^{(1)}(i)\) and \(h_n^{(2)}(i,j)\) are the first- and second-order expansion coefficients relating bus n with the sets \(\{i\}\) and \(\{i,j\}\), respectively; and \(\epsilon _n\) comprises the modeling error and measurement noise on bus n. We collect the data into \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) and stack the first- and second-order coefficients \(h_n^{(1)}(i)\) and \(h_n^{(2)}(i,j)\) into the vectors \({\varvec{h}}_n^{(1)}\) and \({\varvec{h}}_n^{(2)}\) in lexicographic order. Similar to the derivation from (9) to (12), the voltage model (15) for all buses can then be written in the compact form

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{E}}\end{aligned}$$

where the t-th columns of \({\varvec{X}}^{(1)}\) and \({\varvec{X}}^{(2)}\) are \({\varvec{x}}(t)\) and \({\varvec{x}}(t) \boxtimes {\varvec{x}}(t)\) respectively; the n-th rows of \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) are \(({\varvec{h}}_n^{(1)})^\top\) and \(({\varvec{h}}_n^{(2)})^\top\), respectively.
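The construction of the two data matrices can be sketched as follows. This is only an illustration: it assumes that \({\varvec{x}}(t) \boxtimes {\varvec{x}}(t)\) collects the products \(x_i(t)x_j(t)\) with \(i \le j\) in lexicographic order, which matches the index range \(k \ge i\) used above, and the helper names are hypothetical.

```python
from itertools import combinations_with_replacement

def reduced_kron(x):
    """Products x[i] * x[j] for i <= j, in lexicographic order of (i, j)."""
    return [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]

def build_data_matrices(samples):
    """Return X1 (N x T) and X2 (N(N+1)/2 x T) as lists of rows.

    samples is a list of T measurement vectors x(t), each of length N.
    """
    cols1 = [list(x) for x in samples]            # t-th column of X1 is x(t)
    cols2 = [reduced_kron(x) for x in samples]    # t-th column of X2 is the reduced product
    transpose = lambda cols: [list(row) for row in zip(*cols)]
    return transpose(cols1), transpose(cols2)

# Two hypothetical snapshots of a 3-bus signal.
X1, X2 = build_data_matrices([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
```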

To estimate the coefficients of model (16), the following assumptions are needed.

  1. A. 3

    There is no self-interaction on a bus; therefore, the matrix \({\varvec{H}}^{(1)}\) is a hollow matrix, i.e., \(h_n^{(1)}(n) = 0, \,\;\forall n \in \mathcal {N}\).

  2. A. 4

    There are no second-order interactions between two buses; that is, the coefficients for the second-order interactions satisfy \(h_n^{(2)}(j,k) = 0\) if \(n = j\), \(j = k\), or \(n = k\) holds.

  3. A. 5

    The second-order interactions exist only between two buses connected through a central bus. Thus, the graph Volterra coefficients obey \(h_n^{(2)}(j,k) = 0\) if \(h_n^{(1)}(l) = 0\) for some \(l\in \{j,k\}\).

Assumptions A. 3 and A. 4 both entail linear constraints and can thus be easily incorporated into an optimization problem. To cope with the conditional constraint in A. 5, we introduce the auxiliary matrix

$$\begin{aligned} {\varvec{H}}_n := \begin{bmatrix} h_n^{(1)}(1) &{} h_n^{(2)}(1,1) &{} \cdots &{} h_n^{(2)}(1,N) \\ h_n^{(1)}(2) &{} h_n^{(2)}(2,1) &{} \cdots &{} h_n^{(2)} (2,N)\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ h_n^{(1)}(N) &{} h_n^{(2)}(N,1) &{} \cdots &{} h_n^{(2)}(N,N) \\ \end{bmatrix} \end{aligned}$$

where the first column equals \({\varvec{h}}_n^{(1)}\). To guarantee that \(h_n^{(2)}(i,j)= 0\) whenever \(h_n^{(1)}(i) = 0\), we enforce row sparsity in \({\varvec{H}}_n\) by adding an \(\ell _{2,1}\)-regularization on \({\varvec{H}}_n^\top\), \(\forall n \in {\mathcal {N}}\). Based on the sparsity result from Sect. 4, we can estimate the expansion coefficients by the following sparsity-aware \(\ell _{2,1}\)-regularized least-squares

$$\begin{aligned} \underset{\{{\varvec{\theta }}_n\}_{n=1}^{N}}{\min }&~~ \sum \limits _{n\in {\mathcal {N}}}\Big (\Vert {\varvec{x}}_n - {\varvec{M}}^\top {\varvec{\theta }}_n \Vert _2^2 + \lambda \Vert {\varvec{\theta }}_n \Vert _1 + \mu \Vert {\varvec{H}}_n^\top \Vert _{2,1}\Big ) \end{aligned}$$
$$\begin{aligned} \text {s. to} ~~&~~~ [{\varvec{\theta }}_1, {\varvec{\theta }}_2, \ldots , {\varvec{\theta }}_N]^\top \in \mathcal {X}_{h} . \end{aligned}$$

where

$$\begin{aligned} {\varvec{M}} = \begin{bmatrix} {\varvec{X}}^{(1)}\\ {\varvec{X}}^{(2)}\end{bmatrix}, \,\,\text {and } {\varvec{\theta }}_n = \begin{bmatrix} {\varvec{h}}_n^{(1)} \\ {\varvec{h}}_n^{(2)}\end{bmatrix} \end{aligned}$$

The set \(\mathcal {X}_{h}\) is convex and signifies the constraints characterized by Assumptions A. 3 and A. 4. The convex optimization problem (18) can be efficiently solved (and distributed) by off-the-shelf convex programming toolboxes.
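To illustrate why the \(\ell _{2,1}\) term couples \(h_n^{(1)}(i)\) with the coefficients \(h_n^{(2)}(i,\cdot )\), the sketch below evaluates the penalty on the rows of a small auxiliary matrix and applies the corresponding row-wise shrinkage (group soft-thresholding). It is not the solver used in the experiments, which relies on off-the-shelf toolboxes; the function names and the toy matrix are assumptions, and the \(\ell _{2,1}\) norm of \({\varvec{H}}_n^\top\) is taken here as the sum of the Euclidean norms of the rows of \({\varvec{H}}_n\).

```python
import math

def l21_norm_rows(H):
    """Sum of the Euclidean norms of the rows of H (H is a list of rows)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in H)

def prox_l21_rows(H, tau):
    """Row-wise group soft-thresholding.

    Each row norm is reduced by tau; rows whose norm is at most tau become
    exactly zero, so h1(i) and all h2(i, .) vanish together.
    """
    out = []
    for row in H:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

# Toy 3 x 2 auxiliary matrix: first column plays the role of h1, second of h2.
H_n = [[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]]
penalty = l21_norm_rows(H_n)
H_shrunk = prox_l21_rows(H_n, tau=1.0)
```

In this toy case the weak second row is removed entirely while the strong first row is only scaled, which is the row-sparsity behavior that Assumption A. 5 calls for.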

Fig. 1 ROC curves for topology inference of the SCE 47-bus distribution grid

To identify the underlying topology, we employed the Southern California Edison (SCE) 47-bus distribution grid [3, 39] with real consumption and solar generation data from the Smart\({^*}\) project [40]. Feeding these data to the AC power flow equations [41, 42], we obtain the voltage squared magnitude measurements \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) across \(T = 240\) time slots. The voltage magnitude of the substation bus satisfies \(v_0(t) = 1,\forall t\in \{1,\ldots ,T\}\). After ignoring the root bus and the buses connected to their parent buses through zero-impedance lines, the radial grid comprises 41 buses whose interactions need to be inferred. With \(\{{\varvec{x}}(t)\}_{t=1}^{T}\), the graph Volterra kernels were estimated by solving (18) and constructing \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) as in (16). After removing non-significant entries by a point-wise thresholding operation, the grid topology was inferred from the support of \({\varvec{H}}^{(1)}\) and the higher-order interactions were retrieved from \({\varvec{H}}^{(2)}\).

To assess the performance of the proposed AGVM approach (18), we compared it against three edge connectivity recovery methods, namely: i) the multi-kernel based partial correlations (MKPC) scheme [22]; ii) the linear PC scheme [43]; and iii) the concentration matrix-based scheme [44]. The results were measured by the empirical receiver operating characteristic (ROC) curves and the area under the curve (AUC) values. Figure 1 depicts the ROC curves of all methods, while the AUC values for AGVM, MKPC, linear PC, and concentration matrix are 0.9483, 0.9008, 0.8836, and 0.8052, respectively. These results show that the proposed scheme, which exploits the nonlinear interactions, attains the highest AUC among the compared methods.
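For readers who wish to reproduce this kind of evaluation, an empirical AUC can be computed from the magnitudes of the estimated first-order kernels and a ground-truth adjacency pattern. The sketch below uses the rank-based (Mann-Whitney) form of the AUC; the helper names and the toy matrices are hypothetical and are not taken from the SCE experiment.

```python
def offdiag_scores(H1, A):
    """Pair |H1[n][i]| with the 0/1 ground truth A[n][i] over all n != i."""
    N = len(H1)
    idx = [(n, i) for n in range(N) for i in range(N) if n != i]
    return [abs(H1[n][i]) for n, i in idx], [A[n][i] for n, i in idx]

def auc_from_scores(scores, labels):
    """Probability that a true edge outscores a non-edge (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy estimate and ground truth for a 3-node network.
H1_hat = [[0.0, 0.8, 0.05], [0.7, 0.0, 0.3], [0.1, 0.02, 0.0]]
A_true = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
scores, labels = offdiag_scores(H1_hat, A_true)
auc = auc_from_scores(scores, labels)
```

Sweeping a threshold over the same scores yields the ROC curve; the rank-based form avoids choosing a threshold grid.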

Fig. 2 ROC curves for topology inference of the IEEE 123-bus distribution grid

The second experiment used the IEEE 123-bus feeder to examine the scalability and performance of the algorithm in topology identification. Voltage squared magnitude measurements \(\{{\varvec{x}}(t)\}_{t=1}^{T}\) across \(T = 400\) time slots were used. The ROC curves of the AGVM, MKPC, linear PC, and concentration matrix methods are shown in Fig. 2. The corresponding AUC values are 0.8935, 0.8024, 0.6952, and 0.4963, respectively. The AGVM again attains the highest AUC, which corroborates the effectiveness of the proposed algorithm.

5.2 Link prediction in social networks

Another important domain that motivates higher-order interactions is link prediction in social and biological networks. The goal of link prediction is to find the most likely subsets of vertices that will interact in the near future based on available observations of the activation of different nodes, e.g., song releases, email exchanges, and paper publications. Specifically, given a set of binary measurements \({\varvec{X}} \in \{0,1\}^{N\times T}\) at time slots \(t \in \{1,\ldots ,T\}\), one needs to predict the most likely set of nodes to be activated together at any \(t' > T\).

In this subsection, we consider the problem of predicting the closure of triangles, i.e., triplets of nodes that activate at the same time. Therefore, AGVMs can be restricted to order \(P=2\). Further, under the binary input data assumption, we can regard the interaction between variables as their joint activation and assume that the function g in (7) is the product operation. A direct instantiation of an AGVM, however, produces a model that constructs real-valued signals. Hence, instead of directly modeling \(x_i(t)\), we borrow an idea from binary regression methods and use a latent variable \(z_i(t)\) to model the probability \(P(x_i(t)=1|z_i(t))\) as \(P(x_i(t)=1|z_i(t)) = \sigma (z_i(t))\), where \(\sigma (\cdot )\) represents the sigmoid function. The latent variable \(z_i(t)\) is then modeled as

$$\begin{aligned} \begin{aligned} z_i(t) = h_i^{(0)} + ({\varvec{h}}_i^{(1)})^\top {\varvec{x}}(t) + ({\varvec{h}}_i^{(2)})^\top ({\varvec{x}}(t) \boxtimes {\varvec{x}}(t)). \end{aligned} \end{aligned}$$

Gathering the latent variables for nodes through time slots, the AGVM (12) for the link prediction task becomes

$$\begin{aligned} {\varvec{Z}} = {\varvec{H}}^{(0)}{\varvec{1}}^\top + {\varvec{H}}^{(1)}{\varvec{X}}^{(1)} + {\varvec{H}}^{(2)}{\varvec{X}}^{(2)} ={{\varvec{\Theta }}^\top } {{\varvec{M}}} \end{aligned}$$

where \([{\varvec{Z}}]_{i,t} = z_i(t)\); \({{\varvec{\Theta }}} = [{\varvec{H}}^{(0)}\,{\varvec{H}}^{(1)}\,{\varvec{H}}^{(2)}]^\top\); and \({\varvec{M}} = [{\varvec{1}}\, ({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\). Unlike traditional logistic regression, the binary labels in (20) are the node variables themselves.
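The latent-variable model for a single node can be sketched as follows. The function names and the toy coefficients are hypothetical, and the second-order regressor is assumed to be ordered lexicographically over the pairs \((a,b)\) with \(a \le b\).

```python
import math
from itertools import combinations_with_replacement

def sigmoid(z):
    """Logistic function mapping the latent variable to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def activation_probability(h0, h1, h2, x):
    """P(x_i(t) = 1 | z_i(t)) with z_i(t) = h0 + h1^T x + h2^T (reduced products of x)."""
    xx = [x[a] * x[b] for a, b in combinations_with_replacement(range(len(x)), 2)]
    z = h0 + sum(c * v for c, v in zip(h1, x)) + sum(c * v for c, v in zip(h2, xx))
    return sigmoid(z)

# Toy 3-node example: node i reacts weakly to node 1 alone and
# strongly to the joint activation of nodes 0 and 1.
h0, h1 = -1.0, [0.0, 0.5, 0.0]
h2 = [0.0, 1.5, 0.0, 0.0, 0.0, 0.0]   # order: (0,0),(0,1),(0,2),(1,1),(1,2),(2,2)
p_joint = activation_probability(h0, h1, h2, [1, 1, 0])
p_idle = activation_probability(h0, h1, h2, [0, 0, 0])
```

With binary inputs, the product \(x_0(t)x_1(t)\) is one only when both nodes are active, so the second-order coefficient raises the activation probability only for the joint event.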

figure a

The goal of closure prediction is to find the most likely sets of nodes that form an open structure and will become closed. Here, an open structure refers to a set of nodes, \({\mathcal {A}}\), that have interacted with each other but have not appeared simultaneously in a single simplex set, whose elements are the indexes of the nonzero elements of \({\varvec{x}}(t)\) [45]. Based on the analysis in our previous work [45], we draw the following conclusions. Observing the support of the off-diagonal entries of \({\varvec{W}} = {\varvec{X}}^{(1)} ({\varvec{X}}^{(1)})^\top\), we can obtain an initial network connectivity. From this connectivity, and enforcing \(h_i^{(2)}(i,i) = 0, \forall i \in \mathcal {V}\), the set of open triangles \(\mathcal {T}_O\), the set of closed triangles \(\mathcal {T}\), and the candidates for the nonzero graph Volterra coefficients \(\mathcal {S}^{(2)} : =\{\mathcal {S}_{i}^{(2)}\}_{i=1}^{N}\) (cf. (3)) can be obtained. Given \(\mathcal {S}^{(2)}\), it holds that \(\textrm{vec}({{\varvec{\Theta }}}) = {\varvec{B}} {{\varvec{\bar{\theta} }}}\), where \(\bar{{\varvec{\theta }}}\) collects the nonzero kernels, and \({\varvec{B}}\) is an expansive binary matrix that maps them to the nonzero entries of the graph Volterra kernels in \(\textrm{vec}({{\varvec{\Theta }}})\). These nonzero Volterra kernels are defined by the support obtained from \({\varvec{W}}\). Since we know the nonzero positions of \(\textrm{vec}({{\varvec{\Theta }}})\), as well as the dimensions of \(\textrm{vec}({{\varvec{\Theta }}})\) and \(\bar{{\varvec{\theta }}}\), we can build \({\varvec{B}}\) from the initial network connectivity. To estimate the parameter \(\bar{{\varvec{\theta }}}\), we propose a proximal gradient ascent algorithm with sparsity regularization, which is summarized in Alg. 1. Notice that get_mtx_row takes the row of the input that is related to the latent variable \(z_i(t)\), and soft_thr(\(\cdot ,\eta \lambda\)) denotes soft thresholding with respect to \(\eta \lambda\). We assume that open triangles with large coefficients, which indicate a high level of interaction, are the most likely to become closed. After obtaining \(\bar{{\varvec{\theta }}}\), we sort the entries related to \(\mathcal {T}_O\) by their absolute value; the top K entries then give the K most likely open triangles to become closed.
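The preprocessing and ranking steps described above can be sketched as follows. This is an illustration under stated assumptions: binary activations are given as N rows of length T, an edge is declared wherever an off-diagonal entry of \({\varvec{W}}\) is positive, and one score per open triangle is assumed to be available in a dictionary (how the entries of \(\bar{{\varvec{\theta }}}\) are mapped to triangles is not modeled here). All function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_support(X):
    """Edges (i, j), i < j, for which the (i, j) entry of X X^T is positive."""
    return {(i, j) for i, j in combinations(range(len(X)), 2)
            if any(a and b for a, b in zip(X[i], X[j]))}

def split_triangles(X):
    """Return (open, closed) triangles.

    A triplet qualifies when all three edges are present; it is closed if the
    three nodes were active in the same time slot at least once, else open.
    """
    edges = cooccurrence_support(X)
    open_t, closed_t = [], []
    for i, j, k in combinations(range(len(X)), 3):
        if {(i, j), (i, k), (j, k)} <= edges:
            together = any(a and b and c for a, b, c in zip(X[i], X[j], X[k]))
            (closed_t if together else open_t).append((i, j, k))
    return open_t, closed_t

def top_k_open(open_t, coeff, K):
    """Rank open triangles by the absolute value of their coefficient, largest first."""
    return sorted(open_t, key=lambda t: abs(coeff.get(t, 0.0)), reverse=True)[:K]

# Toy activations: nodes 0, 1, 2 interact pairwise but never all at once;
# nodes 0, 1, 3 are all active in the first slot.
X = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 0]]
open_t, closed_t = split_triangles(X)
```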

Remark 2

The most expensive step in Alg. 1 is updating the gradient, which costs \({{\mathcal {O}}}(NTd)\), where d is the dimension of \({ \bar{{\varvec{\theta }}}}_k\). Thus, the complexity per iteration is \({{\mathcal {O}}}(NTd)\), as the rest of the operations incur at most an \({{\mathcal {O}}}(d)\) complexity. In the worst case, that is, when the algorithm runs until it hits its maximum number of iterations, \(k_{\textrm{max}}\), the overall complexity of the algorithm is \({{\mathcal {O}}}(NTdk_{\textrm{max}})\).
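Since Alg. 1 is not reproduced in the text here, the following is only a generic sketch of a proximal gradient ascent iteration on an \(\ell _1\)-penalized logistic log-likelihood for a single node, and should not be read as the authors' exact algorithm. The mapping through \({\varvec{B}}\) and the get_mtx_row helper are omitted; the function names, the toy data, and the step size used in the example are assumptions. The inner loop makes the \({{\mathcal {O}}}(Td)\) gradient cost per node visible, which gives \({{\mathcal {O}}}(NTd)\) over all nodes.

```python
import math

def soft_thr(v, tau):
    """Entry-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prox_grad_ascent(M_cols, labels, lam=1e-3, eta=1e-2, k_max=500):
    """Maximize sum_t log P(labels[t] | theta^T M_cols[t]) - lam * ||theta||_1.

    M_cols[t] is the regressor vector at time t (length d); labels[t] is 0 or 1.
    """
    d = len(M_cols[0])
    theta = [0.0] * d
    for _ in range(k_max):
        grad = [0.0] * d
        for m, y in zip(M_cols, labels):              # O(T d) work per iteration
            r = y - sigmoid(sum(a * b for a, b in zip(theta, m)))
            for q in range(d):
                grad[q] += r * m[q]
        # Ascent step on the smooth part, then the l1 proximal step.
        theta = soft_thr([a + eta * g for a, g in zip(theta, grad)], eta * lam)
    return theta

# Toy data: regressors [1, x1, x2]; the label equals x1.
M_cols = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0]] * 5
labels = [1, 0, 1, 0] * 5
theta_hat = prox_grad_ascent(M_cols, labels)
```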

Fig. 3 AUC values for different models on datasets a “Enron_email” and b “primary_school_contact”. The compared models are the harmonic, geometric, and arithmetic means of the three edge weights in the open triangle, the Adamic-Adar model, the preferential attachment model, dyadic link prediction Katz and personalized PageRank (PPR), Common Neighbors, and Jaccard; see [13] (and references therein) for more details. When applicable, the subscripts w, u, and 3w stand for weighted, unweighted and 3-way, respectively

To examine the effectiveness of Alg. 1, the third experiment used the “Enron_email” [46] and “primary_school_contact” [47] datasets and compared Alg. 1 against several alternatives from [13] in terms of open triangle closure prediction. In the “Enron_email” dataset, the nodes denote the email addresses of different Enron employees, while in the “primary_school_contact” dataset, nodes are proximity-based contacts recorded by wearable sensors in a primary school. The training set contains the first \(10\%\) and \(1\%\) of timestamped events in the “Enron_email” and “primary_school_contact” datasets, respectively, while the testing set includes the remaining data. The proposed algorithm employs \(\lambda =10^{-3}\), \(\eta =10^{-4}\) and \(k_{\text {max}}=500\) for both experiments. The AUC metric on the first 100 nodes of the datasets for all methods is shown in Fig. 3. The results demonstrate the effectiveness of the proposed model combined with the logistic regression of Alg. 1, compared with recently proposed methods that generalize link prediction scores to the task of triangle closure prediction.

6 Conclusions

This paper proposes a principled approach to identify and predict the higher-order interactions in networked data. Borrowing ideas from SEMs and Volterra models, a node signal in the network is modeled as a combination of its neighbor signals and a nonlinear combination of the signals in the groups (higher-order links) it belongs to. Identifiability guarantees for the proposed second-order AGVM are then provided under three conditions on the input/output data. Our model provides both expressibility for higher-order interactions and interpretability for further understanding of the underlying network dynamics. Moreover, the proposed AGVM is particularized to handle two different applications, namely topology identification in power systems and link prediction in social networks. The merits of the proposed algorithms relative to existing methods are corroborated through numerical tests using real data. This work also opens up interesting directions for future research, including reducing the computational burden for large-scale networks as well as generalizations of the higher-order model to other challenging applications.

Availability of data and materials

All data generated or analyzed during this study are included in this paper.


  1. With a slight abuse of notation, the notation after h enclosed by parentheses indicates the node index.

  2. With a slight abuse of notation, we use N here to denote the number of non-root buses and hence the number of nodes in the network is \(N+1\).



Autoregressive graph Volterra model


Structural equation model


Restricted isometry property


Proximal gradient ascent


Multi-kernel-based partial correlations


Receiver operating characteristic


Area under the curve


Personalized PageRank similarity


  1. D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Tec. 58(7), 1019–1031 (2007)

  2. S. Golshannavaz, S. Afsharnia, F. Aminifar, Smart distribution grid: optimal day-ahead scheduling with reconfigurable topology. IEEE Trans. Smart Grid 5(5), 2402–2411 (2014)

  3. Q. Yang, G. Wang, A. Sadeghi, G.B. Giannakis, J. Sun, Two-timescale voltage control in distribution grids using deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2313–23 (2020)

  4. S. Sulaimany, M. Khansari, A. Masoudi-Nejad, Link prediction potentials for biological networks. Int. J. Data Min. Bioinf. 20(2), 161–184 (2018)

  5. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network motifs: simple building blocks of complex networks. Science 298(5594), 824–827 (2002)

  6. S. Barbarossa, S. Sardellitti, Topological signal processing over simplicial complexes. IEEE Trans. Signal Process. 68, 2992–3007 (2020)

  7. P. Frankl, V. Rödl, Extremal problems on set systems. Random Struct. Algorithm 20(2), 131–164 (2002)

  8. S. Sardellitti, S. Barbarossa, L. Testa, Topological signal processing over cell complexes, in IEEE Asilomar Conf. Signals, Systems, Computers, pp. 1558–1562 (2021)

  9. M.T. Schaub, A.R. Benson, P. Horn, G. Lippner, A. Jadbabaie, Random walks on simplicial complexes and the normalized hodge laplacian. arXiv:1807.05044 (2018)

  10. S. Zhang, Z. Ding, S. Cui, Introducing hypergraph signal processing: theoretical foundation and practical applications. IEEE Internet Things J. 7(1), 639–660 (2020)

  11. G.B. Giannakis, Y. Shen, G.V. Karanikolas, Topology identification and learning over graphs: accounting for nonlinearities and dynamics. Proc. IEEE 106(5), 787–807 (2018)

  12. M. Coutino, E. Isufi, T. Maehara, G. Leus, State-space network topology identification from partial observations. arXiv:1906.10471 (2019)

  13. A.R. Benson, R. Abebe, M.T. Schaub, A. Jadbabaie, J. Kleinberg, Simplicial closure and higher-order link prediction. Proc. Natl. Acad. Sci. 115(48), 11221–11230 (2018)

  14. L. Lim, Hodge laplacians on graphs. arXiv:1507.05379 (2015)

  15. S. Ebli, M. Defferrard, G. Spreemann, Simplicial neural networks. arXiv:2010.03633 (2020)

  16. L. Giusti, C. Battiloro, P. Di Lorenzo, S. Sardellitti, S. Barbarossa, Simplicial attention networks. Preprint arXiv:2203.07485 (2022)

  17. M. Yang, E. Isufi, G. Leus, Simplicial convolutional neural networks, in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2022), pp. 8847–8851

  18. J.J. Hox, T.M. Bechger, An introduction to structural equation modeling. Family Sci. Rev. 11, 354–373 (1998)

  19. H. Lütkepohl, Vector Autoregressive Models (Springer, Berlin, 2011)

  20. G.V. Karanikolas, O. Sporns, G.B. Giannakis, Multi-kernel change detection for dynamic functional connectivity graphs, in Asilomar Conf. on Signals, Syst., and Comput., Pacific Grove, CA, USA pp. 1555–1559 (2017)

  21. E. Isufi, A. Loukas, N. Perraudin, G. Leus, Forecasting time series with VARMA recursions on graphs. IEEE Trans. Signal Process. 67(18), 4870–4885 (2019)

  22. L. Zhang, G. Wang, G.B. Giannakis, Going beyond linear dependencies to unveil connectivity of meshed grids, in Proc. of CAMSAP, Curacao, AN, pp. 1–5 (2017)

  23. D. Song, R.H. Chan, V.Z. Marmarelis, R.E. Hampson, S.A. Deadwyler, T.W. Berger, Nonlinear dynamic modeling of spike train transformations for hippocampal-cortical prostheses. IEEE Trans. Biomed. Eng. 54(6), 1053–1066 (2007)

  24. V. Kekatos, G.B. Giannakis, Sparse volterra and polynomial regression models: recoverability and estimation. IEEE Trans. Signal Process. 59(12), 5907–5920 (2011)

  25. C. Krall, K. Witrisal, G. Leus, H. Koeppl, Minimum mean-square error equalization for second-order Volterra systems. IEEE Trans. Signal Process. 56(10), 4729–4737 (2008)

  26. V.Z. Marmarelis, Identification of nonlinear biological systems using Laguerre expansions of kernels. Ann. Biomed. Eng. 21(6), 573–589 (1993)

  27. H. Huang, J. Tang, L. Liu, J. Luo, X. Fu, Triadic closure pattern analysis and prediction in social networks. IEEE Trans. Knowl. Data Eng. 27(12), 3374–3389 (2015)

  28. M. Kivelä, A. Arenas, M. Barthelemy, J.P. Gleeson, Y. Moreno, M.A. Porter, Multilayer networks. J. Complex Netw. 2(3), 203–271 (2014)

  29. P. Prenter, A Weierstrass theorem for real, separable Hilbert spaces. J. Approxim. Theory 3(4), 341–351 (1970)

  30. S. Boyd, L. Chua, Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits syst. 32(11), 1150–1161 (1985)

  31. M. Schetzen, The volterra and wiener theories of nonlinear systems (1980)

  32. J.A. Bazerque, B. Baingana, G.B. Giannakis, Identifiability of sparse structural equation models for directed and cyclic networks, in IEEE Glob. Conf. Signal Inf. Process, pp. 839–842 (2013)

  33. J.B. Kruskal, Rank, decomposition, and uniqueness for 3-way and N-way arrays. Multiway Data Analysis, 7–18 (1989)

  34. E.J. Candes, The restricted isometry property and its implications for compressed sensing. C. R. Acad. Sci. Paris, Ser. I 346(9–10), 589–592 (2008)

  35. B. Nazer, R.D. Nowak, Sparse interactions: Identifying high-dimensional multilinear systems via compressed sensing, in Annu. Allert. Conf. Commun. Control Comput. Allert, pp. 1589–1596 (2010)

  36. V. Kekatos, D. Angelosante, G.B. Giannakis, Sparsity-aware estimation of nonlinear volterra kernels, in IEEE CAMSAP, pp. 129–132 (2009)

  37. J. Haupt, W.U. Bajwa, G. Raz, R. Nowak, Toeplitz compressed sensing matrices with applications to sparse channel estimation. IEEE Trans. Inf. Theory 56(11), 5862–5875 (2010)

  38. Q. Yang, M. Coutino, G. Wang, G.B. Giannakis, G. Leus, Learning connectivity and higher-order interactions in radial distribution grids, in IEEE Int. Conf. Acoust. Speech Signal Process, pp. 5555–5559 (2020)

  39. M. Farivar, C.R. Clarke, S.H. Low, K.M. Chandy, Inverter VAR control for distribution systems with renewables, in Proc. of IEEE SmartGridComm., Brussels, Belgium, pp. 457–462 (2011)

  40. S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, J. Albrecht, Smart*: an open data set and tools for enabling research in sustainable homes. SustKDD 111(112), 108 (2012)

  41. M. Baran, F.F. Wu, Optimal sizing of capacitors placed on a radial distribution system. IEEE Trans. Power Del. 4(1), 735–743 (1989)

  42. S.H. Low, Convex relaxation of optimal power flow–Part II: exactness. IEEE Trans. Control Netw. Syst. 1(2), 177–189 (2014)

  43. S. Bolognani, N. Bof, D. Michelotti, R. Muraro, L. Schenato, Identification of power distribution network topology via voltage correlation analysis, in Proc. of CDC, Florence, ITL, pp. 1659–1664 (2013)

  44. D. Deka, S. Talukdar, M. Chertkov, M. Salapaka, Topology estimation in bulk power grids: Guarantees on exact recovery. arXiv:1707.01596 (2017)

  45. M. Coutino, G.V. Karanikolas, G. Leus, G.B. Giannakis, Self-driven graph volterra models for higher-order link prediction, in IEEE Int. Conf. Acoust. Speech Signal Process, pp. 3887–3891 (2020)

  46. B. Klimt, Y. Yang, The enron corpus: A new dataset for email classification research, in ECML (Springer, 2004), pp. 217–226

  47. J. Stehlé, N. Voirin, A. Barrat, C. Cattuto, L. Isella, J.-F. Pinton, M. Quaggiotto, W. Van den Broeck, C. Régis, B. Lina et al., High-resolution measurements of face-to-face contact patterns in a primary school. PloS one 6(8), 23176 (2011)

  48. R. Marsli, F.J. Hall, Geometric multiplicities and geršgorin discs. Am. Math. Mon. 120(5), 452–455 (2013)



Not applicable.


This work was supported in part by ASPIRE project 14926 (within the STW OTP program) financed by the Netherlands Organization for Scientific Research (NWO), NSF Grants 1509040, 1711471, and 1901134.

Author information

Authors and Affiliations



All authors contributed to this research, including the design of the simulations and analyses of the results. GL proposed the research direction. QLY and MC conceived the system model. MC completed the identifiability analysis of the paper. QLY analyzed the data and wrote this manuscript. GBG proofread the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Geert Leus.

Ethics declarations

Ethics approval and consent to participate

All procedures performed in this paper were in accordance with the ethical standards of the research community.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Proof of Theorem 1

Let us consider the expansion

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}. \end{aligned}$$

Rewriting the linear terms, i.e., terms related with \({\varvec{X}}^{(1)}\), we obtain the system

$$\begin{aligned} \mathcal {H} {\mathcal {X}}&= {\varvec{\Gamma }}{\varvec{Y}}\end{aligned}$$

where \({\mathcal {X}}:= [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\) and \(\mathcal {H} := [({\varvec{I}}- {\varvec{H}}^{(1)})\; |\; -{\varvec{H}}^{(2)}]\).

Owing to the hypothesis that \({\mathcal {X}}\) has full row rank, the unique least-squares solution for the kernel matrices is given by \(\mathcal {H}_{\textrm{LS}} := {\varvec{\Gamma }}{\varvec{Y}}{\mathcal {X}}^{\dagger }\). Defining \({\varvec{Q}} := {\varvec{Y}}{\mathcal {X}}^{\dagger }\) and partitioning this matrix appropriately, i.e., \({\varvec{Q}} = [{\varvec{Q}}_1 \in \mathbb {R}^{N\times N}\; |\; {\varvec{Q}}_2 \in \mathbb {R}^{N \times M}]\), we obtain the following relations

$$\begin{aligned} {\varvec{I}}- {\varvec{H}}^{(1)}&= {\varvec{\Gamma }}{\varvec{Q}}_1\\ -{\varvec{H}}^{(2)}&= {\varvec{\Gamma }}{\varvec{Q}}_2. \end{aligned}$$

Now, let us recall that \({\varvec{\Gamma }}\) is a diagonal matrix and that \({\varvec{H}}^{(1)}\) is a hollow matrix, i.e., its diagonal is filled with zeros. Thus, it holds \(\textrm{diag}({\varvec{\Gamma }}{\varvec{Q}}_1) = {\varvec{1}}\),

which implies \(\textrm{diag} ({\varvec{\Gamma }}_{\textrm{LS}}) := \textrm{diag}({\varvec{Q}}_1)^{-1}\).

Finally, the estimates for the kernel matrices are given as

$$\begin{aligned} {\varvec{H}}^{(1)}_{\textrm{LS}}&:= {\varvec{I}}- {\varvec{\Gamma }}_{\textrm{LS}}{\varvec{Q}}_{1} \\ {\varvec{H}}^{(2)}_{\textrm{LS}}&:= -{\varvec{\Gamma }}_{\textrm{LS}}{\varvec{Q}}_2. \end{aligned}$$

The proof is completed. \(\square\)
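The steps of this proof can be checked numerically on synthetic data. The sketch below is an illustration only: it assumes Gaussian node signals, takes \({\varvec{X}}^{(2)}\) to contain the products \(x_i(t)x_j(t)\) with \(i \le j\), and chooses \({\varvec{Y}}\) so that the expansion holds exactly. All sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 60
iu = np.triu_indices(N)                       # index pairs (i, j) with i <= j
M = iu[0].size                                # number of second-order regressors

X1 = rng.standard_normal((N, T))
X2 = X1[iu[0]] * X1[iu[1]]                    # rows x_i(t) x_j(t), i <= j
H1 = 0.3 * rng.standard_normal((N, N))
np.fill_diagonal(H1, 0.0)                     # hollow first-order kernel
H2 = 0.3 * rng.standard_normal((N, M))
Gamma = np.diag(rng.uniform(0.5, 2.0, N))     # diagonal and invertible

# Choose Y so that X1 = H1 X1 + H2 X2 + Gamma Y holds exactly.
Y = np.linalg.solve(Gamma, X1 - H1 @ X1 - H2 @ X2)

Xcal = np.vstack([X1, X2])                    # full row rank for generic data, T >= N + M
Q = Y @ np.linalg.pinv(Xcal)
Q1, Q2 = Q[:, :N], Q[:, N:]

Gamma_ls = np.diag(1.0 / np.diag(Q1))         # from diag(Gamma Q1) = 1
H1_ls = np.eye(N) - Gamma_ls @ Q1
H2_ls = -Gamma_ls @ Q2

print(np.allclose(H1_ls, H1), np.allclose(H2_ls, H2), np.allclose(Gamma_ls, Gamma))
```

The hollow structure of \({\varvec{H}}^{(1)}\) is what pins down the diagonal of \({\varvec{\Gamma }}\); without it, the scaling of each row would remain ambiguous.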

Proof of Theorem 2

First, let us rewrite the expansion as

$$\begin{aligned} ({\varvec{X}}^{(1)})^\top&= [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]\begin{bmatrix} ({\varvec{H}}^{(1)})^\top \\ ({\varvec{H}}^{(2)})^\top \end{bmatrix} \\&= \mathcal {X}^\top \mathcal {H}^\top . \end{aligned}$$

Now, consider the ith column of both sides of the expression above, i.e., \([({\varvec{X}}^{(1)})^\top ]_i = \mathcal {X}^\top [\mathcal {H}^\top ]_i\).

Suppose there exists a vector \({\varvec{h}}_i\), \({\varvec{h}}_i \ne [\mathcal {H}^\top ]_i\), satisfying the same relation and with \(K = K_1 + K_2\) nonzero entries. This implies that \({\varvec{0}} = \mathcal {X}^\top ([\mathcal {H}^\top ]_i - {\varvec{h}}_i)\) must hold. As \([\mathcal {H}^\top ]_i\) and \({\varvec{h}}_i\) each have at most K nonzero entries, their difference has at most 2K nonzero entries. Hence, if \(\textrm{kr}(\mathcal {X}^\top ) \ge 2K\), any subset of at most 2K columns of \(\mathcal {X}^\top\) is linearly independent, and thus \(([\mathcal {H}^\top ]_i - {\varvec{h}}_i) = {\varvec{0}}\) holds. This contradicts the assumption \({\varvec{h}}_i \ne [\mathcal {H}^\top ]_i\); hence, the result of the theorem holds. \(\square\)
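For small matrices, the Kruskal rank used in this argument can be computed by exhaustive search, which makes the difference from the ordinary rank concrete. The sketch below is a brute-force illustration with exponential cost, intended for toy sizes only; the function name and the example matrices are assumptions.

```python
from itertools import combinations
import numpy as np

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every set of k columns of A is linearly independent."""
    n_cols = A.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(A[:, list(c)], tol=tol) == size
               for c in combinations(range(n_cols), size)):
            k = size
        else:
            break
    return k

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])     # every pair of columns is independent
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])     # first two columns are parallel

print(kruskal_rank(A), kruskal_rank(B))
```

Both matrices have rank 2, yet the second one contains two parallel columns. A 1-sparse solution is therefore guaranteed to be unique for the first matrix (Kruskal rank at least 2K with K = 1) but not for the second.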

Proof of Theorem 3

Let us consider the expansion

$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}. \end{aligned}$$

Notice that the jth column of \({\varvec{X}}^{(2)}\) is given by \({{\varvec{C}}}\big ([{\varvec{X}}^{(1)}]_j\otimes [{\varvec{X}}^{(1)}]_j\big )\), where \([{\varvec{X}}^{(1)}]_j\) is the jth column of \({\varvec{X}}^{(1)}\), \(\otimes\) is the Kronecker product, and \({\varvec{C}}\) is a binary selection matrix picking the appropriate rows of the Kronecker product. Hence, we can express \({\varvec{X}}^{(2)}\) in terms of \({\varvec{X}}^{(1)}\) using the Khatri-Rao product \(*\) as

$$\begin{aligned} {\varvec{X}}^{(2)}&= {{\varvec{C}}}\bigg [[{\varvec{X}}^{(1)}]_1\otimes [{\varvec{X}}^{(1)}]_1 \; [{\varvec{X}}^{(1)}]_2\otimes [{\varvec{X}}^{(1)}]_2 \; \cdots \bigg ] \\&={{\varvec{C}}} ({\varvec{X}}^{(1)}*{\varvec{X}}^{(1)}). \end{aligned}$$
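The identity above can be verified numerically. In the sketch below, the Khatri-Rao product is the column-wise Kronecker product, and the selection matrix is assumed to keep the rows indexed by \((i,j)\) with \(i \le j\); the text only states that \({\varvec{C}}\) picks the appropriate rows, so this choice and the function names are assumptions.

```python
import numpy as np

def khatri_rao_self(X):
    """Column-wise Kronecker product X * X: column t equals kron(X[:, t], X[:, t])."""
    N, T = X.shape
    return np.einsum("it,jt->ijt", X, X).reshape(N * N, T)

def selection_matrix(N):
    """Binary C keeping the Kronecker rows indexed by (i, j) with i <= j."""
    rows = [i * N + j for i in range(N) for j in range(i, N)]
    C = np.zeros((len(rows), N * N))
    C[np.arange(len(rows)), rows] = 1.0
    return C

rng = np.random.default_rng(0)
X1 = rng.standard_normal((3, 5))
KR = khatri_rao_self(X1)
C = selection_matrix(3)
X2 = C @ KR                                   # X2 = C (X1 * X1)

iu = np.triu_indices(3)
print(np.allclose(X2, X1[iu[0]] * X1[iu[1]]))
```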

By hypothesis, we have that \({\varvec{X}}^{(1)}{\varvec{\Pi }}_1 = {\varvec{0}}\). Thus, by right multiplying the expansion by \({\varvec{\Pi }}_1\) and reorganizing terms, we obtain

$$\begin{aligned} -{\varvec{H}}^{(2)}{\varvec{X}}^{(2)}{\varvec{\Pi }}_1&= {\varvec{\Gamma }}\tilde{{\varvec{Y}}} \\ -{\varvec{H}}^{(2)}{{\varvec{C}}}({\varvec{X}}^{(1)}*{\varvec{X}}^{(1)}){\varvec{\Pi }}_1&= {\varvec{\Gamma }}\tilde{{\varvec{Y}}} \end{aligned}$$

where \(\tilde{{\varvec{Y}}}:={\varvec{Y}}{\varvec{\Pi }}_1 \ne {\varvec{0}}\) (by assumption) and the identity \({\varvec{X}}^{(2)}= {{\varvec{C}}}({\varvec{X}}^{(1)}*{\varvec{X}}^{(1)})\) has been used. Now, let us consider the ith equation of the above relation, i.e.,

$$\begin{aligned} - (\tilde{{\varvec{X}}}^{(2)})^\top {\varvec{h}}_i^{(2)} = [{\varvec{\Gamma }}]_{ii}\tilde{{\varvec{y}}}_i \end{aligned}$$

where \({\tilde{{\varvec{X}}}}^{(2)} := {\varvec{X}}^{(2)} {\varvec{\Pi }}_1\); \({\varvec{h}}_i^{(2)}\) and \(\tilde{{\varvec{y}}}_i\) are the ith rows of \({\varvec{H}}^{(2)}\) and \(\tilde{{\varvec{Y}}}\) in column-vector form, respectively. As \(\textrm{kr}({{\tilde{{\varvec{X}}}}^{(2)}}) \ge 2K_2 + 1\) and the number of nonzero elements per row in \({\varvec{H}}^{(2)}\) is bounded above by \(K_2\) by hypothesis, the above system has a unique solution (up to a scalar ambiguity that does not affect the support) with such a sparsity level, i.e., the sparsest solution, \(\forall \; i\). \(\square\)

Proof of Theorem 4

Let us consider a particular realization of \(\tilde{{\varvec{X}}}^\top := [\frac{1}{\sqrt{T}}{\varvec{X}}^{l}\,\frac{1}{\sqrt{T}}{\varvec{X}}^b]^\top = [\tilde{{\varvec{X}}}^l\, \tilde{{\varvec{X}}}^b]\in \{-1,1\}^{T\times L}\) and its Grammian \(\tilde{{\varvec{G}}}:= \tilde{{\varvec{X}}}\tilde{{\varvec{X}}}^\top\). By the Geršgorin disc theorem [48], if \(|[\tilde{{\varvec{G}}}]_{ii} - 1| < \delta _d\) and \(|[\tilde{{\varvec{G}}}]_{ij}| \le \frac{\delta _o}{s}\) for every pair \(i, j\) with \(j\ne i\), and \(\delta _d + \delta _o = \delta\) for some \(\delta \in (0,1)\), then \(\tilde{{\varvec{X}}}\) possesses RIP \(\delta _s \le \delta\). Therefore, we can upper bound the probability of \(\tilde{{\varvec{X}}}\) not satisfying RIP of value \(\delta\), \(\textrm{Pr}(\delta _s > \delta )\), as

$$\begin{aligned} \textrm{Pr}\bigg ( \bigcup _{i=1}^{L}\big \{|[{{\varvec{\tilde{G}}}}]_{ii} - 1| \ge \delta _d\big \}\; \textrm{or}\; \bigcup _{i=1}^{L}\bigcup _{j=1,j\ne i}^{L}\bigg \{|[{{\varvec{\tilde{G}}}}]_{ij}|\ge \frac{\delta _o}{s}\bigg \} \bigg ). \end{aligned}$$

As \({{\varvec{\tilde{G}}}}\) is symmetric, we can use the union bound only for its unique entries to upper bound \(\textrm{Pr}(\delta _s > \delta )\) as

$$\begin{aligned} \sum \limits _{i=1}^{L}\textrm{Pr}\bigg (|[{{\varvec{\tilde{G}}}}]_{ii} - 1| \ge \delta _d \bigg ) + \sum \limits _{i=1}^{L}\sum \limits _{j = i+1}^{L}\textrm{Pr}\big ( |[{{\varvec{\tilde{G}}}}]_{ij}| \ge \frac{\delta _o}{s} \big ). \end{aligned}$$
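As a brief aside, the Gershgorin step invoked at the start of the proof can be spelled out as follows. This is a sketch of the standard argument; the support set \(\mathcal {S}\) (any index set of size \(s\)) and the eigenvalue \(\lambda\) of the principal submatrix \([\tilde{{\varvec{G}}}]_{\mathcal {S}\mathcal {S}}\) are notation introduced here only for illustration. Each such eigenvalue lies in a disc centered at some diagonal entry \([\tilde{{\varvec{G}}}]_{ii}\), \(i \in \mathcal {S}\), so that

$$\begin{aligned} |\lambda - 1|&\le |\lambda - [\tilde{{\varvec{G}}}]_{ii}| + |[\tilde{{\varvec{G}}}]_{ii} - 1| \le \sum \limits _{j \in \mathcal {S},\, j \ne i} |[\tilde{{\varvec{G}}}]_{ij}| + \delta _d \\&\le (s-1)\frac{\delta _o}{s} + \delta _d < \delta _o + \delta _d = \delta . \end{aligned}$$

Hence all eigenvalues of every \(s \times s\) principal submatrix of \(\tilde{{\varvec{G}}}\) lie in \((1-\delta , 1+\delta )\) whenever \(\delta _o > 0\), which is the RIP with \(\delta _s \le \delta\).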

To show the result of the theorem, we proceed next to bound the probabilities above. The analysis of these probabilities is similar to the one in [24]. However, here we obtain results for a different distribution and for linear and bilinear components. To simplify the notation, we introduce the following partition for the Grammian matrix, i.e.,

$$\begin{aligned} {{\varvec{\tilde{G}}}}= \begin{bmatrix} {{\varvec{\tilde{G}}}}^{ll}&{} {{\varvec{\tilde{G}}}}^{lb}\\ ({{\varvec{\tilde{G}}}}^{lb})^\top &{} {{\varvec{\tilde{G}}}}^{bb}\end{bmatrix} \end{aligned}$$

where \({{\varvec{\tilde{G}}}}^{ll}:= \tilde{{\varvec{X}}}^{l}(\tilde{{\varvec{X}}}^l)^\top\), \(\tilde{{\varvec{G}}}^{lb}:= \tilde{{\varvec{X}}}^l(\tilde{{\varvec{X}}}^b)^\top\) and \(\tilde{{\varvec{G}}}^{bb}:= \tilde{{\varvec{X}}}^b(\tilde{{\varvec{X}}}^b)^\top\).

Recalling that the raw moments for the inputs are given by

$$\begin{aligned} m_r = {\left\{ \begin{array}{ll} 0 &{} r \text { odd}\\ 1 &{} r \text { even} \end{array}\right. } \end{aligned}$$

we obtain the following relations

$$\begin{aligned}{}[\tilde{{\varvec{G}}}^{ll}]_{ii}&= 1,&\mathbb {E}\{[\tilde{{\varvec{G}}}^{ll}]_{ii}\}&= 1,&\mathbb {E}\{[\tilde{{\varvec{G}}}^{ll}]_{ij}\}&= 0\\ [\tilde{{\varvec{G}}}^{bb}]_{ii}&= 1,&\mathbb {E}\{[\tilde{{\varvec{G}}}^{bb}]_{ii}\}&= 1,&\mathbb {E}\{[\tilde{{\varvec{G}}}^{bb}]_{ij}\}&= 0 \end{aligned}$$

and \(\mathbb {E}\{[\tilde{{\varvec{G}}}^{lb}]_{ij}\} = 0\; \forall \; i,j.\) Since the diagonal entries of \(\tilde{{\varvec{G}}}\) equal one deterministically, every term in the first sum of the bound on \(\textrm{Pr}(\delta _s > \delta )\) is identically zero, and hence we can set \(\delta _d = 0\).
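These relations can be verified numerically. The sketch below draws i.i.d. \(\pm 1\) inputs, stacks the linear block on top of a bilinear block, and forms the normalized Grammian. It assumes, for illustration only, that the bilinear block consists of the products \(x_i x_j\) with \(i < j\); the function name `rademacher_gram` and the sizes `N=6`, `T=400` are hypothetical choices, not values from the paper.

```python
import itertools
import numpy as np

def rademacher_gram(N, T, seed=0):
    """Normalized Grammian of stacked linear and bilinear +/-1 features.

    Assumption (illustrative): the bilinear block holds x_i * x_j for i < j.
    """
    rng = np.random.default_rng(seed)
    X_lin = rng.choice([-1.0, 1.0], size=(N, T))       # N inputs, T samples
    pairs = list(itertools.combinations(range(N), 2))  # index pairs with i < j
    X_bil = np.array([X_lin[i] * X_lin[j] for i, j in pairs])
    X_tilde = np.vstack([X_lin, X_bil]) / np.sqrt(T)   # scaling by 1/sqrt(T)
    return X_tilde @ X_tilde.T

G = rademacher_gram(N=6, T=400)
# Diagonal entries are (1/T) * sum of squared +/-1 values, i.e. exactly one.
diag_dev = np.max(np.abs(np.diag(G) - 1.0))
# Off-diagonal entries are random with zero mean; record the largest magnitude.
off_max = np.max(np.abs(G - np.diag(np.diag(G))))
```

Here `diag_dev` is zero up to floating-point rounding for any realization, whereas `off_max` is random and shrinks as `T` grows, which is what the probabilistic bounds that follow quantify.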

To bound the required probabilities, we make use of Hoeffding’s inequality, stated next.

Lemma 1

(Hoeffding’s Inequality) Given \(t > 0\) and independent random variables \(\{x_i\}_{i=1}^N\) bounded as \(a_i \le x_i \le b_i\) almost surely, the sum \(s_N := \sum \nolimits _{i=1}^{N} x_i\) satisfies

$$\begin{aligned} \textrm{Pr}(|s_N - \mathbb {E}\{s_N\}| \ge t) \le 2\exp \bigg (-\frac{2t^2}{\sum _{i=1}^N(b_i-a_i)^2}\bigg ). \end{aligned}$$
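To build intuition for Lemma 1, the following sketch compares the Hoeffding bound with an empirical tail frequency obtained by Monte Carlo simulation. The setup is an illustrative assumption: the summands are independent \(\pm 1/N\) variables (so that \(\mathbb {E}\{s_N\} = 0\)), and the names `hoeffding_bound` and `empirical_tail`, as well as the values of `n_terms`, `t`, and `trials`, are hypothetical.

```python
import numpy as np

def hoeffding_bound(t, a, b):
    """Right-hand side of Hoeffding's inequality for bounds a_i <= x_i <= b_i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 2.0 * np.exp(-2.0 * t**2 / np.sum((b - a) ** 2))

def empirical_tail(t, n_terms, trials=20000, seed=0):
    """Fraction of trials in which |s_N| >= t for independent +/- 1/n_terms terms."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=(trials, n_terms)) / n_terms
    s = x.sum(axis=1)  # zero-mean sum, so no centering is needed
    return np.mean(np.abs(s) >= t)

n_terms, t = 100, 0.25
lower = -np.ones(n_terms) / n_terms
upper = np.ones(n_terms) / n_terms
bound = hoeffding_bound(t, lower, upper)
freq = empirical_tail(t, n_terms)
```

The empirical frequency `freq` is expected to fall well below `bound`, since Hoeffding's inequality holds for any bounded distribution and is therefore not tight for a specific one.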

Let us consider the off-diagonal elements of \({{\varvec{\tilde{G}}}}^{ll}\). The related probability can be bounded as

$$\begin{aligned} \textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{ll}]_{ij}|\ge \frac{\delta _o}{s} \bigg ) \le 2\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ) \end{aligned}$$

since, after the \(1/\sqrt{T}\) scaling, each of these entries is a sum of T independent zero-mean variables taking values in \(\{-\frac{1}{T},\frac{1}{T}\}\). Similar bounds can be found for the other entries of \({{\varvec{\tilde{G}}}}\), i.e.,

$$\begin{aligned} \textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{bb}]_{ij} | \ge \frac{\delta _o}{s} \bigg ) \le 2\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ),\; \textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{lb}]_{ij} | \ge \frac{\delta _o}{s} \bigg ) \le 2\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ). \end{aligned}$$
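The exponent shared by these bounds follows directly from Lemma 1. As a sketch, assume that the time samples are independent, so that each off-diagonal entry of \({{\varvec{\tilde{G}}}}\) is a sum of \(T\) independent zero-mean terms bounded by \(a_t = -\frac{1}{T}\) and \(b_t = \frac{1}{T}\) (the index \(t\) is introduced here for illustration). Setting the threshold to \(\frac{\delta _o}{s}\) then gives

$$\begin{aligned} \sum \limits _{t=1}^{T}(b_t-a_t)^2 = \frac{4}{T} \quad \Rightarrow \quad \textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}]_{ij}| \ge \frac{\delta _o}{s} \bigg ) \le 2\exp \bigg (-\frac{2(\delta _o/s)^2}{4/T}\bigg ) = 2\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ). \end{aligned}$$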

Recollecting the probabilities for all the entries, we obtain

$$\begin{aligned} \textrm{Pr}(\delta _s > \delta )&\le \sum \limits _{i=1}^{L}\sum \limits _{j\ge 1}^{L}\textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{ll}]_{ij}| \ge \frac{\delta _o}{s} \bigg ) + \sum \limits _{i=1}^{L}\sum \limits _{j\ge 1}^{L}\textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{bb}]_{ij}| \ge \frac{\delta _o}{s} \bigg ) \\&\quad + \sum \limits _{i=1}^{L}\sum \limits _{j\ge 1}^{L}\textrm{Pr}\bigg ( |[{{\varvec{\tilde{G}}}}^{lb}]_{ij}| \ge \frac{\delta _o}{s} \bigg ) \\&\le (N^2 + \frac{1}{8}N^4 + \frac{1}{2}N^3)\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ) \\&\le N^4\exp \bigg (-\frac{\delta _o^2T}{2s^2}\bigg ) \end{aligned}$$

for \(N > 2\). Considering \(\delta = \delta _o\) (as \(\delta _d = 0\)) and setting \(C=2\), for a \(\gamma \in (0,1)\) and \(T \ge \frac{4C}{(1-\gamma )\delta ^2}s^2\log N,\) we can simplify the above bound as

$$\begin{aligned} \textrm{Pr}\bigg ( \delta _s > \delta \bigg ) \le \exp \bigg ( - \frac{\gamma \delta ^2T}{Cs^2} \bigg ). \end{aligned}$$
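This last simplification can be checked in a few lines. The sketch below uses only \(N^4 = \exp (4\log N)\), \(\delta = \delta _o\), \(C = 2\), and the stated condition on \(T\), which is equivalent to \(4\log N \le \frac{(1-\gamma )\delta ^2T}{Cs^2}\):

$$\begin{aligned} N^4\exp \bigg (-\frac{\delta ^2T}{2s^2}\bigg )&= \exp \bigg (4\log N - \frac{\delta ^2T}{Cs^2}\bigg ) \\&\le \exp \bigg (\frac{(1-\gamma )\delta ^2T}{Cs^2} - \frac{\delta ^2T}{Cs^2}\bigg ) = \exp \bigg (-\frac{\gamma \delta ^2T}{Cs^2}\bigg ). \end{aligned}$$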


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Yang, Q., Coutino, M., Leus, G. et al. Autoregressive graph Volterra models and applications. EURASIP J. Adv. Signal Process. 2023, 4 (2023).
