This section focuses on the conditions that the input/output data should exhibit in order to uniquely identify the second-order AGVM model (12). Although asymptotic results have been obtained for sparse regression, i.e., the Lasso estimator, here we are more interested in the finite-sample regime. Therefore, borrowing tools from the compressed sensing literature and linear algebra, we provide recovery guarantees in both deterministic and probabilistic settings.
Without loss of generality, we present the following results considering that the zeroth-order term \({\varvec{H}}^{(0)}\) is projected out and that the noise term \({\varvec{E}}\) is not present or has been removed.
Our first result asserts the identifiability of the network connections, represented through the graph Volterra kernels, and highlights the role of the exogenous data \({\varvec{Y}}\). Notice that this result generalizes the one in [32], obtained for resolving direction ambiguities in structural equation models applied to directed graphs.
Theorem 1
Suppose that the data \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM model
$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}\end{aligned}$$
for a matrix \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\) and a diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). If \({\mathcal {X}}:= [({\varvec{X}}^{(1)})^\top \;({\varvec{X}}^{(2)})^\top ]^\top\) is full row rank, then \({\varvec{H}}^{(1)}\), \({\varvec{H}}^{(2)}\) and \({\varvec{\Gamma }}\) are uniquely expressible in terms of \({\mathcal {X}}\) and \({\varvec{Y}}\) as
$$\begin{aligned} \textrm{diag}({\varvec{\Gamma }})&= \textrm{diag}({\varvec{Q}}_1)^{-1} \\ {\varvec{H}}^{(1)}&= {\varvec{I}} - {\varvec{\Gamma }}{\varvec{Q}}_1 \\ {\varvec{H}}^{(2)}&= -{\varvec{\Gamma }}{\varvec{Q}}_2 \end{aligned}$$
where \({\varvec{Y}}{\mathcal {X}}^{\dagger } = [{\varvec{Q}}_1\in \mathbb {R}^{N\times N}\,{\varvec{Q}}_2\in \mathbb {R}^{N \times M}]\).
Proof
See Appendix. \(\square\)
This result highlights the importance of the exogenous data (perturbation), \({\varvec{Y}}\), for uniquely identifying the AGVM model. It shows, as in the classical SEM, that given a sufficiently rich perturbation, the directionality, as well as the higher-order interactions (triplets), can be uniquely determined from the measured data.
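As a quick numerical sanity check of the recovery in Theorem 1, the following sketch (in Python with NumPy; all variable names are ours) generates data consistent with the model and recovers the kernels. It is an illustration under the assumption that \({\varvec{X}}^{(1)}\) and \({\varvec{X}}^{(2)}\) may be drawn as generic random matrices, i.e., the quadratic relation between them is not enforced, since the theorem only requires \(\mathcal{X}\) to be full row rank:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 4, 10, 50                      # nodes, second-order terms, samples

# Ground-truth kernels: H1 hollow (zero diagonal), Gamma diagonal and invertible
H1 = rng.normal(size=(N, N)); np.fill_diagonal(H1, 0.0)
H2 = rng.normal(size=(N, M))
Gamma = np.diag(rng.uniform(1.0, 2.0, size=N))

# Generic full-row-rank data; solve X1 = H1 X1 + H2 X2 + Gamma Y for Y
X1 = rng.normal(size=(N, T))
X2 = rng.normal(size=(M, T))
Y = np.linalg.inv(Gamma) @ ((np.eye(N) - H1) @ X1 - H2 @ X2)

# Recovery: Y pinv(X) = [Q1 Q2] with Q1 = Gamma^{-1}(I - H1), Q2 = -Gamma^{-1} H2
X = np.vstack([X1, X2])                  # full row rank almost surely
Q = Y @ np.linalg.pinv(X)
Q1, Q2 = Q[:, :N], Q[:, N:]
Gamma_hat = np.diag(1.0 / np.diag(Q1))   # diag(H1) = 0, so diag(Q1) = diag(Gamma)^{-1}
H1_hat = np.eye(N) - Gamma_hat @ Q1
H2_hat = -Gamma_hat @ Q2

assert np.allclose(Gamma_hat, Gamma)
assert np.allclose(H1_hat, H1)
assert np.allclose(H2_hat, H2)
```

Since \(\mathcal{X}\) is full row rank, \(\mathcal{X}\mathcal{X}^{\dagger} = {\varvec{I}}\), so the pseudoinverse step returns the coefficient blocks exactly.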
Although the result of Theorem 1 establishes the identifiability of the AGVM model, it requires a full-row-rank data matrix \(\mathcal {X}\), which in many cases may not be feasible, as the number of samples must be at least \({{\mathcal {O}}}(N^2)\). Thus, in order to improve the sampling complexity of the problem, prior information is required to constrain the model. A natural assumption, arising in many networked-data applications, is sparse interaction among the nodes. That is, the number of connections (edges) among nodes is much smaller than the size of the network, and therefore the number of triads in which they participate is restricted. Before proceeding, we need the following sparsity assumptions.

A. 1
Each row of matrix \({\varvec{H}}^{(1)}\) has at most \(K_1\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(1)} \Vert _0 \le K_1\; \forall \; i\).

A. 2
Each row of matrix \({\varvec{H}}^{(2)}\) has at most \(K_2\) nonzero entries, i.e., \(\Vert {\varvec{h}}_{i}^{(2)} \Vert _0 \le K_2\; \forall \; i\).
Assumptions A.1 and A.2 on the graph Volterra kernels can be seen as restrictions on the number of edges and triangles existing in the graph. By letting \(K_1 \in {{\mathcal {O}}}(1)\) and \(K_2 \in {{\mathcal {O}}}(N)\), these assumptions translate into graph Volterra kernels that represent a sparse graph, i.e., \({{\mathcal {O}}}(N)\) edges and \({{\mathcal {O}}}(N^2)\) triangles. In addition, as shown in the following, the sparsity assumptions make the identification of the system possible when only a reduced number of measurements is available.
Before stating our second result, a definition is required.
Definition 1
The Kruskal rank of a matrix \({\varvec{A}}\in \mathbb {R}^{N\times M}\), denoted \(\textrm{kr}({\varvec{A}})\), is the maximum number k such that any combination of k columns of \({\varvec{A}}\) forms a submatrix with full column rank.
Although the Kruskal rank is, in general, more restrictive than the traditional rank and harder to verify, when the entries of \({\varvec{A}}\) are drawn from a continuous distribution, its Kruskal rank equals its rank [33].
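Because the definition quantifies over every subset of columns, the Kruskal rank can only be verified by brute force on small matrices. The following minimal sketch (the helper name `kruskal_rank` is ours, and the exhaustive search is exponential in the number of columns, so it is for illustration only) also checks the remark that for generic matrices the Kruskal rank coincides with the rank:

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest k such that *every* set of k columns is linearly independent."""
    A = np.asarray(A, dtype=float)
    n_cols = A.shape[1]
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k - 1
    return n_cols

# A repeated column caps the Kruskal rank at 1, although the usual rank is 2
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
assert kruskal_rank(A) == 1 and np.linalg.matrix_rank(A) == 2

# For entries drawn from a continuous distribution, kr(A) = rank(A) almost surely
B = np.random.default_rng(1).normal(size=(3, 5))
assert kruskal_rank(B) == np.linalg.matrix_rank(B)
```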
To begin with, we consider a model without exogenous inputs, that is, a purely self-driven system.
Theorem 2
Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) abide by the second-order AGVM
$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}\end{aligned}$$
for sparse matrices \({\varvec{H}}^{(1)}\), with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\), and \({\varvec{H}}^{(2)}\) satisfying A.1 and A.2, respectively. If \(\textrm{kr}(\mathcal {X}^\top ) \ge 2(K_1 + K_2)\), where \(\mathcal {X} := [({\varvec{X}}^{(1)})^\top \, ({\varvec{X}}^{(2)})^\top ]^\top\), then \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) can be uniquely identified.
Proof
See Appendix. \(\square\)
This result shows that both graph Volterra kernel matrices can be uniquely identified when the Kruskal condition is met, even when the model is self-driven, i.e., has no exogenous inputs.
In the following, we present a result involving the exogenous inputs.
Theorem 3
Let \(\{{\varvec{X}}^{(1)},{\varvec{X}}^{(2)}\}\) and \({\varvec{Y}}\) abide by the second-order AGVM
$$\begin{aligned} {\varvec{X}}^{(1)}= {\varvec{H}}^{(1)}{\varvec{X}}^{(1)}+ {\varvec{H}}^{(2)}{\varvec{X}}^{(2)}+ {\varvec{\Gamma }}{\varvec{Y}}, \end{aligned}$$
for a sparse matrix \({\varvec{H}}^{(1)}\) with diagonal entries \([{\varvec{H}}^{(1)}]_{ii} = 0\), a matrix \({\varvec{H}}^{(2)}\) satisfying A.2, and a diagonal matrix \({\varvec{\Gamma }}\) with diagonal entries \([{\varvec{\Gamma }}]_{ii} \ne 0\). Given a matrix \({\varvec{\Pi }}_1\) such that \({\varvec{X}}^{(1)}{\varvec{\Pi }}_1 = {\varvec{0}}\) and \({\varvec{Y}}{\varvec{\Pi }}_1 \ne {\varvec{0}}\), if \(\textrm{kr}({{\varvec{C}}}[{\varvec{X}}^{(1)}*{\varvec{X}}^{(1)}]{\varvec{\Pi }}_1) \ge 2K_2 + 1\), where \({\varvec{C}}\) is a binary selection matrix picking the appropriate rows of the Kronecker product, then the positions of the nonzero entries of \({\varvec{H}}^{(2)}\) are unique.
Proof
See Appendix. \(\square\)
Here, unlike the purely self-driven case, the presence of the exogenous term leads to a different notion of identifiability: structural identifiability. That is, provided the condition on the Kruskal rank of the projected data matrix is met, the positions of the nonzero entries of the second-order graph Volterra kernel \({\varvec{H}}^{(2)}\) can be uniquely identified. When the graph Volterra kernels are related to the \((p+1)\)-cliques [cf. (7)] in a network, the above result allows for higher-order link prediction. In addition, the support of \({\varvec{H}}^{(1)}\) can then be partially estimated from the nonzero entries of \({\varvec{H}}^{(2)}\), as, by assumption, their supports are related in this case. More specifically, the existence of a triangle directly implies edges among the nodes in the clique. However, since the nonexistence of a clique, e.g., a triangle, does not rule out the existence of an edge, not all edges, i.e., positions of the nonzeros of \({\varvec{H}}^{(1)}\), can be identified.
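This support inheritance is easy to make concrete. In the illustrative sketch below (the helper name `edges_from_triangles` is ours), each nonzero second-order kernel entry is indexed by a triple of nodes, and every such triangle forces the three edges of the corresponding 3-clique, while absent triangles constrain nothing:

```python
def edges_from_triangles(triangles):
    """Edges implied by a set of triangles (3-cliques): each triangle {i, j, k}
    forces the edges (i, j), (i, k) and (j, k); missing triangles say nothing."""
    edges = set()
    for i, j, k in triangles:
        edges.update({frozenset((i, j)), frozenset((i, k)), frozenset((j, k))})
    return edges

# Two triangles sharing the edge (1, 2) imply five distinct edges
implied = edges_from_triangles([(0, 1, 2), (1, 2, 3)])
assert implied == {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]}
```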
To present an identifiability result using sparse recovery, we employ the following definition.
Definition 2
(Restricted Isometry Property (RIP)) A matrix \({\varvec{A}} \in \mathbb {R}^{N\times M}\) satisfies the restricted isometry property of order s with constant \(\delta _s\in (0,1)\) if, for all \({\varvec{h}}\in \mathbb {R}^{M}\) with \(\Vert {\varvec{h}} \Vert _0\le s\) [34],
$$\begin{aligned} (1-\delta _s)\Vert {\varvec{h}}\Vert _2^2 \le \Vert {\varvec{A}} {\varvec{h}}\Vert _2^2 \le (1+\delta _s)\Vert {\varvec{h}}\Vert _2^2. \end{aligned}$$
(13)
The RIP is a fundamental property for establishing identifiability conditions in sparse recovery. It has been shown that, given \(\delta _{2s} < \sqrt{2}-1\), the constrained version of the Lasso optimization problem
$$\begin{aligned} \begin{array}{ll} \underset{{\varvec{h}} \in \mathbb {R}^{M}}{\min }{\Vert {\varvec{h}}\Vert _1 } \quad \textit{subject to} \quad \Vert {\varvec{y}} - {\varvec{A}} {\varvec{h}}\Vert _2 \le \epsilon \end{array} \end{aligned}$$
(14)
yields \(\Vert {{\varvec{h}}}^* - \bar{{\varvec{h}}} \Vert _2^2 \le c \epsilon ^2\) for some constant c depending on \(\delta _{2s}\) when the linear model \({\varvec{y}} = {\varvec{A}}\bar{{\varvec{h}}} + {\varvec{v}}\), \(\Vert {\varvec{v}} \Vert _2 \le \epsilon\), holds with s-sparse \(\bar{{\varvec{h}}}\), where \({\varvec{h}}^*\) is the solution of (14) [34, 35].
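For concreteness, the noiseless limit of a problem like (14) can be solved with a basic proximal-gradient (ISTA) iteration for the penalized Lasso. The sketch below is our own minimal implementation, not the estimator developed later in the paper; the matrix sizes and penalty `lam` are illustrative. A Gaussian sensing matrix is used because it satisfies the RIP with high probability:

```python
import numpy as np

def ista(A, y, lam, n_iter):
    """Proximal gradient for min_h 0.5*||y - A h||_2^2 + lam*||h||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    h = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = h - step * (A.T @ (A @ h - y))        # gradient step on the quadratic
        h = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
    return h

rng = np.random.default_rng(0)
T, M, s = 60, 120, 4
A = rng.normal(size=(T, M)) / np.sqrt(T)          # approximately unit-norm columns

h_true = np.zeros(M)
support = rng.choice(M, size=s, replace=False)
h_true[support] = rng.uniform(1.0, 2.0, size=s) * rng.choice([-1.0, 1.0], size=s)

y = A @ h_true                                    # noiseless measurements
h_hat = ista(A, y, lam=0.01, n_iter=2000)

# The s largest entries of |h_hat| sit exactly on the true support
assert set(np.argsort(-np.abs(h_hat))[:s]) == set(support)
```

Even though the system is underdetermined (\(M > T\)), the \(\ell_1\) penalty recovers the s-sparse vector, which is the mechanism the RIP-based guarantees formalize.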
In the literature, in particular in works on sparse polynomial regression [35] and Volterra series [36], several guarantees have been established for system matrices stemming from different alphabets and/or distributions. For instance, in [24] results for Volterra system identification were derived for signals drawn from \(\{-1,0,1\}\), and in [35] for signals drawn from \(\mathcal {U}[-1,1]\). However, the bipolar case, e.g., \(\{-1,1\}\), has not been considered, and its treatment within the self-driven Volterra expansion is still missing. Therefore, in the following, we present a RIP result for the second-order AGVM, whose technical proof is detailed in the Appendix.
Theorem 4
Let \(\{x_i(t)\}_{i=1}^{N}\) for \(t\in \{1,2,\ldots ,T\}\) be an input sequence of independent random variables drawn from the alphabet \(\{-1,1\}\) with equal probability. Assume that the AGVM regression matrix is defined as
$$\begin{aligned} \tilde{{\varvec{X}}}^\top = \frac{1}{\sqrt{T}}\,[({\varvec{X}}^{l})^\top \;({\varvec{X}}^{b})^\top ] \in \mathbb {R}^{T\times L}, \end{aligned}$$
where \(L = N+N(N-1)/2\), \({\varvec{X}}^l := {\varvec{X}}^{(1)}\) and \({\varvec{X}}^b\) is \({\varvec{X}}^{(2)}\) with the quadratic terms removed, i.e., it only contains the bilinear terms \(x_i(t)x_j(t),\,i\ne j\). Then, for any \(\delta _s\in (0,1)\) and any \(\gamma \in (0,1)\), whenever \(T \ge \frac{4C}{(1-\gamma )\delta _s^2}s^2\log N\), the matrix \(\tilde{{\varvec{X}}}^\top\) satisfies the RIP with constant \(\delta _s\) with probability exceeding \(1 - \exp \bigg (-\frac{\gamma \delta _s^2}{C}\cdot \frac{T}{s^2}\bigg )\), where \(C = 2\).
Notice that the pure quadratic terms in \({\varvec{X}}^{(2)}\) have been removed. This is because, for the bipolar signal case, these quadratic terms are constant when the alphabet is \(\{-1,1\}\) and equivalent to \({\varvec{X}}^{(1)}\) when the alphabet is \(\{0,1\}\). Hence, in both cases their contribution can be omitted without loss of generality. Furthermore, the data matrix is normalized with respect to the number of available measurements, T. This is done in order to guarantee that the diagonal entries of the Gramian of \(\tilde{{\varvec{X}}}^\top\) are unity in expectation.
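The construction of the bipolar AGVM regression matrix and the effect of the \(1/\sqrt{T}\) normalization can be sketched as follows (variable names are ours; note that for \(\pm 1\) entries the diagonal of the Gramian is exactly one, not only in expectation):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, T = 6, 200
X1 = rng.choice([-1.0, 1.0], size=(N, T))     # bipolar input signals, X^l

# Bilinear rows x_i(t) * x_j(t) for i < j; the pure squares are constant (= 1)
# for the {-1, 1} alphabet and are therefore dropped
Xb = np.array([X1[i] * X1[j] for i, j in combinations(range(N), 2)])

Xt = np.vstack([X1, Xb]).T / np.sqrt(T)       # T x L regression matrix
L = N + N * (N - 1) // 2
assert Xt.shape == (T, L)

# Normalization makes every column unit norm: the Gramian diagonal is exactly 1
G = Xt.T @ Xt
assert np.allclose(np.diag(G), 1.0)
```

The off-diagonal Gramian entries concentrate around zero as T grows, which is the mechanism behind the RIP statement of Theorem 4.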
This theorem asserts that \(T \in {{\mathcal {O}}}(s^2\log N)\) observations suffice to recover an s-sparse vector of graph Volterra kernels. Since the number of unknowns per row of \({\varvec{H}}^{(1)}\) and \({\varvec{H}}^{(2)}\) is considered to be at most \({{\mathcal {O}}}(N)\) [cf. 12], the bound on the sampling complexity scales as \({{\mathcal {O}}}(N^2\log N)\), which agrees with bounds obtained for linear filtering setups [24, 37]; in this case, however, the constant C is relatively small.
Given that, under the established conditions, the proposed AGVM model is identifiable and able to leverage sparsity to reduce its sampling complexity, in the following section we present task-specific constraints for higher-order link inference and methods for estimating the graph Volterra kernels.