Let \(\mathbf {X} \in \mathbb {R}^{N\times T}\) be a real, discrete, periodic multivariate stochastic process with a finite number of timesteps T, indexed by the vertices \(v_i\) of a graph \(\mathcal {G}\) and by time t. We refer to such processes as time-vertex processes, or joint processes for short.
Our objective is to provide a definition of stationarity that captures statistical invariance of the first two moments of a joint process \(\mathbf {x} = \text {vec}({\mathbf {X}}) \sim \mathcal {D}(\bar {\mathbf {x}}, \mathbf {\Sigma })\), i.e., the mean \(\bar {\mathbf {x}} = \mathbf {E}\left [{\mathbf {x}}\right ]\) and the covariance \(\mathbf {\Sigma } = \mathbf {E}\left [{\mathbf {xx}^{\intercal }}\right ] - \bar {\mathbf {x}}{\bar {\mathbf {x}}}^{\intercal }\). Crucially, the definition should do so in a manner that is faithful to the graph and temporal structure.
3.1 Definition
Typically, wide-sense stationarity is understood as an invariance of the first two moments of a process w.r.t. translation. For the first moment, things are straightforward: stationarity implies a constant mean \(\mathbf{E}[\mathbf{x}] = c\mathbf{1}\), independently of the domain of interest. The second moment, however, is more complicated, as it depends on the exact form translation takes in the particular domain. Unfortunately, for graphs, translation is a non-trivial operation, and three alternative translation operators exist: the generalized translation [37], the graph shift [13], and the isometric graph translation [27]. Due to this challenge, there are currently three alternative (though akin) definitions of stationarity appropriate for graphs [17–19].
The ambiguity associated with translation on graphs urges us to seek an alternative starting point for our definition. Fortunately, there exists an interpretation that holds promise: up to its constant mean, a wide-sense stationary process corresponds to a white process filtered linearly on the underlying space. This "filtering interpretation" of stationarity is well known classically as well as in the graph setting [19], and it is equivalent to asserting that the second moment can be expressed as \(\mathbf{\Sigma} = h(\mathbf{L}_T)\), where \(h(\mathbf{L}_T)\) is a linear filter. Thankfully, not only is filtering elegantly and uniquely defined for graphs [37], but also stating that a process is graph wide-sense stationary if \(\mathbf{E}[\mathbf{x}] = c\mathbf{1}_N\) and \(\mathbf{\Sigma} = h(\mathbf{L}_G)\), where \(h(\mathbf{L}_G)\) is a graph filter, is generally consistent with current definitions [17–19].
This motivates us to also express the definition of stationarity for joint processes in terms of joint filtering:
Definition 1
(JWSS) A joint process x=vec (X) is called jointly wide-sense stationary (JWSS), if and only if
- (a) The first moment of the process is constant: \(\mathbf{E}[\mathbf{x}] = c\mathbf{1}_{NT}\).

- (b) The covariance matrix of the process is a joint filter \(\mathbf{\Sigma} = h(\mathbf{L}_{G},\mathbf{L}_{T})\), where h(·,·) is a non-negative real function referred to as the joint power spectral density (JPSD).
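As a concrete illustration of Definition 1(b), the following sketch synthesizes a joint-filter covariance on a toy example. The path graph, the sizes N and T, and the kernel h are all illustrative choices made here, not objects from the paper; the JFT basis is built as the Kronecker product of a DFT-type basis with the graph Fourier basis, matching the column-stacked vectorization x = vec(X).

```python
import numpy as np

# Minimal sketch of Definition 1(b); the path graph, sizes and the kernel h
# are illustrative choices. U_J is the joint Fourier basis for x = vec(X).
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])   # path graph Laplacian
lam, U_G = np.linalg.eigh(L_G)                  # graph spectrum
omega = 2 * np.pi * np.arange(T) / T            # DFT angular frequencies
t = np.arange(T)
U_T = np.exp(2j * np.pi * np.outer(t, t) / T) / np.sqrt(T)       # unitary DFT-type basis
U_J = np.kron(U_T, U_G)                         # joint Fourier basis

# A JPSD must be real, non-negative and (for real processes) even in omega.
h = lambda l, w: 1.0 / (1.0 + l + 2 * (1 - np.cos(w)))
jpsd = np.array([h(l, w) for w in omega for l in lam])

Sigma = (U_J * jpsd) @ U_J.conj().T             # Sigma = h(L_G, L_T)

assert np.allclose(Sigma.imag, 0)               # real covariance
assert np.allclose(Sigma, Sigma.conj().T)       # symmetric
assert np.linalg.eigvalsh(Sigma.real).min() > 0 # positive semidefinite here
```

The asserts confirm that, for an h that is even in ω, the resulting joint filter is a valid real covariance matrix.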
Let us examine Definition 1 in detail.
First moment condition. As in the classical case, the first moment of a JWSS process has to be constant over the time and vertex sets, i.e., \(\bar {\mathbf {X}}[i,t] = c \) for every i=1,2,…,N and t=1,2,…,T. For alternative choices of the graph Laplacian whose null space is not spanned by the constant vector, the first moment condition should be modified to require that the expected value of a JWSS process lies in the null space of the matrix \(\mathbf{L}_T \oplus \mathbf{L}_G\) (see Remark 2 of [19] for a similar observation on stochastic graph signals).
Second moment condition. According to the definition, the covariance matrix of a JWSS process takes the form of a joint filter h(LG,LT), and is therefore diagonalizable by the JFT matrix UJ. It may also be interesting to notice that the matrix h(LG,LT) can be expressed as follows
$$ \mathbf{\Sigma} = h(\mathbf{L}_{G},\mathbf{L}_{T}) = \left(\begin{array}{cccc} \mathbf{H}_{1,1}& \mathbf{H}_{1,2}& \cdots & \mathbf{H}_{1,T}\\ \mathbf{H}_{2,1} & \mathbf{H}_{2,2} & & \mathbf{H}_{2,T} \\ \vdots & & \ddots & \vdots \\ \mathbf{H}_{T,1} & \mathbf{H}_{T,2} & \cdots & \mathbf{H}_{T,T} \end{array}\right), $$
(4)
where each block \(\mathbf {H}_{t_{1},t_{2}}\) of Σ is an N×N matrix defined as:
$$ \mathbf{H}_{t_{1},t_{2}} = \frac{1}{T} \sum_{\tau = 1}^{T} h_{\omega_{\tau}} (\mathbf{L}_{G}) \, e^{j\omega_{\tau}(t_{1}-t_{2}+1)} $$
(5)
and \(h_{\omega _{\tau }} (\mathbf {L}_{G})\) is the graph filter with frequency response \(h_{\omega _{\tau }}(\lambda) = h(\lambda,\omega _{\tau })\). Being a covariance matrix, h(LG,LT) must necessarily be positive semidefinite; thus, h(·,·) is real (the eigenvalues of every Hermitian matrix are real) and non-negative. Equivalently, every zero-mean JWSS process x=vec(X) can be generated by joint filtering a white process ε with zero mean and identity covariance: \(\mathbf{x} = h(\mathbf{L}_{G},\mathbf{L}_{T})^{1/2}\,\boldsymbol{\varepsilon}\). The following proposition exploits these facts to provide an interpretation of JWSS processes in the joint frequency domain.
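This generative view can be sketched numerically. The graph, sizes, and kernel below are illustrative choices: we form the square root of a joint filter and verify deterministically that coloring white noise with it reproduces Σ, then draw one realization.

```python
import numpy as np

# Sketch of the generative interpretation: a zero-mean JWSS process is a
# white process colored by h(L_G, L_T)^{1/2}. Path graph, sizes and h are
# illustrative choices.
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
lam, U_G = np.linalg.eigh(L_G)
omega = 2 * np.pi * np.arange(T) / T
t = np.arange(T)
U_T = np.exp(2j * np.pi * np.outer(t, t) / T) / np.sqrt(T)
U_J = np.kron(U_T, U_G)

h = lambda l, w: 1.0 / (1.0 + l + 2 * (1 - np.cos(w)))
jpsd = np.array([h(l, w) for w in omega for l in lam])

Sigma = (U_J * jpsd) @ U_J.conj().T
B = ((U_J * np.sqrt(jpsd)) @ U_J.conj().T).real    # h(L_G, L_T)^{1/2}, real for even h

# Coloring white noise: E[(B eps)(B eps)^T] = B B^T = Sigma.
assert np.allclose(B @ B.T, Sigma)

rng = np.random.default_rng(0)
x = B @ rng.standard_normal(N * T)                 # one JWSS realization
X = x.reshape(T, N).T                              # back to an N x T time-vertex signal
```

The reshape at the end undoes the column-stacking convention of vec(X) used throughout the section.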
Proposition 1
(Generalizes Theorem 1 [17] and Proposition 1 [18, 19]) A joint process X over a connected graph \(\mathcal {G}\) is jointly wide-sense stationary (JWSS) if and only if:
- (a) The joint spectral modes are zero in expectation: \(\mathbf {E}\left [\hat {\mathbf {X}}[n,\tau ]\right ]=0\) if \(\lambda _{n} \neq 0\) or \(\omega _{\tau } \neq 0\).

- (b) The product graph spectral modes are uncorrelated: \(\mathbf {E}\left [ \hat {\mathbf {X}}[n_{1},\tau _{1}] \, \hat {\mathbf {X}}^{*}[n_{2},\tau _{2}]\right ] = 0\) whenever n1≠n2 or τ1≠τ2.

- (c) There exists a non-negative function h(·,·), referred to as the joint power spectral density (JPSD), such that

$$\mathbf{E}\left[\left|\hat{\mathbf{X}}[n,\tau]\right|^{2}\right] - \left|\mathbf{E}\left[\hat{\mathbf{X}}[n,\tau]\right]\right|^{2} = h(\lambda_{n}, \omega_{\tau}),$$

for every n=1,2,…,N and τ=1,2,…,T.
(For clarity, this and other proofs of the paper have been moved to the “Appendix”.)
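Conditions (b) and (c) of Proposition 1 can be checked numerically on a toy example. Everything below (graph, sizes, the kernel h) is an illustrative choice: we synthesize a joint-filter covariance and confirm that, in the joint spectral domain, it is exactly diagonal with the JPSD on the diagonal.

```python
import numpy as np

# Numerical illustration of Proposition 1(b)-(c) on a toy example (3-vertex
# path graph, T = 4, an arbitrary valid JPSD h): the covariance of a JWSS
# process is diagonalized by the JFT, and the diagonal equals h(lam_n, w_tau).
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
lam, U_G = np.linalg.eigh(L_G)
omega = 2 * np.pi * np.arange(T) / T
t = np.arange(T)
U_T = np.exp(2j * np.pi * np.outer(t, t) / T) / np.sqrt(T)
U_J = np.kron(U_T, U_G)                         # joint Fourier basis

h = lambda l, w: np.exp(-l) * (1.0 + np.cos(w)) + 0.1   # example JPSD
jpsd = np.array([h(l, w) for w in omega for l in lam])

Sigma = (U_J * jpsd) @ U_J.conj().T             # Sigma = h(L_G, L_T)
S_hat = U_J.conj().T @ Sigma @ U_J              # covariance in the joint spectrum

# (b) distinct joint modes are uncorrelated: S_hat is diagonal;
# (c) the diagonal is the non-negative JPSD on the joint spectrum.
assert np.allclose(S_hat, np.diag(jpsd))
assert jpsd.min() >= 0
```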
We briefly present a few additional properties of JWSS processes that will be useful in the rest of the paper.
Property 1
(Generalizes Example 1 [17–19]) White centered i.i.d. noise \(\mathbf {w} \in \mathbb {R}^{NT} \sim \mathcal {D}(\mathbf {0}_{NT},\mathbf {I}_{NT})\) is JWSS with constant JPSD for any graph.
The proof follows easily by noting that the covariance of w is diagonalized by the joint Fourier basis of any graph: \(\mathbf {\Sigma }_{\mathbf {w}} = \mathbf {I} = \mathbf {U}_{J} \mathbf {I} \mathbf {U}_{J}^{*}\). This last equation tells us that the JPSD is constant, which implies that, similar to the classical case, the energy of white noise is evenly spread across all joint frequencies.
A second interesting property of JWSS processes is that stationarity is preserved through a filtering operation.
Property 2
(Generalizes Theorem 2 [17], Property 1 [19]) When a joint filter f(LG,LT) is applied to a JWSS process X with JPSD h, the result Y remains JWSS with mean \(c\,f(0,0)\,\mathbf{1}_{NT}\), where c is the mean of X, and JPSD \(f^{2}(\lambda,\omega)\,h(\lambda,\omega)\).
Finally, we note that for real processes X, which are the focus of this paper, the function h forming the joint filter must be symmetric w.r.t. ω, i.e., h(λ,ω)=h(λ,−ω). This property follows directly from the definition of the Fourier transform.
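Property 2 admits a compact numerical check. All of N, T, the graph, and the filters f and h below are illustrative choices: applying a joint filter f to a JWSS process with JPSD h yields a covariance whose joint spectrum is exactly f²h.

```python
import numpy as np

# Sketch of Property 2 on a toy example: filtering a JWSS process with a
# joint filter f gives a JWSS process with JPSD f^2 * h. All choices below
# (graph, sizes, f, h) are illustrative, not quantities from the paper.
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
lam, U_G = np.linalg.eigh(L_G)
omega = 2 * np.pi * np.arange(T) / T
t = np.arange(T)
U_T = np.exp(2j * np.pi * np.outer(t, t) / T) / np.sqrt(T)
U_J = np.kron(U_T, U_G)

h = lambda l, w: 1.0 / (1.0 + l + 2 * (1 - np.cos(w)))   # JPSD of x
f = lambda l, w: np.exp(-l) * np.cos(w / 2) ** 2          # joint filter (even in w)

h_vec = np.array([h(l, w) for w in omega for l in lam])
f_vec = np.array([f(l, w) for w in omega for l in lam])

Sigma_x = (U_J * h_vec) @ U_J.conj().T          # covariance of x
F = (U_J * f_vec) @ U_J.conj().T                # joint filter f(L_G, L_T)
Sigma_y = F @ Sigma_x @ F.conj().T              # covariance of y = F x

# The JPSD of y is f^2(lam, w) * h(lam, w), as Property 2 states.
assert np.allclose(U_J.conj().T @ Sigma_y @ U_J, np.diag(f_vec**2 * h_vec))
```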
3.2 Relations to classical definitions
We next provide an in-depth examination of the relations between joint wide-sense stationarity, time and vertex stationarity, as well as their multivariate equivalents. For clarity, we order the rows/columns of the covariance matrix Σ such that each \(\mathbf {\Sigma }_{t_{1}, t_{2}}\) block of size N×N measures the covariance between \(\mathbf {x}_{t_{1}}\) and \(\mathbf {x}_{t_{2}}\) (see (4)).
3.2.1 Standard definitions
As we discuss below, known definitions of stationarity in time/vertex domains are particular cases of joint stationarity.
TWSS ∩VWSS ⊂JWSS. The known versions of stationarity (TWSS, VWSS) are oblivious to the structure along one of the two dimensions of X. In this manner, assuming that X is TWSS amounts to interpreting each of the N time series as a separate realization of the same process with TPSD hT(ω). Similarly, if X is VWSS, then each graph signal xt is taken as a separate realization of a single stochastic graph signal with VPSD hG(λ) [17, 19]. As a simple consequence, in contrast to the JWSS hypothesis, assuming that X is both TWSS and VWSS is equivalent to restricting our scope to separable JPSDs, defined as the product of two univariate functions h(λ,ω)=hG(λ)hT(ω) (see also Fig. 1).
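The separability statement has an equivalent matrix form worth sketching: when h(λ,ω)=hG(λ)hT(ω), the covariance factors into the Kronecker product of a circulant time covariance and a graph filter, by the mixed-product property of the Kronecker product. The setup below is an illustrative toy example.

```python
import numpy as np

# Check of the separability claim on a toy example: a separable JPSD
# h = h_G * h_T makes the covariance a Kronecker product of a circulant
# time covariance and a graph filter. All choices below are illustrative.
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
lam, U_G = np.linalg.eigh(L_G)
omega = 2 * np.pi * np.arange(T) / T
t = np.arange(T)
U_T = np.exp(2j * np.pi * np.outer(t, t) / T) / np.sqrt(T)
U_J = np.kron(U_T, U_G)

h_G = np.exp(-lam)                      # example VPSD
h_T = 1.0 / (2.0 - np.cos(omega))       # example TPSD (even in omega)

jpsd = np.kron(h_T, h_G)                # separable JPSD, in JFT ordering
Sigma = (U_J * jpsd) @ U_J.conj().T

C_T = (U_T * h_T) @ U_T.conj().T        # circulant time covariance
H_G = (U_G * h_G) @ U_G.T               # graph filter h_G(L_G)

# Separable JPSD <=> the covariance factors over the two domains.
assert np.allclose(Sigma, np.kron(C_T, H_G))
```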
3.2.2 Definitions based on the product graph
As explained in Section 2, the JFT can be interpreted as a graph Fourier transform taken over a product graph whose Laplacian is LJ=LG⊕LT. This construction can give rise to two additional definitions for joint stationarity:
3.2.2.1
VWSS on a product graph.
The first is obtained by applying the VWSS definition of [17, 19] on the graph associated with LJ. The resulting model is not sufficiently general to generate the full spectrum of JWSS processes. The reason is that, whereas the JPSD h(λ,ω) can be any two-dimensional non-negative function, the JPSD of any VWSS process on LJ is necessarily one-dimensional (the eigenvalues of LJ are the sums of all combinations of the eigenvalues of LG and LT); see Fig. 1 for a pictorial demonstration and "Appendix: Univariate vs multivariate JPSD" for examples from real data. The same reasoning also holds for alternative products between graphs, such as the strong and Kronecker products [14].
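The parenthetical claim about the spectrum of LJ is easy to confirm directly: the two Kronecker-sum terms commute, so the eigenvalues of LJ are all pairwise sums of time and graph eigenvalues. A PSD defined on LJ is therefore a one-dimensional function of these sums. Sizes and graph below are toy choices.

```python
import numpy as np

# The eigenvalues of the Kronecker-sum Laplacian of the product graph are
# all pairwise sums of the eigenvalues of L_G and of the (circulant) time
# Laplacian L_T. Toy sizes below; L_T is the ring Laplacian on T timesteps.
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
L_T = 2 * np.eye(T) - np.roll(np.eye(T), 1, axis=1) - np.roll(np.eye(T), -1, axis=1)

# Kronecker-sum Laplacian, matching the column-stacked vec(X) ordering.
L_J = np.kron(L_T, np.eye(N)) + np.kron(np.eye(T), L_G)

lam = np.linalg.eigvalsh(L_G)
mu = np.linalg.eigvalsh(L_T)
pair_sums = np.sort(np.add.outer(mu, lam).ravel())

assert np.allclose(np.sort(np.linalg.eigvalsh(L_J)), pair_sums)
```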
3.2.2.2
Covariance diagonalized by the product graph Fourier transform.
The second definition, which we refer to as JWSS-alternate, asserts that the covariance matrix Σ can be diagonalized by the JFT, i.e., the eigenbasis of LJ. It differs from the JWSS definition only in the case of graph Laplacian eigenvalue multiplicities: whenever the graph Laplacian features repeated eigenvalues, the degrees of freedom of the JPSD h decrease under Definition 1, as necessarily h(λ1,ω)=h(λ2,ω) when λ1=λ2. This restriction is motivated by the following observation: for an eigenspace of dimension greater than one, there exists an infinite number of possible eigenvector choices corresponding to the different rotations of the space, and the JPSD is in general ill-defined. The condition h(λ1,ω)=h(λ2,ω) when λ1=λ2 resolves this ambiguity, as it ensures that the JPSD is the same independently of the choice of eigenvectors. In contrast, with JWSS-alternate, one constructs an arbitrary basis of each eigenspace with multiplicity and may set h(λ1,ω)≠h(λ2,ω). This approach, which was followed in [38], features more degrees of freedom at the expense of losing the filtering interpretation and of higher computational complexity: one can no longer use filters to estimate the JPSD (without reverting to Definition 1), whereas using the JFT to diagonalize the covariance scales like \(O(N^3 + N^2 T + NT\log T)\). In our setting, on the contrary, the PSD estimation complexity can be reduced to be close to linear in the number of edges E and timesteps T (see "Appendix: Implementation details of the JPSD estimator").
Nevertheless, we should mention that the differences mentioned above are mostly academic. Eigenvalue multiplicities occur mainly when graph automorphisms exist. In the absence of such symmetries (e.g., in the graphs used in our experiments), the two definitions yield the same outcome.
3.2.3 Multivariate definitions
On the other hand, joint stationarity can itself be derived as the combination of two multivariate versions of time/vertex stationarity, which we refer to as MTWSS (see [25]) and MVWSS, respectively. Before defining them formally in Definitions 2 and 3, let us state our result:
Theorem 1
A joint process X is JWSS if and only if it is MTWSS and MVWSS.
To put this in context, we examine the two multivariate definitions independently.
(a) JWSS ⊂MTWSS. The covariance matrix of a JWSS process has a block circulant structure, as \(\mathbf {\Sigma }_{t_{1},t_{2}} = \mathbf {\Sigma }_{\delta,1} = \mathbf {\Gamma }_{\delta }\), where δ=t1−t2+1 (taken modulo T). Hence, Σ can be written as
$$\mathbf{\Sigma}_{\mathbf{x}} = \left(\begin{array}{cccc} \mathbf{\Gamma}_{1}& \mathbf{\Gamma}_{2} & \cdots & \mathbf{\Gamma}_{T}\\ \mathbf{\Gamma}_{T} & \mathbf{\Gamma}_{1} & & \mathbf{\Gamma}_{T-1} \\ \vdots & & \ddots & \vdots \\ \mathbf{\Gamma}_{2} & \mathbf{\Gamma}_{3} & \cdots & \mathbf{\Gamma}_{1} \end{array}\right), $$
implying that correlations only depend on δ and not on any time localization. This property is shared by multivariate time wide-sense stationary processes:
Definition 2
(MTWSS [25]) A joint process \(\mathbf {X}= \left [ \mathbf {x}_{1}, \mathbf {x}_{2}, \ldots, \mathbf {x}_{T} \right ] \in \mathbb {R}^{N\times T}\) is multivariate time wide-sense stationary (MTWSS), if and only if the following two properties hold:
- (a) The expected value is constant: \(\mathbf{E}[\mathbf{x}_{t}]=c\mathbf{1}\) for all t.

- (b) For all t1,t2, the second moment satisfies \( \mathbf {\Sigma }_{t_{1},t_{2}} = \mathbf {\Sigma }_{\delta,1} = \mathbf {\Gamma }_{\delta }, \) where δ=t1−t2+1.
Similarly to the univariate case, the time power spectral density (TPSD) is defined to encode the statistics of the process in the spectral domain
$$ \hat{\mathbf{\Gamma}}_{\tau} = \sum_{\delta=1}^{T} \mathbf{\Gamma}_{\delta} e^{-j\omega_{\tau} \delta}. $$
(6)
We then obtain the TPSD of a JWSS process by constructing a graph filter from h while fixing ω. Setting \(h_{\omega _{\tau }}(\lambda) = h(\lambda,\omega _{\tau })\), the TPSD of a JWSS process is \(\hat {\mathbf {\Gamma }}_{\tau } = h_{\omega _{\tau }}(\mathbf {L}_{G}).\)
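The relation between the JPSD and the TPSD blocks can be sanity-checked numerically under the toy setup used above (path graph, T = 4, an illustrative h): fixing ω in h gives one graph filter per frequency, inverting (6) produces the lag blocks Γδ, and re-applying (6) recovers the TPSD exactly.

```python
import numpy as np

# Roundtrip check of (6) and of Gamma_hat_tau = h_{w_tau}(L_G) on a toy
# example; the graph, sizes and the kernel h are illustrative choices.
N, T = 3, 4
L_G = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
lam, U_G = np.linalg.eigh(L_G)
omega = 2 * np.pi * np.arange(T) / T

h = lambda l, w: 1.0 / (1.0 + l + 2 * (1 - np.cos(w)))   # example JPSD

# TPSD: one graph filter per angular frequency, Gamma_hat[tau] = h_{w_tau}(L_G).
Gamma_hat = np.array([(U_G * h(lam, w)) @ U_G.T for w in omega])

# Invert (6): Gamma_delta = (1/T) sum_tau Gamma_hat[tau] e^{+j w_tau delta}.
delta = np.arange(1, T + 1)
Gamma = np.array([(Gamma_hat * np.exp(1j * omega * d)[:, None, None]).sum(0) / T
                  for d in delta])

# Re-apply (6) and confirm the roundtrip recovers the TPSD.
Gamma_hat2 = np.array([(Gamma * np.exp(-1j * w * delta)[:, None, None]).sum(0)
                       for w in omega])
assert np.allclose(Gamma_hat2, Gamma_hat)
```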
(b) JWSS ⊂ MVWSS. For a JWSS process, each block of Σ has to be a linear graph filter, i.e., \(\mathbf {\Sigma }_{t_{1},t_{2}}= \gamma _{t_{1},t_{2}}(\mathbf {L}_{G})\), meaning that
$$\mathbf{\Sigma} = \left(\begin{array}{cccc} \gamma_{1,1}(\mathbf{L}_{G})& \gamma_{1,2}(\mathbf{L}_{G}) & \cdots & \gamma_{1,T}(\mathbf{L}_{G})\\ \gamma_{2,1}(\mathbf{L}_{G}) & \gamma_{2,2}(\mathbf{L}_{G}) & & \\ \vdots & & \ddots & \vdots \\ \gamma_{T,1}(\mathbf{L}_{G}) & & \cdots & \gamma_{T,T}(\mathbf{L}_{G}) \end{array}\right). $$
This is perhaps better understood when compared to the multivariate version of vertex stationarity defined below:
Definition 3
(MVWSS) A joint process \(\mathbf {X} = \left [\mathbf {x}_{1}, \mathbf {x}_{2}, \right. \left.\ldots, \mathbf {x}_{T}\right ] \in \mathbb {R}^{N\times T}\) is called multivariate vertex wide-sense stationary (MVWSS), if and only if the following two properties hold independently:
- (a) The expected value of each signal xt is constant: \(\mathbf{E}[\mathbf{x}_{t}]=c_{t}\mathbf{1}\) for all t.

- (b) For all t1 and t2, there exists a kernel \(\gamma _{t_{1},t_{2}}\) such that \( \mathbf {\Sigma }_{t_{1},t_{2}} = \gamma _{t_{1},t_{2}}(\mathbf {L}_{G}) \).
It can be seen that every JWSS process must also be MVWSS, or equivalently JWSS ⊂ MVWSS.