Quantization-aware sampling set selection for bandlimited graph signals

We consider a scenario in which nodes of a graph are sampled for bandlimited graph signals which are uniformly quantized with optimal rate and original signals are reconstructed from the quantized signal values residing on the nodes in the sampling set. We seek to construct the best sampling set in a greedy manner that minimizes the average reconstruction error. We manipulate the reconstruction error by using the QR factorization and derive an analytic result stating that the next minimizing node can be iteratively selected by finding the one that minimizes the geometric mean of the row vectors of the inverse upper triangular matrix R-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf {R}^{-1}$$\end{document} in the QR factorization. We also compare the complexity of the proposed algorithm with different sampling methods and evaluate the performance of the proposed algorithm by experiments for various graphs, showing the superiority to the existing sampling methods when quantization is involved.

such that the worst case of the reconstruction error is iteratively minimized [5][6][7] and the uncertainty principle for graph signals was derived and sampling strategies based on the uncertainty principle were suggested in [8]. Algorithms without eigendecomposition were presented to show fast and comparable performance [9,10]. Greedy samplers were proposed to focus on reconstruction of the second-order statistics of graph signals from sampled graph signals since the second-order statistics (i.e., graph power spectrum) provides critical information in inference applications such as inpaining and prediction [11]. To evaluate the reconstruction performance of sampling methods, universal bounds were presented and greedy approaches minimizing the reconstruction error were shown to achieve a near-optimality by the concept of approximate supermodularity [12]. Recently, a QR factorization-based method that greedily selects one node at each iteration so as to directly minimize the reconstruction error was developed with a competitive complexity and reconstruction performance [13].
The sampling set can be also constructed by the greedy solutions to the sensor selection problem in which a subset of nodes in sensor networks is selected in a greedy manner such that the reconstruction error for parameter estimation or a well-defined proxy related to the reconstruction error is optimized [14,15]: The log-determinant of the inverse estimation error covariance matrix is maximized to find the sampling set with a guaranteed near-optimality [14] and the frame potential as a proxy for the reconstruction error was introduced as a metric and a greedy removal algorithm was proposed to achieve near-optimality [15]. An efficient greedy technique was presented to find the least number of sensor nodes and their sensing locations by employing a new criterion with maximum projection onto the minimum eigenvalue of the dual observation matrix [16]. Quantization of graph signals at nodes before transmission is inevitable since nodes or vertices in physical networks operate with energy constraint and limited communication bandwidth. Hence, the effect of quantization of graphs signals on the selection process should be addressed while the previous work [17] derived an analytic solution to the problem of allocating optimal rate to each node in the sampling set where uniform quantization at nodes are assumed.
In this work, we consider a graph sampling problem formulated in previous work [5][6][7][8][9][10][11][12][13][14][15] for network applications such as social, energy, neural and sensor networks. We aim to devise an efficient sampling set selection algorithm for bandlimited graph signals which are uniformly quantized by using optimal rate. Specifically, we incorporate quantization into the selection process and seek to greedily select one node at each iteration that minimizes the average reconstruction error computed from quantized signal values with optimal rate allotted into the nodes in the sampling set. Note that the previous work [17] focused on optimal rate allocation to the sampling set after the set is constructed by selection methods. To the best of our knowledge, this is the first to take into account quantization to determine the best sampling set that minimizes the average reconstruction error. We first apply the QR factorization to formulate the reconstruction error and present a simple criterion to select the next minimizing node at iterations by using the analytic results proved in [13] and the optimal rate solution derived in [17]. Specifically, we show that the reconstruction error is iteratively minimized by selecting the next node minimizing the geometric mean of the row vectors of the inverse upper triangular matrix R −1 in the QR factorization. We also discuss the complexity of the proposed algorithm in comparison with different novel sampling methods. Finally, we investigate through experiments the performance of the proposed algorithm, demonstrating a performance gain over previous selection methods for quantized signals on various graphs. This paper is organized as follows. The sampling set selection problem with quantization is formulated in Sect. 2. The reconstruction error obtained from quantized graph signals is manipulated by using the QR factorization to produce a simplified metric, and an analytic result is derived to propose a simple selection criterion in Sect. 3. The complexity of the proposed algorithm is analyzed in comparison with different sampling methods in Sect. 4.1. The reconstruction performance is evaluated by extensive experiments in Sect. 4.2 and conclusions in Sect. 5.

Problem formulation
For an undirected and weighted graph G = (V, E) with N nodes indexed by the set V = {1, . . . , N } and edges E = {(i, j, w ij )} , where w ij represents the edge weight from node i to node j, we consider a graph signal f = [f 1 , . . . , f N ] ⊤ which is a function defined on V with signal values f i on the ith vertex. We employ variation operators (e.g., combinatorial graph Laplacian, normalized Laplacian) to describe the signal variation in a graph caused by the connectivity of the graph [2,6]. Let L , N × N matrix be a variation operator with eigenvalues | 1 | ≤ . . . ≤ | N | and corresponding orthonormal eigenvectors u 1 , . . . , u N . Noting that an arbitrary graph signal f can be expressed as f = Uc where U = [u 1 · · · u N ] is the eigenvector matrix and c = [c 1 , . . . , c N ] ⊤ the graph Fourier transform (GFT) of f , ω-bandlimited graph signal f (i.e., c i = 0, | i | > ω, ∀i > r ) can be given by where U VR is an N × r matrix with rows of U indexed by V and columns of U indexed by R = {1, . . . , r} , and c R an r × 1 column vector with its entries of c indexed by R.
We consider a situation in which a given number of nodes are selected to obtain a sampling set S and the sampled signal f S with the entries f i of f indexed by S and the signal value f i at node i is uniformly quantized with a rate of R i bits, given the total rate budget R T = |S| i R i . Then, we seek to find the best sampling set that minimizes the average reconstruction error where f Q is the reconstructed signal from the quantized signal f Q S which is obtained by uniformly quantizing the sampled signal f S . It is assumed that f S is corrupted by an additive measurement noise n ∈ R |S| consisting of independent and identically distributed (iid) entries with zero mean and variance σ 2 : where the quantization error vector e = [e 1 , . . . , e |S| ] ⊤ is modeled as an additive noise with its entry e i iid with zero mean and the variance 2 i 12 where i is the quantization step size given by i = I 2 R i with the quantization interval I and this additive noise model for quantization error is commonly employed for the analysis of quantization error since it behaves in a similar manner to that of additive white noise [18]. In this work, we produce the uniqueness sampling set S defined as the set for ω-bandlimited signals if noise-free ω -bandlimited signals f can be perfectly reconstructed from the sampled signal f S , implying that every ω-bandlimited signal has its unique samples in the uniqueness set S and it is shown that the uniqueness set S can be constructed by choosing r independent row vectors of U VR (i.e., choosing the ith row vector corresponds to selecting the ith node) [6].
Letting U SR be the |S| × r matrix with |S| independent rows of U VR indexed by S , the quantized noisy signal f Q S can be given from (1) and (2): In this work, we use the notation of the pseudoinverse for non-invertible matrices. Then, the reconstructed signal f Q is expressed by is the reconstructed signal from the noisy sampled signal f S . Now, we can obtain the average reconstruction error by using the analytic results in [17] as follows: 12 . It should be noticed that in computing the average reconstruction error in (4), it is assumed that f ≈f , implying the noise-free sampled graph signal f S or the case of high signal-to-noise ratio (SNR). This assumption ensures no prior distributions of graph signals required in the metric to be minimized, leading to a simplified process. However, since graph signals are typically noise-corrupted in practical applications, the proposed algorithm is evaluated in comparison with different methods in noisy situations in Sect. 4.2.
We continue to formulate the constrained optimization problem to find the best sampling set S * that minimizes the average reconstruction error: In this work, since we construct S * by greedily selecting one node at a time, we aim to minimize the intermediate reconstruction error at the ith iteration given by . . , i and S i the set of i nodes selected. Specifically, at the (i + 1) th iteration, one node is selected out of the nodes in the complement of set S i which consists of the remaining vertices in V denoted by S C i such that S i ∪ S C i = V and S i ∩ S C i = ∅ : in other words, one independent row vector (u (i+1) ) ⊤ is selected from the remaining rows in U S C i R where u (i) indicates the transpose of the row vector selected at the ith iteration. It should be noted that selection of the minimizing row at the ( i + 1)th iteration is conducted after the optimal rate is allotted to each node in S i+1 with the intermediate total rate R T i+1 assumed to be uniformly incremented by using and it is assumed that u (i) 's are independent of each other. The selection process in (5) and (6) is iteratively conducted r times with S i replaced by S i+1 at the next iteration. Note that the rate constraint includes both inequality and equality and all of the rate budget available at each iteration is fully used to minimize the reconstruction error.

Method: sampling set selection algorithm
We apply the QR factorization to where Q i+1 is the r × (i + 1) matrix with (i + 1) orthonormal columns q 1 , . . . , q i+1 and R i+1 the (i + 1) × (i + 1) upper triangular matrix with columns r 1 , . . . , r i+1 . Note that in this work, the QR factorization is performed by the Householder transformation due to its lower complexity and more stability than the Gram-Schmidt orthogonalization [19]. Then, we have where (7) follows since ((Q i+1 ) ⊤ ) + has orthonormal columns. Now, we prove a theorem that presents a simple criterion by which each row from U VR minimizing the intermediate reconstruction error is selected at iterations.
Theorem Let r i+1 be the (i + 1) th column vector of R i+1 and (a i+1 j ) ⊤ the jth row vector of (R i+1 ) −1 . Then, the selection process that iteratively minimizes the intermediate reconstruction error formulated in (5) can be conducted by searching the minimizing row u (i+1) * , i = 0, . . . , r − 1 at each iteration: where (u (i+1) ) ⊤ corresponds to one of the rows of U S C i R and d the (i + 1) th entry of r i+1 . Proof For each u (i+1) selected, we perform the optimal rate allocation which can be found by solving the constrained optimization problem: where (9) follows from matrix multiplication and � * = (� * 1 , . . . , � * i+1 ) ⊤ is a column vector of optimal step sizes that minimize the reconstruction error and derived in [17] as follows: Now, we use the optimal solution * in (10) to compute the reconstruction error with the optimal rate: To be concise, the selection of the next node that minimizes the intermediate reconstruction error with optimal rate allocation can be simply determined by finding one of u (i+1) 's that minimizes the geometric mean of the row vectors of (R i+1 ) −1 regardless of the total rate budget R T i+1 . Furthermore, with the analytic results derived in [13], (R i+1 ) −1 is just constructed from (R i ) −1 at the previous iteration and r ⊤ i+1 ≡ [b ⊤ d] by appending the ( i + 1)th column vector to the last column position: where r i denotes the ith column vector of (R i+1 ) −1 . To guarantee the existence of the inverse of R i+1 (equivalently, the independence of u (i) 's), it should be assumed that d = 0 (see [13] for the detail).
Note that at the (i + 1) th iteration where (R i ) −1 with its row vectors (a i j ) ⊤ is given from the previous iteration, the geometric mean in (8) can be computed with � a i+1 j � 2 , j = 1, . . . , i + 1 updated from a i j 2 and r i+1 by a simple addition: where r i+1 (j) is the jth entry of the (i + 1) th column vector of (R i+1 ) −1 . Initially, we have

Thus, the metric (8) is reduced to
Starting with u (1) * , Q 1 , R 1 and R −1 = 1 �u (1) * � , the criterion (8) are repeatedly evaluated from i = 1 to |S| − 1 until the best sampling set S with the cardinality of r is constructed. The proposed sampling set selection algorithm is summarized as follows:

Complexity of proposed algorithm
With the eigenvector matrix of the variation operator computed as a preparation step, the proposed algorithm seeks to select the next minimizing node at the (i + 1) th iteration by using (R i ) −1 with its row vectors (a i j ) ⊤ . The selection process is conducted by two main tasks: for each u (i+1) , the generation of the (i + 1) th column vector r i+1 of R i+1 and the computation of r i+1 of (R i+1 ) −1 by (11) and the geometric mean of � a i+1 j � 2 by (12a) and (12b). This task is repeatedly executed for each of (N − i) remaining rows, producing the operation count C GM ≈ (N − i)(2ri + 2i 2 ) flops. After finding the next row u (i+1) * that takes the minimum of the (N − i) geometric means, the second task is performed to generate the Householder matrix for u (i+1) * which needs C H ≈ 2ri 2 − 4r 2 i + 2r 3 flops. Thus, at the ( i + 1)th iteration, the operation count spent for the two main tasks is C i+1 ≈ 2Nri − 4r 2 i . Furthermore, this selection process is executed ( |S| − 1 ) times, resulting in the total operation count of the proposed algorithm given by O(Nr|S| 2 ).
Since our algorithm is developed based on the analytic results in [13], it would be noteworthy to discuss how quantization changes the sampling set selection process by comparing with the selection algorithm without quantization proposed in [13], denoted by the QR factorization-based method (QRM). Specifically, the reconstruction error without quantization at the (i + 1) th iteration is given by tr (R i+1 (R i+1 ) ⊤ ) −1 , the cost function for QRM and can be further manipulated as follows: where (14) follows since the first term i j=1 �r j � 2 is irrelevant in finding the next minimizing row u (i+1) * . Hence, the sampling set selection without quantization involves the arithmetic mean of � a i+1 j � 2 which in turn requires a computation of only �r i+1 � 2 . In (14) tr contrast, the quantization-aware sampling process makes us take into account the geometric mean which needs each update of row vectors of (R i+1 ) −1 . It should be noted that with the extra complexity O(N |S| 2 ) related to computation of the geometric means at iterations, the proposed algorithm offers the complexity of the same order as that of QRM.
For the performance evaluation, we compare with different sampling methods such as efficient sampling method (ESM) [6], greedy sensor selection (GSS) [14] and QRM [13]. ESM constructs sampling sets by simply performing a column-wise Gaussian elimination on U VR , leading to a low weight selection process. GSS iteratively selects the next node that maximizes the log-determinant of the inverse estimation error covariance  matrix. Assuming zero-mean graph signals with the covariance matrix = U VR U ⊤ VR corrupted by an iid additive measurement noise with unit variance, the inverse estimation error covariance matrix is given by −1 [12,14]. In evaluating the metric for GSS, the high SNR is assumed (i.e., −1 = 1 of the proposed algorithm is formulated under the assumption of high SNR. It is noteworthy to examine the relation of the above-mentioned sampling methods with the sampling strategies based on uncertainty principle, denoted by MaxVol and MinPinv in [8] where MaxVol seeks to maximize the determinant of U ⊤ SR U SR and MinPinv aims at minimizing the reconstruction error without quantization given by tr[(U ⊤ SR U SR ) −1 ] . Clearly, GSS and QRM offer simplified greedy methods for MaxVol and MinPinv, respectively. In Table 1, the optimality criteria and the complexity are given for  comparison. As expected, ESM yields a fast selection process at the cost of performance loss since it aims to minimize the indirect metric, i.e., the worse case of the reconstruction error. It can be also seen that the proposed algorithm provides a competitive complexity in comparison with GSS and QRM, especially for |S| ≤ r . The reconstruction performance of the various methods is evaluated in the experiments in Sect. 4.2

Experimental results
We investigate the performance of various sampling set selection methods for four different graphs given below:  For each graph, we generate 50 graph realizations and adopt the combinatorial Laplacian matrix L as a variation operator to compute U VR and c R in (1). We construct uniqueness sampling sets S with size |S| = r by greedily selecting one node at each iteration based on four different techniques: ESM, GSS, QRM and the proposed method. In applying GSS, we let σ 2 f = 10 3 . For performance evaluation, we examine  the case of noisy bandlimited and non-bandlimited signals. We consider a random graph signal f assumed to follow the Gaussian joint distribution: where the covariance matrix K = (L + δI) −1 with δ set to = 0.01 in the experiments to guarantee a proper covariance matrix. We generate the noisy signals f S residing on the nodes in the sampling sets S by using an iid additive Gaussian noise drawn from N (0, σ 2 ) and make uniform quantization of those signal values with the approximated optimal rate R i obtained by adjusting the optimal rate R * i = log 2 I * i to its neighboring integer value such that R i = R T = |S| . We compute the average reconstruction error in dB given by 10 log 10 (E �f −f Q � 2 N ) where 1000 signal samples at each node are averaged for each of 50 graph realizations. We also provide visual demonstration of RSG and the 50 vertices in red selected from the graph by using the four methods in Fig. 1. In the experiments, we generate graphs and the attributes (i.e., L, w ij , U, i ) with their default values by using the graph signal processing toolbox (GSPBox) for MATLAB [20].