Secure relay selection based on learning with negative externality in wireless networks

In this paper, we formulate relay selection as a Chinese restaurant game. A secure relay selection strategy is proposed for a wireless network, where multiple source nodes send messages to their destination nodes via several relay nodes, which have different processing and transmission capabilities as well as security properties. The relay selection utilizes a learning-based algorithm for the source nodes to reach their best responses in the Chinese restaurant game. In particular, the relay selection takes into account the negative externality of relay sharing among the source nodes, which learn the capabilities and security properties of relay nodes according to the current signals and the signal history. Simulation results show that this strategy improves the user utility and the overall security performance in wireless networks. In addition, the relay strategy is robust against signal errors and deviations of some users from the desired actions.


Introduction
Relay selection has been recognized as a critical issue for both cooperative communications [1][2][3] and multi-hop wireless networks. Efficient and secure relay selection in wireless networks has to overcome various technical challenges at different levels, such as channel state estimation for the relay nodes and attack detection [4,5]. For example, source nodes have to avoid choosing relay nodes that launch packet-dropping attacks by deliberately dropping some messages and never forwarding them to the destination [4]. In the presence of multiple potential relay nodes in the coverage area, a user has to use the relay node that can provide a high secure data rate with a good radio propagation condition and high transmit power. On the other hand, due to the limited transmission and processing capability of a relay node, each user achieves less utility if the corresponding relay simultaneously serves more users.
To this end, game theory is a powerful mathematical tool that provides a formal analytical framework for studying the complex interactions among source nodes and relay nodes with different serving properties in wireless networks. In particular, the Chinese restaurant game (CRG), initially inspired by the Chinese restaurant process, is a promising tool to address the negative externality issue in relay selection: each player makes a decision sequentially, based on the received signals reflecting the state of the tables in a Chinese restaurant, and avoids choosing a crowded table [6]. The Chinese restaurant game model is a prominent tool to address emerging problems in wireless communications, especially cooperative spectrum access [7] and spectrum sharing in cognitive radio networks [8].
In this paper, we consider a wireless network with multiple source nodes or users, which aim at sending their messages to the destination nodes. There are multiple potential relay nodes with different transmission capabilities, due to the radio channel conditions, transmit power, and processing speed, as well as various security properties. For instance, some nodes might drop some relay messages on purpose or leak the relay messages, resulting in the privacy loss of the source nodes. By formulating the secure relay selection process into a sequential Chinese restaurant game, we propose a learning-based relay selection strategy to improve the secure end-to-end data rate in wireless networks. This scheme captures the characteristics of relay nodes at different levels, including their security properties, buffer sizes, transmission capacities, and processing speeds, as well as the number of current serving users. Users estimate the relay state by learning from the history and the current signals that reflect the relay properties. The relay nodes are chosen to maximize their own expected secure data rates accordingly.
Our contributions can be summarized as follows: (1) The CRG-based relay selection strategy takes into account the relay security and avoids choosing a crowded relay and thus can improve the user utility. (2) By exploiting the previous signals received by the neighboring nodes on the relay properties, the CRG-based strategy can provide some degree of robustness against signal errors. Moreover, this strategy is also robust against possible irrational decisions or deviations from the proposed scheme. In other words, even when some users deviate from this strategy, the other users can still benefit from following the scheme in the long term.

Related work
Many interesting works have investigated how a single source node selects a relay in cooperative wireless communications according to the radio channel information, such as the channel state information (CSI) [1,3], the parameters in the Nakagami channel model [2], and the finite-state Markov channel model [9]. In [10], a cooperative relay transmission strategy was proposed over multiple potential relays. The relay selection can be formulated using an optimization model based on the constrained Markov decision process [11]. In [12], a cooperative relay diversity protocol was designed to increase the coverage area in wireless networks. In addition, it is shown in [13,14] that node cooperation with known CSI in wireless networks can improve the user secrecy capacity. In wireless networks with multiple users that simultaneously transmit messages, the work [4] provides a distributed relay selection strategy that applies the Stackelberg game to reduce the overall power consumption. Yu and Ray Liu proposed reputation-based, cheat-proof, and attack-resistant cooperation stimulation strategies to improve the security performance in autonomous mobile ad hoc networks [5]. In [15], an indirect reciprocity principle was applied to improve the performance of a large-scale mobile network, and the stability condition was investigated in [16]. In order to improve the communication efficiency, a min-max coalition-proof channel allocation scheme was proposed in [17] for multi-hop wireless networks.
The remainder of the paper is organized as follows: We describe the network model in Section 2 and formulate it into a Chinese restaurant game in Section 3. We propose the secure relay selection scheme based on the Chinese restaurant game in Section 4. Next, in Section 5, we present the simulation results to evaluate its performance. Finally, a short conclusion is drawn in Section 6.

Network model
We consider a typical wireless network as shown in Figure 1, which consists of C source nodes or users, K relay nodes, and a common destination node. Each source node has to deliver a message to the destination with the help of a relay node. For simplicity, we assume a two-hop wireless network, where the destination node is out of the coverage area of the source nodes but can be reached by the relay nodes.
The message transmission process consists of two stages: (1) the C source nodes send messages to the relay nodes in sequence, and (2) the relay nodes amplify and forward the messages to the destination node. This work can be extended straightforwardly to the cooperative communication scenarios in single-hop networks, where both the source and relay nodes transmit cooperatively during the second stage.
Without loss of generality, we assume that the relay nodes have different buffer sizes, security properties, and transmission capabilities due to various transmit powers and radio channel states and thus provide different service qualities to the users. For instance, a relay node performing the packet-dropping attack deliberately drops some messages and thus reduces the user's end-to-end data rate. In addition, a relay node with serious propagation fading or low transmit power provides lower transmission rates to the users. Therefore, we classify the service quality of a relay node into Q levels, where 1 is the worst, and Q is assigned to the most powerful relay. Let R(k,w) ∈ {1, 2,…, Q} denote the total secure throughput of Relay k, where the second parameter w ∈ {1, 2,…, W} is the relay or network state and is unknown to the users. As some relay nodes have the same transmission capability, the number of relay states, denoted by W, is usually much less than Q^K.
In the first stage of the transmission, the C users choose sequentially from the K relay nodes, based on the relay state learnt from the signal history and the current signal. Later users cannot change the relay selection decisions of earlier users. In general, User i has a better understanding of the relay state than the earlier users because he observes more signals. Once the messages from the source nodes are received, the relay nodes forward them to the destination. The source nodes choosing the same relay node use time rotations to share the transmission and processing capability of the relay. Thus, a crowded relay degrades the end-to-end data rate of each of its users.

Game formulation
The Chinese restaurant game is a dynamic game, where players have knowledge of both the decisions of the former players and the table state in a Chinese restaurant [6,7]. We study the relay selection in a two-hop wireless network with a CRG model, where the players are the C source nodes and the tables are the K relay nodes. The action set in this model is A = {1, 2,…, K}, and an action represents the relay node that a player selects to deliver his message to the destination; the players act in sequence. The players, assumed to be rational, choose actions to maximize their own utilities, which correspond to their secure data rates to the destination node. For the scope of this paper, we interchangeably use the terms users, source nodes, and players.
Each player is assumed to receive a signal on both the qualities of the K relay nodes and the signal history of the previous users. Without loss of generality, we take User i as an example, with 1 ≤ i ≤ C. In such a game, User i obtains from a control channel a signal on the relay state, denoted with s i ∈ {1, 2,…, W}, and the signal history, h i = {s 1 , s 2 ,…, s i − 1 }, which contains the revealed signals for the previous i − 1 users. Note that the signals mentioned in this paper inform users about the relay states, instead of being the messages sent to the destination.
The signals are in general imperfect. Let Pr(s i |w) represent the probability that the signal to User i is s i , given the true relay state w. For simplicity, we model the signals with a Bernoulli distribution (Equation 1), in which the parameter p indicates the signal accuracy. Note that this work is not limited to the Bernoulli model in Equation 1 and can be easily extended to other signal models.
The prior distribution of w is given by g 0 = {g(0,1), g(0,2),…, g(0,W)}, where g(0,q) = Pr(w = q) is the prior probability that the relay state is q, which is known by all the users.
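As a minimal sketch of such a signal model, the Python snippet below assumes that a signal equals the true state with probability p and that an erroneous signal is spread uniformly over the remaining W − 1 states. For W = 2 this reduces to a Bernoulli trial with accuracy p; the uniform spread for W > 2 and the names `signal_prob` and `draw_signal` are illustrative assumptions, not taken from the paper.

```python
import random


def signal_prob(s, w, p, num_states):
    """Pr(s | w) under the assumed model: correct with probability p,
    otherwise uniform over the num_states - 1 wrong states."""
    if num_states == 1:
        return 1.0 if s == w else 0.0
    return p if s == w else (1.0 - p) / (num_states - 1)


def draw_signal(w, p, num_states, rng=random):
    """Sample one signal in {1, ..., num_states} given the true state w."""
    if num_states == 1 or rng.random() < p:
        return w
    wrong = [q for q in range(1, num_states + 1) if q != w]
    return rng.choice(wrong)
```

With W = 2 and p = 0.9, for example, `signal_prob(1, 1, 0.9, 2)` is 0.9 and the probabilities over both possible signals sum to one.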
In this system, the users choosing the same relay node apply time rotations to share the processing and transmission resources. The goal of each user is to maximize its own secure data rate. Thus, we define the utility of User i taking action k, denoted by U i,k , in Equation 2, where N k is the number of users selecting Relay k at the end of the game. User i takes the action in a deterministic manner, and his best response, denoted by r i , is the action that maximizes his expected utility. We will present a learning algorithm in the next section to obtain the solution to this optimization problem. User i then broadcasts his choice and transmits his message to Relay r i . Next, User i + 1 chooses a relay in a similar way, and the game ends when all the C users have taken actions. For ease of reference, we summarize the commonly used notations in Table 1.
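One reading of the utility that is consistent with the time-rotation description above is an equal split of the relay's total secure throughput among its users. The expression below is an assumed form, offered for illustration because the displayed equation is not reproduced in this text.

```latex
U_{i,k} = \frac{R(k,w)}{N_k}
```

Under this reading, a relay with total secure throughput 35 that ends up serving 5 users gives each of them a utility of 7, which is the negative externality that later users must anticipate.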

Relay selection algorithm based on CRG
In this section, we present a secure relay selection algorithm for users to choose relay nodes in sequence in wireless networks. This is essentially a learning algorithm that enables users to reach a desirable outcome of the CRG as described in Section 3. Each user makes a decision in three steps: (1) learns the relay state based on the current signal, the signal history, and the actions of the previous users, if there are any, (2) estimates the expected utility, and (3) chooses the relay node that maximizes its own utility. To constitute a concrete example of the learning process, we consider User i with 1 ≤ i ≤ C and present how the message is delivered to the destination node via a relay. In this process, User i exploits its signal s i and the signal history h i = {s j } 1 ≤ j < i to estimate the relay state g(i) = {g(i,w)} 1 ≤ w ≤ W , i.e., the service qualities of these K relays, where g(i,w) = Pr(w|h i ,s i ,g 0 ) is the probability that User i believes that the relay state is w, and g 0 is the prior distribution of the relay state known by the users.
Rational users can apply the Bayesian rule to update their beliefs on the relay state, and the belief of User i is given by Equation 3, in which Pr(s i |w) is given by Equation 1. Note that g(i,w) provides the service profile of all the K relay nodes in relay state w, and the secure throughput of Relay k is R(k,w) in this case.
Users have to avoid crowded relays because of the negative externality of relay sharing indicated by Equation 2. Let M i = {M i,k } 1 ≤ k ≤ K denote the current relay grouping state, where M i,k is the number of users before User i that choose Relay k. Since each rational user aims at maximizing his expected utility, the action of User i is given by Equation 4, where the expectation is taken over both the relay state and the number of users choosing Relay k after User i. As given by Equation 2, U i,k is the utility of User i if the relay state is w and N k users, including User i, choose Relay k. By definition, the expected utility that User i can obtain in this case is given by Equation 5, where q is the relay state and the expectation in the second line is taken over the number of users choosing Relay k after User i.
In order to calculate Equation 5, we introduce n i,k to denote the number of users choosing Relay k from User i onward, including User i. The total number of users on Relay k is then N k = M i,k + n i,k , where M i,k and n i,k are defined in Equation 6. According to Equations 4 to 6, we can rewrite Equation 4 as Equation 7, a double summation over the relay state q and the number of users choosing Relay k after User i, on both of which U i,k is conditioned.
The solution to Equation 7 depends on the distribution of n i,k , which can be derived by the recursion in Equation 8, where the history for User i + 1 is h i + 1 = {h i ,s i } and M i + 1 is the grouping result before User i + 1 given by Equation 6. The second line in Equation 8 considers both the signal at time i + 1 (s i + 1 ) and the corresponding relay selection (r i + 1 ). As the last user knows the decisions of all the other users, User C can easily calculate the distribution of n C,k based on his own choice, r C , as given in Equation 9. The total number of iterations needed to evaluate Equation 8 depends on C. For example, if there are C = 7 users in the area, it takes seven iterations to calculate the conditional probabilities in Equation 8.
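The belief update can be sketched in a few lines of Python. The sketch assumes that signals are conditionally independent given the relay state and that an erroneous signal is uniform over the wrong states; both assumptions, and the function name `update_belief`, are illustrative rather than taken from the paper.

```python
def update_belief(prior, signals, p):
    """Posterior over relay states 1..W given a list of observed signals.

    prior   -- list of W prior probabilities, prior[w - 1] = Pr(w)
    signals -- signal history followed by the current signal, values in 1..W
    p       -- signal accuracy (assumed: wrong signals are uniform)
    """
    num_states = len(prior)

    def likelihood(s, w):
        if num_states == 1:
            return 1.0
        return p if s == w else (1.0 - p) / (num_states - 1)

    weights = list(prior)
    for s in signals:
        # Bayes rule: multiply by the likelihood of each observed signal.
        weights = [weights[w - 1] * likelihood(s, w)
                   for w in range(1, num_states + 1)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("signals have zero probability under the prior")
    return [x / total for x in weights]
```

For a uniform prior over two states, p = 0.9, and observed signals 1, 1, 2, the unnormalized weights are 0.5 × 0.081 and 0.5 × 0.009, so the posterior is (0.9, 0.1).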
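The recursion can be illustrated with a small Python sketch. It rests on several assumptions that are not taken from the paper: the utility is an equal share R(k,w)/N k , wrong signals are uniform over the other states, ties are broken toward the lowest relay index, and relays and states are indexed from 0. Instead of tabulating the distribution of n i,k , the sketch propagates the expectation of 1/N k through the later users' signals, which yields the same expected utility. The name `make_crg` is hypothetical.

```python
from functools import lru_cache


def make_crg(R, prior, p, C):
    """Return best_response(signals, counts) for a sequential relay game.

    R[k][w] -- total secure throughput of relay k in state w (0-indexed)
    prior   -- prior over the W states; p -- signal accuracy, 0 < p < 1
    C       -- number of users
    """
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    K, W = len(R), len(prior)

    def lik(s, w):
        if W == 1:
            return 1.0
        return p if s == w else (1.0 - p) / (W - 1)

    def belief(signals):
        post = list(prior)
        for s in signals:
            post = [post[w] * lik(s, w) for w in range(W)]
        z = sum(post)
        return [x / z for x in post]

    def bump(counts, k):
        return counts[:k] + (counts[k] + 1,) + counts[k + 1:]

    @lru_cache(maxsize=None)
    def inv_share(k, signals, counts, w):
        """E[1 / N_k | state w] once len(signals) users have chosen
        (giving counts) and all later users play their best responses."""
        if len(signals) == C:
            return 1.0 / counts[k]
        total = 0.0
        for s in range(W):  # signal drawn by the next user
            nxt = signals + (s,)
            r = best_response(nxt, counts)
            total += lik(s, w) * inv_share(k, nxt, bump(counts, r), w)
        return total

    @lru_cache(maxsize=None)
    def best_response(signals, counts):
        """signals: history plus own signal; counts: users per relay so far."""
        g = belief(signals)
        best_k, best_u = 0, float("-inf")
        for k in range(K):
            after = bump(counts, k)
            u = sum(g[w] * R[k][w] * inv_share(k, signals, after, w)
                    for w in range(W) if g[w] > 0.0)
            if u > best_u + 1e-12:
                best_k, best_u = k, u
        return best_k

    return best_response
```

As a usage example, with three users and state-independent throughputs of 10 and 8, `make_crg([[10, 10], [8, 8]], [0.5, 0.5], 0.9, 3)((0,), (0, 0))` selects the weaker relay (index 1): the first user anticipates that both later users will end up on the stronger relay, so sitting alone on the weaker one yields 8 instead of 5.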
In this algorithm, User i chooses a relay node based on Equations 3 to 9, transmits the message to Relay r i , and broadcasts his decision to the other users in the neighborhood. Users learn the relay state according to the current signals and the signal history, and predict the behaviors of the following users based on the current relay selection results. This algorithm can be used for radio users to choose relay nodes in wireless networks such as cognitive radio networks and sensor networks.

Simulation results
We performed simulations to evaluate the performance of the CRG-based relay selection scheme for a wireless network, which consisted of seven users and two relays that forwarded the users' messages to the destination node. The average signal-to-noise ratio for each relay node's signal at the destination node was 15 dB, and the overall bandwidth of a relay node was 10 MHz.
We considered two situations regarding relay performance and set W = 2. Each situation took place with the same probability. There was a selfish relay node in each situation that dropped 60% of the relay messages to save power. By straightforward calculation, we can see that for the first situation, w = 1, the total utilities that Relays 1 and 2 provided were 35 and 14, respectively. Otherwise, if w = 2, the total utilities for Relays 1 and 2 were 14 and 35, respectively. The signals to each user were generated independently and uniformly according to the signal accuracy denoted by P a .
For comparison, we also evaluated another two relay selection strategies. The simplest is the random relay strategy, where users choose relay nodes randomly and independently, in disregard of the signals. The second is the myopic strategy, which is also a signal-based strategy. In this strategy, users choose relays sequentially. Each user aims at maximizing his current utility and ignores the impact of the later users in the network. More specifically, User i chooses the relay node that maximizes his expected utility given only the users who have already chosen each relay. Unlike the CRG-based strategy, the decisions of the later users are ignored, and users make decisions according to their own signals, the signal history, and the decisions of the former users.
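The two totals are consistent with a simple reading, stated here as an assumption because the calculation itself is not shown: a relay that drops 60% of the relayed messages delivers only 40% of the total utility of the well-behaved relay, whose value of 35 is taken as given.

```latex
(1 - 0.6) \times 35 = 14
```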
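A minimal Python sketch of the myopic rule follows. It assumes the equal-share utility R(k,w)/N k and evaluates each relay as if no further user would join after the current one; the function name `myopic_choice` and the 0-based indexing of relays and states are illustrative choices, not the paper's.

```python
def myopic_choice(belief, R, counts):
    """Pick the relay maximizing expected utility, ignoring later users.

    belief -- belief[w]: probability that the relay state is w
    R      -- R[k][w]: total secure throughput of relay k in state w
    counts -- counts[k]: number of earlier users already on relay k
    """
    best_k, best_u = 0, float("-inf")
    for k in range(len(R)):
        expected_rate = sum(belief[w] * R[k][w] for w in range(len(belief)))
        u = expected_rate / (counts[k] + 1)  # share including this user
        if u > best_u:
            best_k, best_u = k, u
    return best_k
```

With belief (0.9, 0.1) and the throughputs 35 and 14 used in the simulation, the expected rates are 32.9 and 16.1. A myopic user therefore still joins the better relay when one user is already there (16.45 versus 16.1) but switches to the other relay once two users are there (about 10.97 versus 16.1).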
Simulation results in Figure 2 show that compared with the random and the myopic strategies, the CRG-based scheme can provide a higher utility because users estimate the other users' decisions and make decisions accordingly. Clearly, the user utility in this strategy changes with the signal quality P a . However, the performance is mostly stable if P a is greater than 0.9, which means that the scheme has some degree of robustness against signal errors. In addition, as shown in Figure 2C, users in the middle of the decision-making queue, such as User 3, usually have lower utilities than the other users. The reason is that User 1 has the freedom to choose any relay, and meanwhile, the last user has the best knowledge of the relay performance and the choices of the other users.
In the second simulation, we consider the CRG-based relay strategy in a scenario similar to the first simulation, except that User 3 deviates from the CRG strategy with a probability denoted by P miss . As shown in Figure 3, the utility of User 4 slightly decreases with the probability P miss . On the other hand, the performance loss is small if P miss is less than 0.2, indicating that this strategy can provide robustness against user deviation to some degree.

Conclusion
In this paper, we have investigated secure relay selection in wireless networks and formulated it as a sequential Chinese restaurant game that takes into account the security properties, buffer size, transmission strength, and processing ability of relay nodes. We have proposed a secure relay selection strategy that improves the user utility by avoiding crowded relay nodes. Simulation results show that the proposed scheme can achieve a higher average utility than the random and myopic relay strategies. In addition, this scheme has some degree of robustness against both signal inaccuracy and user deviation from the given strategy.