Modeling Misbehavior in Cooperative Diversity: A Dynamic Game Approach

Cooperative diversity protocols are designed with the assumption that terminals always help each other in a socially efficient manner. This assumption may not be valid in commercial wireless networks, where terminals may misbehave with selfish or malicious intentions. The presence of misbehaving terminals creates a social dilemma in which terminals are uncertain about the cooperative behavior of other terminals in the network. Cooperation in a social dilemma is characterized by a suboptimal Nash equilibrium where wireless terminals opt out of cooperation. Hence, without a mechanism to detect and mitigate the effects of misbehavior, it is difficult to maintain socially optimal cooperation. In this paper, we first examine the effects of misbehavior assuming a static game model and show that cooperation under existing cooperative protocols is characterized by a noncooperative Nash equilibrium. Using evolutionary game dynamics, we show that a small number of mutants can successfully invade a population of cooperators, which indicates that misbehavior is an evolutionarily stable strategy (ESS). Our main goal is to design a mechanism that enables wireless terminals to select reliable partners in the presence of uncertainty. To this end, we formulate cooperative diversity as a dynamic game with incomplete information. We show that the proposed dynamic game formulation satisfies the conditions for the existence of a perfect Bayesian equilibrium.


Introduction
Cooperative wireless communications is based on the principle of direct reciprocity where wireless terminals attain some of the benefits of multiple input multiple output (MIMO) systems through cooperative relaying, that is, by helping each other. Since direct reciprocity is "help me and I help you" kind of protocol, a terminal will be motivated to help others attain cooperative diversity gain with the anticipation to reap those same benefits when the helped terminals reciprocate. When all terminals obey rules of cooperation, a stable and socially efficient cooperation is realizable, which may be true in wireless networks under the control of a single entity wherein terminals cooperate to achieve a common objective, as in military tactical networks. On the other hand, in commercial wireless networks where terminals are individually motivated to cooperate, the assumption that terminals will always obey rules of cooperation may not hold: (1) terminals may misbehave and violate rules of cooperation to reap the benefits without bearing the cost, (2) well-behaved terminals may refuse to relay for their potential partners without the assurance that the partners will reciprocate. While the first reason is motivated by a selfish intention to save energy, the second reason is motivated by the absence of mechanisms to incentivize cooperation in existing cooperative protocols. Hence, in commercial wireless networks, it is difficult to ensure a stable and socially efficient cooperation without implementing a mechanism to detect and mitigate misbehavior.
Game theoretic approaches have been proposed to design mechanisms that incentivize cooperation in commercial wireless networks. The proposed mechanisms are either price-based or reputation-based. In price-based cooperation [1,2], terminals are charged for channel use when transmitting their own data and are reimbursed when forwarding for other terminals. It is shown that the pricing scheme leads to a Nash equilibrium that is Pareto-superior. In reputation-based schemes [3,4], the authors propose the Generous Tit for Tat (GTFT) algorithm, which conditions the behavior of nodes on their past history. The authors show that if the game is played long enough, GTFT leads to an equilibrium point that is Pareto-optimal. The game theoretic models in the aforementioned works in particular, and in the literature in general, consider a static game model where players are assumed to make decisions simultaneously. Since simultaneous decision making implies that players are unable to observe each other's actions, static game models do not capture the dynamics of cooperative interactions well. Recently, a dynamic Bayesian game framework has been proposed to model routing in energy-constrained wireless ad hoc networks [5], which provides the motivation for our work.
Motivated by the inadequacy of static game models to fully characterize cooperative communications, we formulate interactions of terminals in cooperative diversity as a dynamic game with incomplete information. The dynamic game formulation captures temporal and information structure of cooperative interactions. Temporal structure of a dynamic game defines the order of play: cooperative transmissions occur in sequential manner wherein a source terminal transmits first and then potential cooperators decide to either cooperate or deviate from cooperation. The sequential nature of cooperative transmissions is dictated by the half-duplex constraint of wireless devices, that is, a relay terminal cannot receive and transmit at the same time in the same frequency band. The information structure of dynamic games characterizes what each player knows when it makes a decision: in commercial wireless networks, intention of each user is not known a priori, hence, incomplete information specification of the game represents the uncertainty each user has about the intention of other users in the network. In this paper, we present a general dynamic game framework that may fit any of the existing cooperative diversity protocols. We show that the proposed model captures important aspects of existing cooperative diversity protocols. We also show that the proposed dynamic game formulation satisfies the requirements for the existence of perfect Bayesian equilibrium. This paper is organized as follows. In Section 2, the system model is described. In Section 3, game theoretic analysis of cooperative diversity is presented. Background of dynamic games is presented in Section 4. In Section 5, a dynamic game framework is presented. Finally, in Section 6, concluding remarks are given.

System Model
We consider an N-user TDMA-based cooperative diversity system wherein terminals forward information for each other using any one of the existing cooperative schemes. We assume that a source terminal randomly selects at most one potential cooperator (relay) among all its neighboring terminals. It is important to note that random selection of potential cooperators reflects the assumption, held by all terminals, that their relay terminals are always willing to help. A source terminal and its potential partner establish a possible cooperative partnership prior to data transmission by exchanging control frames. Through the established cooperative partnership, terminals enter into a nonbinding agreement to forward information for each other (see Figure 1). Details of the mechanism by which cooperative partnerships are formed are beyond the scope of this work, as our primary focus is on examining the sustainability of this partnership.
The interterminal channels are characterized by Rayleigh fading. We denote by $\gamma_{s,d}$, $\gamma_{s,r}$, and $\gamma_{r,d}$ the instantaneous signal-to-noise ratios (SNRs) of the source-destination, source-relay, and relay-destination channels, respectively. Information is transmitted at a rate of $R$ b/s in frames of $M$ bits. We assume that all users transmit at the same power level and with the same modulation and rate.
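The channel model above can be illustrated with a short simulation. The following is a minimal sketch, not part of the system model itself: it relies on the standard fact that under Rayleigh fading the channel power gain is exponentially distributed, so each instantaneous SNR is an exponential random variable around its mean. The function names and the mean SNR values are hypothetical and chosen only for illustration.

```python
import random


def sample_instantaneous_snr(mean_snr, rng):
    """Draw one instantaneous SNR (linear scale) for a Rayleigh-faded link.

    Under Rayleigh fading the power gain |h|^2 is exponentially distributed,
    so the instantaneous SNR is exponential with mean `mean_snr`.
    """
    return rng.expovariate(1.0 / mean_snr)


def sample_link_snrs(mean_sd, mean_sr, mean_rd, rng):
    """Return one realization (gamma_sd, gamma_sr, gamma_rd)."""
    return tuple(
        sample_instantaneous_snr(mean, rng) for mean in (mean_sd, mean_sr, mean_rd)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed (illustrative) mean SNRs for the three links, linear scale.
    draws = [sample_link_snrs(10.0, 20.0, 20.0, rng) for _ in range(50_000)]
    avg_sd = sum(d[0] for d in draws) / len(draws)
    print(f"empirical mean of gamma_sd: {avg_sd:.2f} (nominal 10.0)")
```

The three links are drawn independently here, which is an additional assumption of the sketch rather than something stated in the system model.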

Game Theoretic Analysis of Cooperative Diversity
3.1. Two-User Cooperation. In this section, we examine the cooperative interaction between terminals within the framework of noncooperative game theory. We assume that the benefits of cooperation and the cost it incurs are common knowledge. That is, terminals are willing to expend their own resources to help other terminals achieve reliable communication with the expectation of achieving those same benefits when the helped terminals reciprocate. We assume that terminals are individually rational in that they behave so as to maximize their individual benefits from cooperation. We also assume that the rational behavior of terminals is common knowledge, that is, terminals know that other terminals are rational. Individual rationality is crucial for the evolution of cooperation, as it implies that well-behaved terminals have a strong preference for partners that conform to the rules of cooperation. On the other hand, individual rationality may lead to selfish behavior where a terminal is tempted to economize on the cost of cooperation (energy) while reaping the benefits. We show that in the presence of selfish users, individual rationality dominates cooperation, which consequently leads to a noncooperative Nash equilibrium that is suboptimal in the Pareto sense. We denote the strategy space of the game by $\Theta = \{\theta_0 = \text{cooperate}, \theta_1 = \text{misbehave}\}$. Source terminal $S_i$ transmits to the network whenever it has information to send. Thus, its strategy space is a singleton and is denoted by $\Theta_i$. On the other hand, relay terminal $R_j$ may either obey the rules of cooperation or deviate from them. Thus, the strategy space of $R_j$ is a nonsingleton set defined as $\Theta_j = \{\theta_0 = \text{cooperate}, \theta_1 = \text{misbehave}\}$, where $\theta_j \in \Theta_j$ is a pure strategy of $R_j$. We assume that a misbehaving relay node $R_j$ adopts a mixed strategy in which it plays pure strategy $\theta_j$ with probability $p_j(\Theta_j)$.
It is obvious that mixed strategy incurs uncertainty in the game since source terminal S i has no knowledge whether R j conforms to cooperation or violates it. Terminal R j being a rational player will adopt this strategy to confuse its partner by mimicking the unpredictable nature of the wireless channel. From a game-theoretic viewpoint, mixed strategy ensures that the game has Nash equilibrium.
The utility function of terminal $S_i$ is defined in terms of cooperative diversity gain and is denoted by $u_i(p_i(\Theta_i), p_j(\Theta_j))$, where $p_j(\Theta_j)$ captures the behavior of its partner. In the next section, we formally define the utility function for cooperative diversity in terms of achievable performance gains at the physical layer. To simplify the discussion in this section, the achievable cooperative diversity gain when all terminals obey the rules of cooperation is denoted by $\rho_c$. On the other hand, when all terminals opt out of cooperation, each terminal derives a degraded gain compared to the attainable benefit; this utility is denoted by $\rho_{nc}$, where $\rho_{nc} < \rho_c$. We assume that each terminal expends a fraction of its available power for cooperation, which defines the cost of cooperation and is denoted by $c_c$. We assume that the cost of cooperation is strictly less than the attainable cooperation benefit, that is, $c_c < \rho_c$. The utility matrix of the game is then
$$U = \begin{bmatrix} \rho_c - c_c & \rho_{nc} - c_c \\ \rho_c & \rho_{nc} \end{bmatrix}, \tag{1}$$
where rows correspond to a terminal's own strategy ($\theta_0$, $\theta_1$) and columns to its partner's strategy: $\rho_c - c_c$ is the net utility when all terminals cooperate, and $\rho_{nc} - c_c$ is the utility to a well-behaved terminal when its partner deviates from cooperation. The terminal that deviates from cooperation derives utility $\rho_c$ at no cost, and $\rho_{nc}$ is the noncooperative utility. Suppose terminals $i$ and $j$ form a cooperative partnership where each terminal affirms its willingness to cooperate via a protocol handshake. A willingness to cooperate may indicate that a terminal has enough available power to expend for cooperation. It may also indicate a terminal's intent to economize on the other terminal's cooperative behavior. We assume that both terminals $i$ and $j$ play mixed strategies when each terminal acts as a relay to help the other terminal. Their mixed strategies are, respectively,
$$P_i = [p_i(\theta_0), \; p_i(\theta_1)]^T, \qquad P_j = [p_j(\theta_0), \; p_j(\theta_1)]^T, \tag{2}$$
where $p_j(\theta_0)$ is the probability with which relay terminal $R_j$ cooperates with source terminal $S_i$, and $p_j(\theta_1)$ is the probability of misbehavior.
Similarly, $p_i(\theta_0)$ and $p_i(\theta_1)$ capture the probabilities of cooperation and misbehavior when terminal $i$ acts as a relay for terminal $j$. The expected net utility of each terminal can be shown to be
$$u_i(p_i(\Theta_i), p_j(\Theta_j)) = P_i^T U P_j, \qquad u_j(p_j(\Theta_j), p_i(\Theta_i)) = P_j^T U P_i, \tag{3}$$
where $[\,\cdot\,]^T$ is the transpose operator. When both terminals obey the rules of cooperation ($p_i(\theta_0) = 1$, $p_j(\theta_0) = 1$), each derives a net utility of $\rho_c - c_c$. We next examine the steady-state behavior of the game when either player deviates from cooperation by adopting a mixed strategy. Let us consider the case where terminal $j$ is a potential cooperator that plays mixed strategy $P_j$. The goal of an individually rational terminal $j$ playing a mixed strategy is twofold: (1) to maximize its net expected utility by minimizing the cost of cooperation and (2) to behave in a manner that makes it difficult for terminal $i$ to distinguish between the effects of channel dynamics and misbehavior. Thus, terminal $j$ strategically selects $P_j$ (mimicking the inherent uncertainty of the wireless channel) in such a way that player $i$ is indifferent in expected net utility. That is, player $j$ chooses a mixed strategy under which player $i$ would achieve the same expected utility irrespective of the strategy terminal $j$ plays. If such a mixed strategy exists, it means that in the long run terminal $i$ may be unable to learn about the behavior of its partner.
However, terminal $i$ is a rational player and will learn in the long run about the behavior of its potential partner by observing its utility. In wireless communications, quality of service metrics such as the target frame error rate (FER) help terminals determine degradation in the achievable cooperative diversity performance gain. Thus, there is no $P_j$ that will make terminal $i$ indifferent in expected utility. Lacking an indifference strategy that could confuse its partner, rational player $j$ will reason that it can forgo the cooperation cost (i.e., $p_j(\theta_0) = 0$) in order to maximize its expected net utility. It is easy to see from (3) that if player $i$ is well behaved ($p_i(\theta_0) = 1$) and player $j$ misbehaves ($p_j(\theta_0) = 0$), player $i$ derives a net expected utility of $u_i(p_i(\Theta_i), p_j(\Theta_j)) = \rho_{nc} - c_c$. On the other hand, the misbehaving partner $j$ achieves expected utility $u_j(p_j(\Theta_j), p_i(\Theta_i)) = \rho_c$. Note that $(1 - p_j(\theta_0))c_c$ is the amount of energy terminal $j$ saves by misbehaving.
Similarly, for the case of mixed strategy play by terminal $i$, the same arguments can be applied to show that there is no $P_i$ that will make player $j$ indifferent in expected net utility, which indicates that a selfishly rational player $i$ will also be tempted to forgo the cooperation cost (i.e., $p_i(\theta_0) = 0$) to derive a net expected utility $u_i(p_i(\Theta_i), p_j(\Theta_j)) = \rho_c$. Thus, an individually rational terminal $i$ will play $p_i(\theta_0) = 0$ to achieve the highest utility irrespective of the strategy adopted by its partner. For this reason, the steady-state behavior of both players is characterized by the strategy combination ($p_i(\theta_0) = 0$, $p_j(\theta_0) = 0$), which is a degenerate mixed strategy Nash equilibrium. Hence, the optimal strategy of both terminals is to deviate from cooperation: (1) for selfish reasons, where a relay terminal exploits the cooperative behavior of other terminals to economize on the cost of cooperation; (2) to avoid being economized on. Thus, at steady state each terminal opts out of cooperation. In terms of the best response function of each player (Figure 2), if $p_i(\theta_0) = 0$, then player $j$'s unique best response is $p_j(\theta_0) = 0$, and vice versa. We have shown that the degenerate mixed strategy Nash equilibrium of the game is ($p_i(\theta_0) = 0$, $p_j(\theta_0) = 0$), which is suboptimal in the Pareto sense.
EURASIP Journal on Advances in Signal Processing
Figure 2: Best response functions in the mixed strategy noncooperative game. Labeled points: Pareto optimal cooperative strategy (achievable when trust develops between players); suboptimal Nash equilibrium. It can be seen that the strategy combination ($p_i(\theta_0) = 1$, $p_j(\theta_0) = 1$) is attained when trust develops between the players, which leads to the evolution of cooperation.
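The best-response argument of this subsection can be checked numerically. The sketch below is illustrative only: it assumes the payoffs described verbally for the utility matrix (cooperate against cooperate yields ρ_c − c_c, cooperate against misbehave yields ρ_nc − c_c, misbehave against cooperate yields ρ_c, misbehave against misbehave yields ρ_nc), and the numerical values of ρ_c, ρ_nc, and c_c are hypothetical, chosen so that ρ_nc < ρ_c − c_c.

```python
def expected_utility(p_own, p_partner, rho_c, rho_nc, c_c):
    """Expected net utility of a terminal that cooperates with probability
    p_own while its partner cooperates with probability p_partner."""
    # Utility when this terminal cooperates (it always pays the cost c_c).
    u_coop = p_partner * (rho_c - c_c) + (1.0 - p_partner) * (rho_nc - c_c)
    # Utility when this terminal misbehaves (no cost is paid).
    u_misb = p_partner * rho_c + (1.0 - p_partner) * rho_nc
    return p_own * u_coop + (1.0 - p_own) * u_misb


def best_response(p_partner, rho_c, rho_nc, c_c, grid=101):
    """Grid search for the cooperation probability that maximizes expected
    utility against a partner who cooperates with probability p_partner."""
    candidates = [k / (grid - 1) for k in range(grid)]
    return max(
        candidates,
        key=lambda p: expected_utility(p, p_partner, rho_c, rho_nc, c_c),
    )


if __name__ == "__main__":
    rho_c, rho_nc, c_c = 1.0, 0.4, 0.2  # hypothetical values for illustration
    for q in (0.0, 0.5, 1.0):
        print("partner cooperates w.p.", q,
              "-> best response:", best_response(q, rho_c, rho_nc, c_c))
    print("mutual cooperation utility:", expected_utility(1.0, 1.0, rho_c, rho_nc, c_c))
    print("mutual defection utility:  ", expected_utility(0.0, 0.0, rho_c, rho_nc, c_c))
```

Because the expected utility reduces to a term that depends only on the partner's behavior minus `p_own * c_c`, the grid search returns zero cooperation probability for any partner strategy whenever `c_c > 0`, which is the dominance argument made in the text.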
Generally, the suboptimal solution tells us that while well-behaved terminals are willing to cooperate for the social benefit, misbehaving terminals maintain their individual rationality to reap the cooperation benefits at no cost, which leads to a social dilemma. In other words, while cooperation is a socially efficient strategy, individually rational terminals reason that they can do better by deviating from cooperation. Cooperation in a social dilemma is characterized by a lack of trust among the players, since each terminal is uncertain about the intention of other terminals in the cooperative network. In other words, the attainable Pareto efficient cooperation requires terminals to trust their partners and also to be trustworthy [6]. That is, by placing trust in their partners, terminals make themselves vulnerable by cooperating; by being trustworthy, terminals become socially rational and avoid exploiting the vulnerability of the other terminals.
Next we examine evolution of selfish behavior in multiuser cooperative networks. Particularly, we are interested in how the presence of a group of terminals that jointly deviate from cooperation affects cooperative communications. Since the strategies dictated by Nash equilibrium are not stable if a group of terminals jointly deviate to attain better utility, we use evolutionary game theory approaches to examine multilateral deviation by a group of misbehaving terminals.

Evolution of Selfish Behavior.
We consider a cooperative diversity system comprising a population of terminals that interact randomly to attain cooperative diversity gain. We assume that at any given time a terminal can interact with at most one partner in the population. Due to mobility, we assume that every terminal $i$ interacts at least once with every other terminal $j$, $i \neq j$. Suppose that initially the population conforms to cooperation. Now assume that a small group of selfish terminals (mutants) enters the cooperative diversity system. The question we would like to answer is whether the mutants can successfully invade the cooperative diversity system.
Let $n_C$ denote the initial number of cooperators and $n_M$ denote the number of mutants, where $n_M \ll n_C$. The rationale behind the presence of very few mutants is to show the vulnerability of cooperative diversity to misbehavior (see Figure 3). We denote by $p_C$ and $p_M$ the fractions of cooperating and misbehaving terminals, respectively. In other words, $n_C$ terminals cooperate with probability $p_C$ while the rest of the terminals deviate from cooperation with probability $p_M$. We assume that cooperators and mutants each play a pure strategy. Although cooperators and mutants adopt pure strategies, the population as a whole plays a mixed strategy. The mixed strategy probability vector of the population is
$$P = [p_C, \; p_M]^T. \tag{4}$$
The utility matrix of the game is defined in (1). We examine the interaction within the population using the evolutionary game theory framework to characterize the dynamics of the spread of misbehavior in multiuser cooperative diversity. Evolutionary game theory deals with constantly interacting players that adapt their behavior by observing their utilities. The evolution of strategies toward higher-utility strategies is characterized using replicator dynamics [7]. Replicator dynamics predicts the rate at which strategies that yield higher utilities spread through the network. Thus, for a multiterminal cooperative diversity system with utility matrix $U$ and mixed strategy $P$ that varies continuously with time, the evolution of cooperation and misbehavior is given by the replicator equation
$$\dot{p}_C = p_C\left[(UP)_1 - P^T U P\right], \qquad \dot{p}_M = p_M\left[(UP)_2 - P^T U P\right], \tag{5}$$
where $\dot{x}$ denotes the time derivative of $x$, and $(UP)_1$ and $(UP)_2$ are the expected utilities of cooperators and mutants, respectively:
$$(UP)_1 = p_C \rho_c + p_M \rho_{nc} - c_c, \tag{6a}$$
$$(UP)_2 = p_C \rho_c + p_M \rho_{nc}. \tag{6b}$$
The first term on the right-hand side of (6a) is the utility derived from cooperator-cooperator cooperation, the second term is the utility derived from cooperator-mutant cooperation, and the third term is the cost incurred by cooperators.
Similarly, the first term on the right-hand side of (6b) is the utility derived from mutant-cooperator cooperation, while the second term is from mutant-mutant cooperation, which actually results in noncooperation. The average utility of the population is
$$P^T U P = p_C (UP)_1 + p_M (UP)_2 = p_C \rho_c + p_M \rho_{nc} - p_C c_c. \tag{7}$$
It is evident that cooperators derive utility that is strictly less than the average utility of the population, that is, $(UP)_1 < P^T U P$. On the other hand, mutants reap utility that is above the average, that is, $(UP)_2 > P^T U P$. The dynamics of the game dictate that nodes observe their utilities and adapt to strategies that provide higher utilities. In other words, low-utility cooperators will start imitating the strategy of mutants (their misbehaving partners) and forgo the cooperation cost in an attempt to achieve a higher utility. That is, low-utility cooperators will learn that they can do better at the expense of other nodes. Due to the absence of techniques to deter misbehavior, the number of misbehaving nodes (mutants) increases monotonically while the number of cooperators grows at a negative rate. This indicates that the mutants successfully invade a much larger population of well-behaved cooperators. A decrease in the number of cooperators means a reduction in the number of nodes that selfish nodes can cheat. The population reaches a steady state where there is no cooperator left to exploit. The network evolves to a noncooperative state where each node opts out of cooperation, as shown in Figure 4. Thus, noncooperation is an evolutionarily stable strategy (ESS), which means that the presence of a few misbehaving nodes can drive cooperators away from the Pareto optimal cooperative strategy. The ESS is robust against a coalition of cooperators that attempts to shift the equilibrium point toward cooperation. That is, a small number of cooperators cannot invade a population of misbehaving nodes. Thus, cooperation is an evolutionarily unstable strategy.
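The invasion argument can be illustrated by integrating replicator dynamics numerically. The following is a minimal sketch under stated assumptions: it uses the standard two-strategy replicator equation with the expected utilities described term by term in the text (cooperators earn p_C ρ_c + p_M ρ_nc − c_c, mutants earn p_C ρ_c + p_M ρ_nc), a simple forward-Euler step, and hypothetical values for ρ_c, ρ_nc, c_c, the step size, and the initial mutant fraction.

```python
def replicator_step(p_c, rho_c, rho_nc, c_c, dt):
    """Advance the cooperator fraction p_c by one forward-Euler step."""
    p_m = 1.0 - p_c
    u_coop = p_c * rho_c + p_m * rho_nc - c_c   # expected utility of cooperators
    u_mut = p_c * rho_c + p_m * rho_nc          # expected utility of mutants
    u_avg = p_c * u_coop + p_m * u_mut          # population average utility
    p_next = p_c + dt * p_c * (u_coop - u_avg)
    return min(1.0, max(0.0, p_next))           # keep the fraction in [0, 1]


def simulate(p_c0, steps, rho_c=1.0, rho_nc=0.4, c_c=0.2, dt=0.1):
    """Return the trajectory of the cooperator fraction."""
    trajectory = [p_c0]
    for _ in range(steps):
        trajectory.append(replicator_step(trajectory[-1], rho_c, rho_nc, c_c, dt))
    return trajectory


if __name__ == "__main__":
    # Start with 1% mutants in a population of cooperators (illustrative).
    traj = simulate(p_c0=0.99, steps=2000)
    for k in (0, 250, 500, 1000, 2000):
        print(f"step {k:4d}: cooperator fraction = {traj[k]:.6f}")
```

With these payoffs the difference between the cooperator utility and the average is −c_c (1 − p_C), so the cooperator fraction decays for any positive cost, which matches the qualitative behavior described for Figure 4. A population with no mutants at all (p_C = 1) stays put, which is why a small initial mutant fraction is needed to trigger the invasion.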
Hence, we have shown that the presence of misbehaving nodes impedes the evolution of socially efficient and stable cooperation. Without a mechanism to detect and mitigate the effects of misbehavior, cooperative diversity will not evolve into a stable system in which users interact in a socially efficient manner to attain a Pareto efficient equilibrium. The game theoretic analysis presented in this section assumes a static game model in which the order in which terminals make decisions has not been taken into account. Indeed, the order of play has no significance for the outcome of the analysis, since the goal has been to give insight into the effects of selfish behavior in existing cooperative schemes. While the static game model proves useful in the analysis, due to its simplicity it may not capture the underlying dynamics of cooperative schemes. Even though evolutionary game theory enables us to analyze the dynamics of interaction of a population of nodes, it does not provide a framework to capture the complex structure of cooperative interactions. In the next section, we characterize cooperative communications within the dynamic Bayesian game framework, which enables us to develop mechanisms that ensure the evolution of stable cooperation. The Bayesian dynamic game model captures the relevant details of cooperative interactions between source and relay nodes. First we present background material on dynamic games.

Dynamic Games: Background
Dynamic games model a decision-making problem where the order of play and information available to each player are very significant to understanding the decision of each player [8,9]. While order of play characterizes sequential interactions, information available to each player describes what each player knows when making decisions. For instance, cooperative interactions occur sequentially, that is, source terminals always transmit first and then relay terminals decide to either forward or drop the transmission. A dynamic game is represented in extensive-form [10].
In extensive form, a game is represented as a tree structure that describes the sequential interactions and evolution of the game. The root of the tree, where the game begins, is the initial decision node and is denoted by $I$. A noninitial node $D$ that has branches leading to and away from it is a decision node, which may indicate the end of a stage game and represents the sequence relation of the players' decisions [11]. A node with no outgoing branches is referred to as a terminal node, and it is where the game ends.
A dynamic game is a multistage game, where a stage game is represented by one level on the tree. In the temporal domain, stages of the game are defined by time periods where the kth stage is denoted by t k [12]. A dynamic game with finite number of stages is referred to as a finite-horizon game where t k ∈ {0, 1, . . . , K}; otherwise, it is an infinite horizon game, that is, t k ∈ {0, 1, . . .}.

Information Sets.
The edges of the tree represent actions available at decision nodes that lead to other decision nodes. The sequence of actions defines the path that connects decision nodes to each other (within a stage) or decision nodes to terminal nodes. The path for each stage game $t_k$ identifies the history $h(t_k)$ of play during time period $t_k$. Players may be uncertain about the history of the game, in which case the game is referred to as a game of imperfect information. That is, when it is its turn to move, a player does not know which node the game has reached. This uncertainty is captured by the set of decision nodes the game can possibly have reached. We refer to this set of decision nodes as an information set, denoted by $h$. Information sets identify the information possessed by players [9]. For instance, in a game of perfect information, where players have exact knowledge of the history of the game, every information set is a singleton, that is, for all $h \in H$, $|h| = 1$, where $H$ is the collection of information sets of the game. On the other hand, in a game of incomplete information, where some players have private information, the information set is a nonsingleton set for at least one of the players, that is, $\exists h \in H$ such that $|h| > 1$. An ellipse is drawn around a player's decision nodes to show its uncertainty about which node in the information set has been reached, as shown in Figure 5.
In a game of incomplete information, the action taken by a player is a function of which decision node in its information set has been reached. We denote by $A(h)$ the set of actions available to a player with information set $h$. The action taken by the player at stage game $t_k$ is denoted by $a(t_k)$ and is a mapping from $h$ to $A(h)$, that is, $a(t_k) : h \to A(h)$. In extensive form games, players may adopt random strategies at each information set. This is called a behavior strategy, wherein players assign a probability measure over the actions available at each information set $h$. A behavior strategy is denoted by $\sigma(a(t_k) \mid h)$, where $\sigma(a(t_k) \mid h) \in \Delta(A(h))$ and $\Delta(A(h))$ is the set of probability distributions over $A(h)$. For instance, in a cooperative network wherein everyone obeys the rules of cooperation, $\sigma(a(t_k) \mid h) = 1$, which is a pure strategy. Nature is usually introduced as a nonstrategic player that randomly informs players which decision node $D$ in $h$ has been reached. Figure 5 shows cooperative communications as a dynamic game. The initial node is a source terminal that transmits to the network. The two decision nodes represent potential cooperators, where the behavior of $D_1$ is known perfectly, as shown by its singleton information set, whereas $D_2$ maintains private information that is not common knowledge in the network. Nature $N$ randomly assigns decision nodes for player $D_2$.
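To make the notions of information set and behavior strategy concrete, the following is a minimal sketch of how they might be represented in code. The class and function names, the node labels, and the action labels `"forward"` and `"drop"` are all hypothetical; the sketch only encodes the definitions above (an information set is a set of indistinguishable decision nodes with a common action set, and a behavior strategy is a probability distribution over that action set).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationSet:
    player: str
    nodes: tuple    # decision nodes the player cannot tell apart
    actions: tuple  # A(h): actions available at every node in the set


def is_perfect_information(info_sets):
    """A game has perfect information when every information set is a singleton."""
    return all(len(h.nodes) == 1 for h in info_sets)


def validate_behavior_strategy(h, sigma, tol=1e-9):
    """Check that sigma is a probability distribution over A(h)."""
    if set(sigma) != set(h.actions):
        raise ValueError("strategy must assign a probability to every action in A(h)")
    if any(p < 0.0 for p in sigma.values()):
        raise ValueError("probabilities must be nonnegative")
    if abs(sum(sigma.values()) - 1.0) > tol:
        raise ValueError("probabilities must sum to one")


def sample_action(h, sigma, rng):
    """Draw one action at information set h according to behavior strategy sigma."""
    validate_behavior_strategy(h, sigma)
    weights = [sigma[a] for a in h.actions]
    return rng.choices(h.actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # D1: behavior known (singleton set). D2: private information (two nodes).
    h1 = InformationSet("D1", ("d1",), ("forward", "drop"))
    h2 = InformationSet("D2", ("d2_cooperate", "d2_misbehave"), ("forward", "drop"))
    print("perfect information:", is_perfect_information([h1, h2]))
    print("D1 plays:", sample_action(h1, {"forward": 1.0, "drop": 0.0}, rng))
    print("D2 plays:", sample_action(h2, {"forward": 0.6, "drop": 0.4}, rng))
```

A pure strategy is the special case in which one action has probability one, as in the example for `D1`.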

Cooperative Diversity as a Dynamic Game with Incomplete Information
We have shown that cooperation in wireless networks is characterized by social dilemmas, which ultimately impede the evolution of socially efficient cooperation. It is evident that social dilemmas are prevalent in commercial wireless networks where terminals violate the rules of cooperation for selfish reasons. In the presence of heterogeneously behaving terminals, cooperators are uncertain about the intention of their potential partners, which makes selection of a reliable partner challenging. Our goal is to develop a mechanism that enables terminals to strategically select reliable partners in the presence of uncertainty. To this end, we develop a framework in which cooperative communications is formulated as a dynamic game with incomplete information. Note that a dynamic game with incomplete information is a dynamic Bayesian game. We consider a wireless communications system with a population of $N$ terminals wherein terminals that are within transmission range of each other form a cooperative diversity system. We assume that the benefits of cooperation and the cost it incurs are common knowledge. That is, terminals are willing to expend their own resources to help others achieve reliable communication with the expectation of achieving those same benefits when their partners reciprocate. Terminals are rational in that they behave so as to maximize their individual benefit from cooperation. We assume that terminals maintain private information pertaining to their behavior (i.e., whether to cooperate or misbehave). Note that the problem formulation is general in that it is not tailored toward one particular cooperative diversity protocol. However, we may present examples based on a specific protocol to simplify the discussion.
We formulate cooperative communications as a finite-horizon discrete-time dynamic game. The game is discrete-time since each player is assumed to have a finite number of strategies [8]. Within each stage $t_k$, $k = 0, 1, \ldots, K$, a source terminal and its potential cooperator (relay) interact repeatedly for a duration of $T$ seconds. The assumption of multiple cooperative interactions within a stage game is intuitively valid since cooperative transmissions span multiple time slots. The period $T$ for each stage game $t_k$ may be defined as the time it takes a cooperatively transmitted signal to reach its intended destination. We assume that the duration $T$ of a stage game is long enough to average out the effects of channel variation. A new stage game starts when a source terminal $i$ ($i \in \{1, 2, \ldots, N\}$) that has data to send begins transmitting to the network. We next characterize the behavior of every potential cooperator $j$ and source terminal $i$ within the dynamic Bayesian game framework. Note that we use the terms relay and potential cooperator interchangeably. We first model the selfish behavior of relay terminals and then present a framework in which source terminals make optimal decisions.
Figure 6: Example 1. Extensive form representation of a cooperative network with perfect information; $R_j$, $R_l$, and $R_k$ denote cooperative relay nodes and $S_i$ denotes source node $i$. Note the absence of Nature in this network.

Modeling Selfish Behavior.
We assume each relay terminal $j$ maintains private information, which corresponds to the notion of type in Bayesian games. The set of types available to relay terminal $j$ constitutes its type space, defined as $\Theta_j = \{\theta_0 = \text{Cooperate}, \theta_1 = \text{Misbehave}\}$. Since every terminal $j$ either conforms to cooperation or deviates from it, $\Theta_j$ is also the global type space of the game. Following the notation of Bayesian games, the type of player $j$ is denoted by $\theta_j$ while the other players' types are denoted by $\theta_{-j}$, where $\theta_j, \theta_{-j} \in \Theta_j$. We assume that the types associated with the relay terminals are independent. The type space of every relay terminal $j$ maps to an action space $A_j$, which defines the set of actions $a_j(t_k)$ available to player $j$ of type $\theta_j$. The set of actions $A_j$ defines the information set $h_j$ of relay terminal $j$; in other words, $h_j$ maps to the action space $A_j(h_j)$, that is, $a(t_k \mid h_j) : h_j \to A_j(h_j)$. Note that the change in notation is to show that the action taken by the relay is a function of the information set. We assume that the type of terminal $j$ and the associated action $a(t_k \mid h_j)$ do not change within a stage game. Indeed, a relay that obeys the rules of cooperation does not change its type at any stage game. On the other hand, a misbehaving relay may strategically change its type at the beginning of each stage game. In this paper we assume that a misbehaving relay adopts a behavior strategy wherein it randomly changes its behavior between cooperation and misbehavior at each stage game. Behavior strategy $\sigma_j$ assigns a conditional probability over $A_j$, that is, $\sigma_j = p(a_j(t_k) \mid h_j)$. For completeness, we define the history of the game at the beginning of stage game $t_k$ as $h_j(t_k) = (a(t_0), a(t_1), \ldots, a(t_{k-1}))$. It is intuitive to assume that a relay which violates the rules of cooperation may not need to observe the history of the game when it chooses its actions.
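The relay model above can be sketched as a small simulation that generates the history of play for a given relay type. This is an illustrative sketch only: the encoding of actions as 1 (forward) and 0 (drop), the parameter `p_forward` for the misbehaving relay's per-stage forwarding probability, and all function names are assumptions made here and are not taken from the formulation itself.

```python
import random

COOPERATE, MISBEHAVE = 0, 1  # relay types theta_0 and theta_1
FORWARD, DROP = 1, 0         # assumed action encoding: 1 = forward, 0 = drop


def stage_action(relay_type, p_forward, rng):
    """Action of the relay for one stage game; it is held fixed within the stage.

    A cooperating relay always forwards (pure strategy). A misbehaving relay
    follows a behavior strategy and forwards with probability p_forward.
    """
    if relay_type == COOPERATE:
        return FORWARD
    return FORWARD if rng.random() < p_forward else DROP


def play_stages(relay_type, p_forward, num_stages, rng):
    """Return the history (a(t_0), ..., a(t_{K-1})) after num_stages stage games.

    The misbehaving relay draws each action independently of the history,
    reflecting the assumption that it need not observe past play.
    """
    history = []
    for _ in range(num_stages):
        history.append(stage_action(relay_type, p_forward, rng))
    return tuple(history)


if __name__ == "__main__":
    rng = random.Random(0)
    print("cooperating relay:", play_stages(COOPERATE, 0.5, 10, rng))
    print("misbehaving relay:", play_stages(MISBEHAVE, 0.5, 10, rng))
```

From the source terminal's side, only the history is observable; the relay's type stays private, which is the uncertainty the incomplete-information formulation is meant to capture.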
The utility function of relay terminal of typeθ j is denoted as u j (θ j , θ − j ) where θ − j is type of other terminals. Later in this section we give a formal definition of the utility function.
We present examples to elucidate the game-theoretic framework just introduced. Consider the Amplify-and-Forward (AF) cooperation protocol [13], where a potential cooperator $j$ amplifies the faded and noisy version of the signal received from source terminal $i$ and forwards it to a destination. Suppose that the amplification factor, which depends on the potential cooperator's type and the dynamics of the channel, is defined as
$$B(h_j, h_{S_i,R_j}) = a_j(t_k \mid h_j)\,\beta,$$
where $\beta$ is the amplification subject to the power constraint at the relay and the dynamics of the interuser channel, denoted $h_{i,j}$ [13], and $a_j(t_k \mid h_j)$ captures the action taken by relay $j$ when one of the decision nodes in its information set is reached. We describe below various types $\theta_j$ of relay terminal $j$, which give insight into the dynamic game framework.
(1) First, we consider a cooperative network where every relay node $j$ obeys the rules of cooperation. This is a network where nodes cooperate for a common objective, that is, the type of each relay node $j$ is $\theta_j = 0$ (Cooperate). Consequently, the information set of each relay $j$ is a singleton set, that is, $|h_j| = 1$, and the corresponding action space is $A_j(h_j) = \{1\}$. Since relay node $j$ behaves deterministically, it plays $a_j(t_k \mid h_j) = 1$ with probability $\sigma_j(t_k) = 1$, that is, it plays a pure strategy (it always forwards). The history at the end of stage game $t_k$ is $h_j(t_k) = (a(t_0) = 1, a(t_1) = 1, \ldots, a(t_k) = 1)$. The amplification $B(h_j, h_{S_i,R_j})$ is then a function of the channel dynamics and the power constraint at the relay only, that is, $B(h_j, h_{S_i,R_j}) = \beta$. The extensive-form representation of this game is straightforward. We would like to point out that the dynamic game framework can be used to design resource management for a cooperative network such as this one (see Figure 6).
(2) In the second example, we consider a cooperative network where relay nodes violate the rules of cooperation in a probabilistic manner. That is, relay node $j$ plays a behavior strategy in which it exhibits a mixture of cooperation and selfishness. This is a network where nodes are uncertain about the behavior of other nodes. In other words, relay node $j$ has private information, that is, the type of relay node $j$ is $\theta_j = 1$ (Misbehave). The relay selects randomly between two strategies, forwarding or refusing cooperation, which means that it has two decision nodes in its information set $h_j$, that is, $|h_j| = 2$. Since the relay adopts a behavior strategy, the action space is $A_j(h_j) = \{0, 1\}$ and $\sigma_j(t_k) = p(a_j(t_k \mid h_j))$ is a probability measure over the set of actions $A_j(h_j)$. A randomly behaving relay either cooperates (i.e., $a_j(t_k \mid h_j) = 1$) with probability $\sigma_j(t_k)$ or deviates from cooperation (i.e., $a_j(t_k \mid h_j) = 0$) with probability $1 - \sigma_j(t_k)$. Consequently, the amplification is a function of the relay behavior and the dynamics of the channel, that is, $B(h_j, h_{S_i,R_j}) \in \{0, \beta\}$. In the special case where a relay always refuses to forward, $\Theta_j = \{\theta_1\}$, $|h_j| = 1$, and $a_j(t_k \mid h_j) \in A_j = \{0\}$ deterministically; thus $B(h_j, h_{S_i,R_j}) = 0$ (see Figure 7).
Figure caption (fragment): The incomplete information of the game has been transformed to imperfect information, since we introduce Nature as $N$, which will randomly assign a decision node to the relay. $S_i$ denotes source node $i$.
(3) The third example is a continuation of the second example. Here we consider an intelligent and selfish relay $j$ of type $\theta_j = 1$. The relay is intelligent in the sense that it always forwards for its partner, but at a randomly selected reduced power level. The relay's intention is selfish, that is, to minimize its cost-to-benefit ratio. We assume that the selfish relay $R_j$ randomly selects a normalized power level $l$ from a finite set of power levels $L$, where $0 < l < 1$. Thus, the information set of the relay is defined by the set of normalized power levels $L$, that is, $|h_j| = |L|$. The action space of the selfish relay $j$ is the set of power levels, that is, $A_j(h_j) = (0, \ldots, 1)$. The behavior strategy is $\sigma_j(t_k) = p(a_j(t_k \mid h_j))$, where $a_j(t_k \mid h_j) = l$, $l \in L$. The amplification $B(h_j, h_{S_i,R_j})$ is determined by the behavior of the relay and the channel dynamics, with $B(h_j, h_{S_i,R_j}) \in (0, \ldots, \beta)$. Note that a terminal which exhibits such ambiguous behavior may exploit the dynamics of the channel to evade detection (see Figure 8).
The extensive-form representation of Example 1 is straightforward since all information sets are singleton sets. On the other hand, for Examples 2 and 3 Nature $N$ will assign decision nodes to relay $j$. The probability with which decision nodes are assigned is determined by the behavior strategy of the relay. The role of Nature can be justified within the context of a behavior strategy. Since relay node $j$ plays a behavior strategy, it requires a device that randomly selects a strategy from the set of possible strategies. Nature plays the role of this randomizing device and assigns strategies at each stage of the game. We assume that the power the relay expends for randomization is negligible compared to the cost it would have incurred by cooperating. Although it is customary to put Nature at the beginning of a game, Kreps and Wilson [9] noted that moves of Nature may be put anywhere on the game tree.
Figure 8: Example 3. Extensive form representation of a cooperative game with imperfect information. $R_{j.1}, \ldots, R_{j.|L|}$ denote decision nodes of the relay, that is, the different power levels that Nature $N$ will randomly select for $R_j$. $S_i$ denotes source node $i$.
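The three examples can be mimicked with a short simulation. The sketch below is illustrative only: it assumes that the amplification equals the relay's action times $\beta$ (which reproduces the values $\{0, \beta\}$ and $(0, \ldots, \beta)$ quoted in Examples 2 and 3), and the names `nature_draw`, `amplification`, `play`, the value of `BETA`, and the chosen probabilities are hypothetical. A pseudo-random generator stands in for Nature as the randomizing device.

```python
import random

BETA = 2.0  # hypothetical amplification allowed by the relay's power constraint


def nature_draw(actions, probs, rng):
    """Nature as randomizing device: draw one action according to the
    relay's behavior strategy (a probability measure over its actions)."""
    return rng.choices(actions, weights=probs, k=1)[0]


def amplification(action, beta=BETA):
    """Assumed relation: amplification = action * beta."""
    return action * beta


def play(relay, stages, rng):
    """Play `stages` stage games; return the action history and the
    amplification used in each stage."""
    actions, probs = relay
    history = [nature_draw(actions, probs, rng) for _ in range(stages)]
    return history, [amplification(a) for a in history]


# Example 1: cooperative relay, singleton action space {1}.
cooperative = ([1], [1.0])
# Example 2: randomly misbehaving relay, forwards with probability 0.7 (assumed).
random_relay = ([1, 0], [0.7, 0.3])
# Example 3: selfish relay, always forwards at a reduced normalized power level.
selfish = ([0.25, 0.5, 0.75], [1 / 3, 1 / 3, 1 / 3])

if __name__ == "__main__":
    rng = random.Random(0)
    for name, relay in [("cooperative", cooperative),
                        ("random", random_relay),
                        ("selfish", selfish)]:
        history, amps = play(relay, 5, rng)
        print(name, history, amps)
```

In this sketch the source never sees `history` directly; it would only observe the effect of the amplification through a detection technique, which is what makes the game one of imperfect information.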

Behavior of Source Terminals.
In the previous subsection we said that each relay maintains private information pertaining to its behavior. This private information, together with the sequential nature of cooperative interactions, gives relay terminals a dominant position in deciding whether to cooperate or misbehave. In other words, source terminals are vulnerable to defection by their partners. In this subsection, we present a framework for designing a technique by which source terminals make optimal decisions in the presence of uncertainty. A stage game begins when a source terminal starts transmitting to the network. In the language of game theory, this means that a source terminal makes the decision to transmit whenever it has information to transmit. In the extensive-form representation, a source terminal has only a single decision node, which characterizes the decision to transmit. Thus, any source terminal $i$ has an information set that is a singleton. Its decision node maps to an action space that is also a singleton, that is, $A_i = \{1\}$, which implies that if a source terminal has data to send, it will transmit to the network with probability 1. Note that $a(t_k \mid h_i) = 1$ captures the decision to transmit. It follows from the singleton information set that the type space of source terminal $i$ is also a singleton set. In the subsequent paragraphs we describe a framework for selecting reliable partners. We introduce the concept of belief, which characterizes each source terminal's level of uncertainty about the behavior of its potential partners.
The belief $\mu_i^j(t_k)$ of source terminal $i$ about relay terminal $j$ is a subjective probability measure over the possible types of relay terminal $j$ given $\theta_i$ and the history $h_i(t_k)$ at the beginning of stage game $t_k$, that is,
$$\mu_i^j(t_k) = p(\theta_j \mid \theta_i, h_i(t_k)).$$
We would like to point out that by maintaining a belief, source terminals deviate from the assumption (made in existing cooperative protocols) that their partners are always willing to cooperate. Indeed, belief is a security parameter that characterizes the level of trust each terminal places in its potential partners. We assume that beliefs are independent across the network, which is intuitively valid since beliefs are subjective measures of terminal behavior. We assume that every source terminal $i$ maintains a strictly positive belief, that is, $\mu_i^j(t_k) > 0$. This is intuitively valid in commercial wireless networks, which are characterized by a dynamic user population and where it is difficult to have definite prior knowledge about the behavior of every user. We assume that the belief structure of the dynamic game is common knowledge, which means that relay terminals (which are also potential source terminals) are aware that cooperation is belief based. We argue that individual rationality together with knowledge of the game structure motivates relays to adopt a behavior strategy.
The obvious questions are: (1) since $\mu_i^j(t_k)$ is conditioned on how relay $j$ behaved in the previous stage game $t_{k-1}$ (through $h_i(t_k)$), how does source $i$ learn the history when it cannot perfectly observe what Nature assigned to the relay (a game of imperfect information)? and (2) how is the belief at the first stage of the game, $\mu_i^j(t_0)$, initialized? Before addressing these questions, we point out that each source terminal $i$ determines the behavior of its partners using any of the misbehavior detection techniques proposed in [14][15][16][17]. Although the actions of relay terminal $j$ are not perfectly observable, their effects are captured by the detection techniques, which provide a probabilistic measure of the history. This probability measure is used to update the belief of source terminal $i$ at the end of stage game $t_k$. Before we discuss how prior beliefs are assigned, we introduce the belief system that describes the belief updating procedure.

Belief System.
The belief system defines the belief updating procedure for each source terminal $i$ using Bayes' rule at the end of each stage game $t_k$. The posterior belief at the end of stage game $t_k$ is
$$\mu_i^j(t_k) = p(\theta_j \mid \theta_i, h_i(t_k)) = \frac{p(h_i(t_k), \theta_i \mid \theta_j)\, p(\theta_j)}{\sum_{\tilde{\theta}_j \in \Theta_j} p(h_i(t_k), \theta_i \mid \tilde{\theta}_j)\, p(\tilde{\theta}_j)},$$
where $p(h_i(t_k), \theta_i \mid \theta_j)$ is the probability measure on the history of the game at the end of stage game $t_k$, which is obtained from a detection technique, and $p(\theta_j)$ is the prior belief at the beginning of stage game $t_k$. At the end of each stage game, source terminals obtain new information about the behavior of their partners. The belief at the end of stage game $t_k$ is used as the prior belief for the next stage game $t_{k+1}$. The belief at the end of the last stage of the game, $t_K$, reveals the reputation of relay terminal $j$, which is a measure of the relay's trustworthiness.
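The Bayes update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the detection techniques of [14][15][16][17]: it assumes a binary detector that either flags a stage game as misbehavior or not, characterized by a hypothetical false-alarm probability `p_fa` and missed-detection probability `p_md`, and it tracks only the belief that the relay is of the cooperative type.

```python
def update_belief(prior_coop, flagged, p_fa=0.05, p_md=0.10):
    """One Bayes update of the belief that the relay is cooperative.

    prior_coop: prior probability that the relay is of type Cooperate
    flagged:    True if the (assumed) detector reported misbehavior
    p_fa:       assumed probability of flagging a cooperative relay
    p_md:       assumed probability of not flagging a misbehaving relay
    """
    # Likelihood of the detector output under each type.
    like_coop = p_fa if flagged else 1.0 - p_fa
    like_mis = (1.0 - p_md) if flagged else p_md
    numerator = like_coop * prior_coop
    evidence = numerator + like_mis * (1.0 - prior_coop)
    return numerator / evidence


if __name__ == "__main__":
    belief = 0.5  # prior at the first stage game
    # Hypothetical detector outputs over four stage games.
    for flagged in [False, False, True, False]:
        belief = update_belief(belief, flagged)  # posterior becomes next prior
        print(round(belief, 4))
```

Each posterior is fed back as the prior of the next stage game, so the final value plays the role of the relay's reputation at the end of the interaction.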
It is important to note that detection techniques are designed to tolerate certain levels of false alarm and missed detection. False alarm events degrade the belief probability of well-behaved terminals, whereas missed detection events wrongly elevate the belief probability of misbehaving terminals. Thus, the accuracy of the belief system is determined by the robustness of the detection technique implemented.
The prior belief $\mu_i^j(t_0)$ may be initialized in one of the following ways.
(1) Nondistributed. If source terminal $i$ has no prior interaction with relay terminal $j$, it assigns equal prior probabilities to all possible types of relay terminal $j$, that is, $p(\theta_j = \theta_0) = p(\theta_j = \theta_1) = 1/2$.
(2) Direct Reciprocity. This is also a nondistributed approach, in which source terminals initialize their beliefs based on what they know about the relay. If source terminal $i$ and relay terminal $j$ have a prior history of cooperation, source terminal $i$ conditions future cooperation on that history. That is, the prior belief for the new cooperative interaction is set to the reputation of the relay in the previous cooperation, $\mu_i^j(t_0) = p(\theta_j \mid \theta_i, h(t_K))$, where $h(t_K)$ is the history at the last stage game of the previous cooperative interaction.
(3) Distributed (Indirect Reciprocity). Indirect reciprocity is a mechanism by which terminals obtain information on their potential partners from other terminals in the network. It is a distributed mechanism enabled by the exchange of reputation information. At the end of each cooperative interaction, source terminals reveal the reputation of their partners to the rest of the network. By exchanging reputation information, each terminal gains a global view of the network. Note that indirect reciprocity is a robust mechanism which ensures stable and socially efficient cooperation [18] if adopted by all nodes.
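The three ways of initializing the prior belief can be summarized in a small helper. The sketch below is only one possible reading: the function name `initial_belief`, its parameters, the precedence of direct over indirect information, and the trust-weighted average used to combine reported reputations are all assumptions introduced here for illustration.

```python
def initial_belief(own_reputation=None, reports=None):
    """Return the prior belief that a relay is cooperative.

    own_reputation: reputation from a previous interaction with this relay
                    (direct reciprocity), or None if there is no history
    reports:        list of (trust_in_reporter, reported_reputation) pairs
                    received from other terminals (indirect reciprocity)
    """
    # Direct reciprocity: reuse the reputation from the last interaction.
    if own_reputation is not None:
        return own_reputation
    # Indirect reciprocity: combine reports, weighting each by how much the
    # reporter is trusted (this weighting rule is an assumption).
    if reports:
        total_weight = sum(weight for weight, _ in reports)
        if total_weight > 0:
            return sum(weight * rep for weight, rep in reports) / total_weight
    # No information at all: equal prior over the two types.
    return 0.5


if __name__ == "__main__":
    print(initial_belief())
    print(initial_belief(own_reputation=0.8))
    print(initial_belief(reports=[(1.0, 0.9), (0.5, 0.3)]))
```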

Partner Selection.
Partner selection is the mechanism by which source terminals select reliable relays based on their past history. We assume that each source terminal $i$ stores its belief about each potential relay in a normalized trust vector $\mu_i$. Relay terminals with relatively higher normalized belief are more likely to be selected as partners. Note that a selected potential relay may itself refuse cooperation based on its belief about source terminal $i$. Source terminal $i$ may share its trust vector with other terminals in the network. For instance, terminal $i$ may inform terminal $l$ about the behavior of terminal $k$; terminal $l$ then forms a weighted belief about $k$, where the weight reflects its own belief about $i$.
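Partner selection from a trust vector can be illustrated as follows. This is a hedged sketch: normalizing by the sum of beliefs, picking the relay with the largest normalized belief, and weighting a reported belief by the product with the belief about the reporter are assumptions made for the example, and the names `normalize`, `select_partner`, and `weighted_belief` are hypothetical.

```python
def normalize(beliefs):
    """Normalize a dict of relay -> belief so the entries sum to one
    (sum normalization is an assumption of this sketch)."""
    total = sum(beliefs.values())
    return {relay: b / total for relay, b in beliefs.items()}


def select_partner(beliefs):
    """Pick the relay with the highest normalized belief."""
    trust = normalize(beliefs)
    return max(trust, key=trust.get)


def weighted_belief(belief_l_about_i, belief_i_about_k):
    """Belief of terminal l about terminal k, derived from i's report and
    discounted by how much l trusts i (product rule is an assumption)."""
    return belief_l_about_i * belief_i_about_k


if __name__ == "__main__":
    beliefs = {"R1": 0.9, "R2": 0.3, "R3": 0.6}  # hypothetical beliefs
    print(normalize(beliefs))
    print(select_partner(beliefs))
    print(weighted_belief(0.8, 0.5))
```

A probabilistic selection rule, in which relays are drawn with probability proportional to their normalized belief, would fit the phrase "more likely selected" equally well; the deterministic maximum is used here only for brevity.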

Utility Function.
The utility function of the game is a measure of the net cooperation gain of each individual node. It is defined in terms of the attainable benefit of cooperation and the cost incurred. The attainable benefit of cooperation is measured by the average frame success rate (FSR), which is a function of the average bit error rate (BER). For cooperative AF, for instance, the BER expression involves a modulation parameter $\rho$ and the Gaussian tail function $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-z^2/2}\, dz$. The cost of cooperation $E_R$ incurred by a relay terminal $R$ is the sum of (1) the energy expended to establish the cooperative partnership and (2) the energy expended to forward information-bearing signals to help a partner. The total energy a relay terminal expends for cooperation is therefore $E_R = E_{R,\text{data}} + E_{R,\text{handshake}}$, where $E_{R,\text{data}}$ is the energy expended to forward data and $E_{R,\text{handshake}}$ is the energy expended to establish the cooperative partnership. The source terminal also expends $E_{S,\text{handshake}}$ for the protocol handshake. The total energy expended for cooperative transmission of the information-bearing signal is $E_I = E_{R,\text{data}} + E_{S,\text{data}}$, where $E_{S,\text{data}}$ is the energy expended by the source terminal, assuming the presence of direct transmission from source to destination. Note that $E_{R,\text{handshake}} \ll (E_{S,\text{data}}, E_{R,\text{data}})$. In [19] the utility function of a wireless network is defined as a measure of the number of information bits received per joule of total energy expended (17). $E_{S,\text{handshake}}$ contributes zero utility since no information bits are transmitted during the protocol handshake. Thus, (17) defines a well-behaved utility function, for which $u_i \to 0$ as $E_I \to 0$ and $u_i \to 0$ as $E_I \to \infty$. The behavior of the utility function is verified in Figure 9. Note that the utility function is the inverse of the cost-to-benefit ratio (see Figures 10 and 11).
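The limiting behavior of a bits-per-joule utility can be checked numerically. The sketch below does not reproduce the paper's utility or BER expressions; it assumes a generic BER model $Q(\sqrt{\rho E / N_0})$ and a frame efficiency of the form $(1 - 2\,\text{BER})^M$, chosen here only because it vanishes when the BER approaches one half. The names `q_function` and `utility` and all parameter values are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail function Q(x), expressed through the complementary
    error function from the standard library."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def utility(energy, bits=80, rho=1.0, noise=1.0):
    """Information bits delivered per unit of energy (illustrative model).

    energy: total energy spent on the information-bearing signal
    bits:   assumed number of bits per frame
    rho:    assumed modulation parameter
    noise:  assumed noise level
    """
    ber = q_function(math.sqrt(rho * energy / noise))
    # Assumed frame efficiency; tends to 0 as BER -> 1/2 and to 1 as BER -> 0.
    efficiency = (1.0 - 2.0 * ber) ** bits
    return bits * efficiency / energy


if __name__ == "__main__":
    for energy in [1e-6, 1.0, 10.0, 100.0, 1e6]:
        print(energy, utility(energy))
```

Under these assumptions the utility is negligible at very low energy (almost every frame is lost) and at very high energy (extra energy buys no additional bits), with a maximum in between.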

Formal Definition of the Game.
Cooperative communications is modeled as a 6-tuple dynamic Bayesian game $G = (N, \Theta, h, A, \mu, u)$, where $N$ is the number of nodes in the cooperative network, $\Theta$ is the type space of relay nodes, $h$ is the information set of the nodes, $A$ is the action space profile of the nodes, $\mu$ is the system of beliefs of source nodes, and $u$ is a vector of utility functions.

Perfect Bayesian Equilibrium (PBE).
PBE is a belief-based solution concept for dynamic games of incomplete information [9]. Unlike static games, where equilibrium points comprise strategies only, PBE incorporates beliefs in the equilibrium definition, and the importance of beliefs in that definition is noted in [20]. Thus, PBE defines a solution concept in which players make optimal decisions at each stage of the game given their beliefs. We show that the proposed dynamic Bayesian game model for cooperative communications satisfies the requirements for the existence of PBE [9]:
(1) Requirement 1: at each information set, the player with the move has some belief about which node in its information set has been reached.
(2) Requirement 2: given its belief, a player must be sequentially rational, that is, whenever it is its turn to move, the player must choose an optimal strategy from that point on.
(3) Requirement 3: beliefs are updated using Bayes' rule wherever possible.
We intentionally leave out a fourth requirement, which deals with unrationalizable strategies; these have no practical meaning in our setting since the action space of the game is concisely defined.
Figure 11: Utility of relay terminal $j$ as a function of the total energy expended for cooperative transmission of the information-bearing signal, for a relay of type $\theta_j = 0$, a relay of type $\theta_j = 1$ with $A_j = \{0, 1\}$, and a relay of type $\theta_j = 1$ with $A_j = \{0\}$. It is evident that a selfish terminal can exploit the cooperative behavior of its partners to maximize its utility.
Proof. Requirement 1 is trivially satisfied since the information sets of source nodes are singleton sets: whenever a source node has information to send, it transmits to the network. Thus, we can assign probability one to the decision node in the singleton set at each stage game $t_k$. Requirement 2 is met by the problem this work sets out to solve, namely designing a mechanism by which source nodes make optimal decisions given their beliefs. Requirement 3 is satisfied by the belief system in (10). Thus, the proposed dynamic game model satisfies the conditions for the existence of PBE and therefore admits a PBE. It also admits a sequential equilibrium since, for every extensive-form game, there exists at least one sequential equilibrium [9, Proposition 1]. We argue, based on evolutionary game-theoretic arguments, that if (1) a significant fraction of the nodes adopts sequential rationality (obeys Requirement 2) and (2) these nodes share reputation information with other nodes, an evolutionarily stable cooperation is attainable.

Conclusion
In this paper we developed a dynamic Bayesian game-theoretic framework for cooperative diversity. We showed that the proposed framework captures vital aspects of cooperative communications and that it admits a perfect Bayesian equilibrium. The framework provides a foundation for developing a reputation-based cooperative diversity system in which source terminals exchange belief information to confine cooperation to terminals whose behavior is known a priori.