Optimal Policy of Cross-Layer Design for Channel Access and Transmission Rate Adaptation in Cognitive Radio Networks
© Hao He et al. 2010
Received: 29 April 2009
Accepted: 2 December 2009
Published: 12 January 2010
In this paper, we investigate the cross-layer design of joint channel access and transmission rate adaptation in cognitive radio (CR) networks with multiple channels for both centralized and decentralized cases. Our target is to maximize the throughput of the CR network under a transmission power constraint while taking spectrum sensing errors into account. In the centralized case, this problem is formulated as a special constrained Markov decision process (CMDP), which can be solved by the standard linear programming (LP) method. As the complexity of finding the optimal policy by LP increases exponentially with the size of the action space and state space, we further apply action set reduction and state aggregation to reduce the complexity without loss of optimality. Meanwhile, for convenience of implementation, we also consider the pure policy design and analyze its characteristics. In the decentralized case, where only local information is available and there is no coordination among the CR users, we prove the existence of the constrained Nash equilibrium and obtain the optimal decentralized policy. Finally, for the case in which the traffic load parameters of the licensed users are unknown to the CR users, we propose two methods to estimate the parameters for two different cases. Numerical results validate the theoretical analysis.
1. Introduction
In recent years, the explosive growth of wireless devices and traffic has dramatically increased the demand for radio spectrum. Unfortunately, as most of the spectrum suitable for wireless communications has already been assigned, the available spectrum resources have become scarce. Because today's spectrum is managed under a fixed assignment policy, which is highly inefficient in terms of spectrum utilization [1, 2], cognitive radios are adopted to sense their environments and promptly reconfigure their communication parameters based on their observations [3–5].
In CR networks, a new spectrum access method, namely dynamic spectrum access (DSA), is employed to improve spectrum utilization by allowing CR users to access idle licensed spectrum bands without colliding with active licensed users. In a multi-user environment, the DSA design should also account for collisions with other CR users. Meanwhile, CR users should take power consumption into account. Furthermore, the time-varying fading nature of the radio channel complicates adaptive access and transmission techniques. Solving these problems calls for a cross-layer design between the physical layer and the upper layers.
Recently, cross-layer design for dynamic spectrum access has attracted considerable research effort. Zhao et al. present a decentralized cognitive medium access control protocol under the framework of a partially observable Markov decision process for ad hoc networks. This work has since been extended to maximize the expected number of information bits delivered by an unlicensed user before its total energy is exhausted. Kim and Shin propose a MAC-layer sensing framework and an energy-efficient dynamic sensing mode selection algorithm. The design of spectrum sensing and access strategies in the presence of spectrum sensing errors has been addressed in [10–13]. On the other hand, spectrum opportunity sharing among CR users is discussed in [14–17]. In [18, 19], the authors consider power control in a CDMA system under a power constraint by formulating a stochastic game model. However, the assumptions on the transmission reward and the channel state in these works are not practical for fading channels. Besides, they do not consider the multichannel case. In fact, to the best of our knowledge, little work has focused on cross-layer design under a power constraint that takes into account both the time-varying characteristics of the channel state and collisions (with the primary user as well as with other CR users). In this paper, we consider the cross-layer design of multichannel access and transmission rate adaptation in CR networks for both centralized and decentralized cases by taking the time-varying characteristics of the channel state into account.
We consider the coexistence of a CR network with a licensed wideband wireless communications network. In the centralized case, the cross-layer design problem can be modeled as a constrained Markov decision process (CMDP) and solved by a dynamic programming method to achieve the optimal performance. In the decentralized case, each CR user only knows its local information and should take actions to maximize the total performance of the whole CR network. We prove the existence of the constrained Nash equilibrium and calculate the optimal decentralized policy.
Another key difference between our approach and that of the above references is that complexity reduction is explicitly taken into account in our method. In both the centralized and decentralized cases, the complexity of finding the optimal policy increases exponentially with the size of the action space and state space, which incurs the so-called curse of dimensionality. To overcome this problem, we perform action set reduction and state aggregation to reduce the complexity without loss of optimality. Under a certain condition, we further prove that the multichannel access and transmission rate adaptation policy design can be solved separately for each channel without loss of optimality.
Furthermore, we observe that a pure policy is preferred in practical environments due to the convenience of implementation and evaluation. Pure policies take actions according to a deterministic rule. We refer to all stationary policies as mixed policies and analyze the difference between pure policies and mixed policies in our proposed cross-layer design. This issue has attracted little attention in existing research.
In a CR network, the change of the white space or spectrum hole utilized for unlicensed communication depends on the spectrum occupancy of the licensed users. However, the CR users generally do not know the traffic load parameters of the licensed users. For this case, we propose two methods to estimate the traffic load parameters of the licensed user.
The remainder of the paper is organized as follows. In Section 2, the system model is described. The cross-layer design problem is formulated and discussed for both centralized and decentralized cases in Section 3. The complexity reduction of the optimal policy design is considered in Section 4, in which we discuss action set reduction and state aggregation and prove that, under a certain condition, the multichannel access and transmission rate adaptation policy design can be solved separately for each channel without loss of optimality. Section 5 investigates the optimal pure policy design. In Section 6, the estimation methods for the unknown traffic load parameters of the licensed user are provided. Numerical results are presented and discussed in Section 7. Finally, we conclude our work and point out future work in Section 8.
2. System Model
In this paper, we consider the coexistence of a CR network with a licensed wideband wireless communications network. We refer to the wideband channel as licensed channel in this paper. The CR network consists of CR users and a base station. The wideband channel of the wideband network is divided into narrowband channels (or subchannels, subcarriers) that are utilized by the CR users for opportunistic uplink packet transmission. Each narrowband channel can be used by only one CR user in each frame.
For a specific spectrum sensing method, such as energy detection, the probabilities of false alarm and miss detection can be calculated if the average signal-to-noise ratio (SNR) of the licensed user is known. In this paper, we assume that the spectrum sensing mechanisms are fixed for both the centralized and decentralized cases.
In the centralized case, the CR users cooperatively sense the licensed channel. The licensed channel is assumed busy only if the sensing result of every CR user is busy and the probability of false alarm is . With a suitable broadcasting mechanism, it is reasonable to assume that the sensing result is identical within the whole CR network. In this case, the base station knows the CSI and the power constraint of each CR user and acts as a single controller to design the optimization policy for the whole network.
In the decentralized case, the CR users sense the licensed channel separately, only local CSI is available to each CR user, and there is no coordination among the CR users. Therefore, each CR user should design its cross-layer policy separately.
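The fusion rule described for the centralized case (the channel is declared busy only if every CR user reports busy) can be sketched as follows. This is a minimal illustration that assumes the local sensing decisions are statistically independent; the function names and the example probabilities are hypothetical and not taken from the paper.

```python
from math import prod

def and_rule_false_alarm(local_false_alarm):
    """Network-level false alarm under the assumed AND rule.

    The idle channel is declared busy only if every user falsely reports
    busy, so the local false-alarm probabilities multiply (independence
    between users is an assumption of this sketch).
    """
    return prod(local_false_alarm)

def and_rule_miss_detection(local_miss):
    """Network-level miss detection under the assumed AND rule.

    A busy channel is declared idle as soon as at least one user misses
    the licensed user.
    """
    return 1.0 - prod(1.0 - pm for pm in local_miss)

# Illustrative values only.
network_pf = and_rule_false_alarm([0.1, 0.1, 0.1])
network_pm = and_rule_miss_detection([0.1, 0.1])
```

Under this rule the network-level false alarm shrinks as more users cooperate, while the network-level miss detection grows, which is one reason sensing errors still have to be handled explicitly in the policy design.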
2.1. Licensed Channel Availability Model
In a DSA system, time-varying channel availability should be considered according to the traffic load variation of licensed users, which is assumed to be an independent and identically distributed (i.i.d.) alternating renewal process with ON (busy, 0) and OFF (idle, 1) periods . The duration of an ON (OFF) period of channel is described by an exponentially distributed stochastic variable with parameter ( ). If CR user sends the pilot and data symbols to the base station without incurring collision with licensed users during a frame, that is, the channel keeps idle during the frame, an opportunity is exploited as a stochastic variable and depends on recent sensing result. Before further discussion, we first give the following definition.
Duration probability denotes the probability that at a specified time instant , the availability state for a specific licensed channel is and this licensed channel will keep for an interval if the availability state starts with at .
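For exponentially distributed ON and OFF periods, the availability process is a two-state continuous-time Markov chain, so point and duration probabilities have closed forms. The sketch below is one way to compute them; it assumes the exponential parameters are given as mean durations (the paper's exact parameterization is not reproduced here), and all function and argument names are hypothetical.

```python
import math

def point_probability(start_idle, end_idle, t, mean_busy, mean_idle):
    """P(state at time t | state at time 0) for a two-state ON/OFF channel."""
    leave_busy = 1.0 / mean_busy   # rate of busy -> idle transitions
    leave_idle = 1.0 / mean_idle   # rate of idle -> busy transitions
    total = leave_busy + leave_idle
    steady_idle = leave_busy / total
    decay = math.exp(-total * t)
    if start_idle:
        p_idle = steady_idle + (1.0 - steady_idle) * decay
    else:
        p_idle = steady_idle * (1.0 - decay)
    return p_idle if end_idle else 1.0 - p_idle

def duration_probability(start_idle, end_idle, t, hold, mean_busy, mean_idle):
    """P(channel is in the end state at t and stays there for `hold` longer).

    Uses the memoryless property of the exponential sojourn time.
    """
    mean_stay = mean_idle if end_idle else mean_busy
    p_at_t = point_probability(start_idle, end_idle, t, mean_busy, mean_idle)
    return p_at_t * math.exp(-hold / mean_stay)
```

With `hold = 0` the duration probability reduces to the point probability, and for large `t` the point probability approaches the long-run idle fraction `mean_idle / (mean_idle + mean_busy)`.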
2.2. Evolution of Sensing Results
According to Definitions 1 and 2, we can consider the point probability as a duration probability with an interval of duration . Meanwhile, it is obvious that the licensed channel availability at the beginning of a frame only depends on the licensed channel availability state obtained in the preceding frame. In addition, the probabilities of sensing false alarm and sensing miss detection are unchanged for each frame. Note that a miss detection in spectrum sensing can be detected by means of pilot symbols. Therefore, miss detection does not affect the evolution of the sensing result, and the sensing result of the licensed channel can be modeled as an ergodic finite state discrete time Markov chain with state space . Furthermore, according to Definitions 1 and 2, the state transition probabilities of can be given as:
2.3. Rate and Power Adaptation Model
We consider a block fading model to characterize the parallel narrowband channels, that is, these channels keep constant during each frame. It is well known that block fading channel can be modeled as an ergodic first order finite state discrete time Markov channel (FSMC) . Let be a sequence of pre-selected thresholds of received SNR and denote the channel state space of th channel for CR user . The probability distribution of state space can be given as , , where is the average SNR. Therefore, the state transition probabilities of the th channel are
where , and is the maximal Doppler frequency of CR user . For CR users , the composite state space of channels is denoted by , and . Correspondingly, the composite channel state is defined as , where . If the state transition probabilities of different channels are assumed to be independent, the transition probability of is given by .
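A minimal sketch of how a finite state Markov channel of this kind can be built is given here. It assumes Rayleigh fading (so the received SNR is exponentially distributed around its average), transitions only between adjacent states, and transition probabilities derived from the level crossing rate at each threshold, which is a common construction for such channels. The function name, argument names, and example numbers are illustrative assumptions, not the paper's values.

```python
import math

def fsmc_transition_matrix(thresholds, avg_snr, doppler_hz, frame_s):
    """Return (steady_state, P) for an adjacent-transition FSMC.

    thresholds: increasing linear-scale SNR thresholds, starting at 0.0;
                the last state extends to infinity.
    """
    bounds = list(thresholds) + [math.inf]
    n_states = len(bounds) - 1

    # Steady-state probability of each SNR interval (exponential SNR).
    steady = [math.exp(-bounds[k] / avg_snr) - math.exp(-bounds[k + 1] / avg_snr)
              for k in range(n_states)]

    def crossing_rate(level):
        # Expected crossings per second of `level` (Rayleigh fading).
        if level == 0.0 or math.isinf(level):
            return 0.0
        return (math.sqrt(2.0 * math.pi * level / avg_snr)
                * doppler_hz * math.exp(-level / avg_snr))

    P = [[0.0] * n_states for _ in range(n_states)]
    for k in range(n_states):
        if k + 1 < n_states:
            P[k][k + 1] = crossing_rate(bounds[k + 1]) * frame_s / steady[k]
        if k > 0:
            P[k][k - 1] = crossing_rate(bounds[k]) * frame_s / steady[k]
        P[k][k] = 1.0 - sum(P[k])
    return steady, P

# Illustrative parameters: 3 states, 10 Hz Doppler, 1 ms frames.
steady, P = fsmc_transition_matrix([0.0, 1.0, 3.0], 2.0, 10.0, 1e-3)
```

The approximation is only meaningful when the product of Doppler frequency and frame length is small, so that the off-diagonal entries stay well below one.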
In the proposed system, an adaptive transmission scheme based on M-ary quadrature amplitude modulation (M-QAM) is employed for each channel. Let denote the transmission rate space of each channel, in which corresponds to -QAM transmission. Specifically, 0 and 1 correspond to no transmission and BPSK transmission, respectively. For a given transmission rate, power, and channel state, the bit error rate (BER) can be estimated. Assuming ideal coherent detection, the BER bound for is given by ,
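To illustrate how a BER bound links rate, power, and channel state, the sketch below uses an exponential approximation of the form c1 * exp(-c2 * SNR / (2^k - 1)) for 2^k-QAM with k >= 2. The constants c1 = 0.2 and c2 = 1.6 are assumed defaults taken from the adaptive modulation literature and may differ from the bound used in this paper; the function names are hypothetical.

```python
import math

def ber_bound(bits, snr, c1=0.2, c2=1.6):
    """Approximate BER of 2**bits-QAM at linear received SNR `snr`."""
    if bits < 2:
        raise ValueError("this approximation is only used here for bits >= 2")
    return c1 * math.exp(-c2 * snr / (2 ** bits - 1))

def required_snr(bits, target_ber, c1=0.2, c2=1.6):
    """Smallest SNR for which ber_bound(bits, snr) meets target_ber."""
    if bits < 2:
        raise ValueError("this approximation is only used here for bits >= 2")
    return -(2 ** bits - 1) * math.log(target_ber / c1) / c2

def required_power(bits, target_ber, noise_power, channel_gain):
    """Transmit power needed when received SNR = power * gain / noise."""
    return required_snr(bits, target_ber) * noise_power / channel_gain
```

Inverting the bound in this way gives the transmission power cost of each rate in each channel state, which is the quantity that enters the power constraint.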
3. Problem Formulation and Discussion
In this section, we consider the cross-layer policy design of the channel access and transmission rate adaptation where each CR user aims at maximizing expected long-term average reward under power constraint. We define the reward as the number of packets successfully transmitted to base station per frame. The policy design will be formulated and discussed in both centralized and decentralized cases.
3.1. Preliminary Definition
To model the stochastic characteristics of the CR networks considered in this paper, we first provide the definition of , where is state space and , , , are action space, state transition probability matrix, reward, and cost, respectively.
We can naturally define a composite CR network state space, where the notation " " denotes the Cartesian product. Let . Then, can be further expressed as . The state transition probabilities of depend on the transition probabilities of the composite channel state and the sensing result. We assume that state transitions only occur at the beginning of each frame. Let denote the whole CR network state at the beginning of a frame. The CR network evolves into a new state at the beginning of the next frame. In this paper, it is assumed that the transition probabilities of the composite channel state and the channel sensing result are independent of each other [8, 25]. We can express the transition probability as
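Because the channel states and the sensing result are assumed to evolve independently, the transition matrix of the composite state is the Kronecker product of the individual transition matrices. The following sketch shows this with plain nested lists; the helper names and the matrix entries are made up for illustration.

```python
from functools import reduce

def kron(a, b):
    """Kronecker product of two row-stochastic matrices (nested lists)."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def composite_transition(per_channel, sensing):
    """Joint transition matrix of (channel 1, ..., channel N, sensing result)."""
    return reduce(kron, list(per_channel) + [sensing])

# Illustrative two-state matrices (values are not from the paper).
channel = [[0.9, 0.1], [0.2, 0.8]]
sensing = [[0.7, 0.3], [0.4, 0.6]]
joint = composite_transition([channel, channel], sensing)
```

Two two-state channels and a two-state sensing result already give an 8 by 8 matrix, which shows how quickly the composite state space grows with the number of channels.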
where is the sensing result at the beginning of th frame, and is the number of transmitted packets in a frame. is a linear increasing function and is defined as for simplification in this paper. If another CR user also accesses channel , the collision occurs and the reward on the channel will be 0. We also assume that the packet loss due to transmission failure is only determined by the collision with licensed users when the BER is small enough. is the probability that the licensed channel is idle during when sensing result of CR user is .
3.2. Centralized Optimization
This problem can be formulated as a special Constrained Markov Decision Process (CMDP). That is, the state transition probabilities of the CMDP proposed in this section are not affected by action. According to [26, Theorem ], the standard LP approach for this problem can be formulated as
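For orientation, a generic occupation-measure LP for a CMDP whose transitions do not depend on the action has the form below. The notation is assumed here for illustration and may differ from the paper's numbered formulation: \rho(s,a) is the occupation measure of state s and action a, r(s,a) the reward, c_i(s,a) the power cost of CR user i, \bar{P}_i its power threshold, and P(s' \mid s) the action-independent transition probability.

```latex
\begin{align}
\max_{\rho \ge 0} \quad & \sum_{s}\sum_{a} \rho(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} \rho(s',a) = \sum_{s}\sum_{a} \rho(s,a)\, P(s' \mid s), \quad \forall s', \\
& \sum_{s}\sum_{a} \rho(s,a) = 1, \\
& \sum_{s}\sum_{a} \rho(s,a)\, c_i(s,a) \le \bar{P}_i, \quad \forall i .
\end{align}
```

Since the transitions are unaffected by the action, the balance constraints force \sum_a \rho(s,a) to equal the stationary probability of s, and a stationary policy can be read off as u(a \mid s) = \rho(s,a) / \sum_{a'} \rho(s,a') for every state with positive stationary probability.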
3.3. Decentralized Case
In the decentralized case, each CR user does not know any information except the probability of sensing false alarm , steady state probabilities , and the power threshold . We define as the class of decentralized policies. As mentioned in the previous section, the CR users aim at maximizing the total reward of the whole CR network. This means that all CR users share the common maximization objective
Assume that a policy u maximizing (13) satisfies the power constraints but is not a constrained Nash equilibrium. According to Definition 3, there must exist a CR user and a policy , such that the power constraint of this CR user holds and . Furthermore, based on (8), the power constraints of all CR users can be satisfied by the policy . This result contradicts the assumption that the policy u maximizes (13). Therefore, we conclude that is a constrained Nash equilibrium.
where is the occupation measure of local state and local action of CR user . Each CR user solves the same optimization problem (16) and the corresponding optimal policy can be obtained similarly as (12). The solution must be a constrained Nash equilibrium and a globally optimal policy in decentralized case.
4. Simplification of Policy Design
With rates, channel states, channels, and CR users , the LP (11) has variables and the LP (17) has variables. The number of variables grows exponentially in both the centralized and decentralized cases. Consequently, it is impractical to design the optimal policy in real time in response to evolving parameters ( and ). In this section, we simplify the LPs (11) and (17) by reducing the number of variables in two ways without loss of optimality of the policy. One way is to transform the multichannel policy design into a single-channel policy design, while the other reduces the action set and aggregates states. For the former method, we have the following theorem.
See Appendix .
In the case of single channel, the action set is , where means that the CR user does not access this channel and means that there is no data transmission. In fact, the access action can be obtained in the transmission rate state . Therefore, the action set can be further reduced and the reduced action set is defined as . On the other hand, any state is aggregated into a macro-state by state aggregation. Consequently, we have the following propositions.
Based on Theorem 2, for the optimal policy design on each channel, the new action and state space of CR user are respectively expressed as and . The new action and state space of the whole CR network are respectively expressed as and .
After the simplification of policy design, the variable number of centralized policy is reduced from to and the variable number of decentralized policy is reduced from to . Obviously, the complexity of the optimal policy design is largely reduced.
5. Pure Policy
It is noticed that the previous discussion on the optimal policy is based on the mixed policy. In fact, the controller prefers pure policy to mixed policy due to the convenience of implementation and evaluation. For pure policy, the action selection is not stochastic but only one action could be adopted for each state. The optimal pure policy exists for centralized case and the LP for the optimal pure policy can be formulated as
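The gap between pure and mixed policies under a power constraint can be seen on a toy instance. The sketch below assumes action-independent transitions, so a policy can be evaluated directly against the stationary state distribution; it enumerates every pure policy by brute force and compares the best feasible one with a hand-picked mixed policy. All numbers and names are illustrative and are not taken from the paper.

```python
from itertools import product

STATE_PROB = [0.5, 0.5]      # stationary distribution of two states
REWARD = [0.0, 1.0, 2.0]     # packets per frame for actions 0, 1, 2
POWER = [0.0, 1.0, 3.0]      # power cost for actions 0, 1, 2
BUDGET = 1.5                 # average power threshold

def evaluate(policy):
    """policy[s][a] is the probability of action a in state s."""
    reward = sum(p * sum(u * r for u, r in zip(row, REWARD))
                 for p, row in zip(STATE_PROB, policy))
    power = sum(p * sum(u * c for u, c in zip(row, POWER))
                for p, row in zip(STATE_PROB, policy))
    return reward, power

def best_pure_policy():
    """Brute-force search over all deterministic state-to-action maps."""
    best = None
    for actions in product(range(len(REWARD)), repeat=len(STATE_PROB)):
        policy = [[1.0 if a == chosen else 0.0 for a in range(len(REWARD))]
                  for chosen in actions]
        reward, power = evaluate(policy)
        if power <= BUDGET + 1e-12 and (best is None or reward > best[0]):
            best = (reward, actions)
    return best

pure_reward, pure_actions = best_pure_policy()
# Mixed policy: action 1 in state 0, fair coin between actions 1 and 2 in state 1.
mixed_reward, mixed_power = evaluate([[0.0, 1.0, 0.0], [0.0, 0.5, 0.5]])
```

In this instance the mixed policy spends the power budget exactly by randomizing in one state, whereas every pure policy either leaves part of the budget unused or violates the constraint, so the best feasible pure policy earns less.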
For decentralized case, we have the following theorems.
In the decentralized case, the constrained Nash equilibrium exists under the condition of pure policies.
Under the condition of pure policies, the state space and action space are finite and countable. Therefore, Theorem 1 still holds and the constrained Nash equilibrium exists under the condition of pure policies.
In the decentralized case, Theorem 2 does not hold under the condition of pure policies.
Note that under the condition of pure policies, the state space and action space are finite and countable. If the policy design is solved separately for every channel, the state space and action space are actually reduced and optimality is not guaranteed.
6. Parameter Estimation
6.1. Fixed Finite Set
6.2. No Prior Information
7. Numerical Results
In this section, we present numerical results to evaluate the performances of the proposed policies in both centralized and decentralized cases. In the numerical computation, the number of CR users is set to be . Traffic load parameters of licensed channel are given by ms and ms. The length of frame ms and the spectrum sensing false alarm is set to be 0.1 for each CR user. Licensed channel is divided into narrow channels. To illustrate the influence of on the performances of pure policies, we consider 3 different cases: , , and . Each channel has states, and dB, dB. Noise power and target BER are set to be mW, , respectively. Average SNR and maximal Doppler frequency of each channel are dB and Hz, respectively. We adopt four modulation schemes which are BPSK, 4-QAM, 8-QAM, and 16-QAM. Then we have .
Unlike in the centralized case, the sensing false alarm noticeably affects the performance. Due to the sensing false alarm, the throughput threshold of pure policies is less than the threshold , which can only be reached by pure policies in the perfect spectrum sensing case. For mixed policies, the throughput thresholds are the same. In general, however, the performance under imperfect sensing is lower than that in the perfect sensing case.
Figure 5 shows the change of the estimated parameter pair with the increase of the estimation frame number . As expected, the convergence of is observed and coincides with Theorem 5.
8. Conclusions and Future Works
In this paper, we consider the cross-layer design of joint channel access and transmission rate adaptation in CR networks in both centralized and decentralized cases. In the centralized case, the cross-layer design can be formulated as a special CMDP, and the optimal policy can be obtained. In the decentralized case, we prove the existence of the constrained Nash equilibrium and characterize the structure of the optimal decentralized policy. We point out that in both the centralized and decentralized cases, the complexity of finding the optimal policy increases exponentially with the size of the action space and state space, which incurs the so-called curse of dimensionality. Therefore, we apply action set reduction and state aggregation to reduce the complexity without loss of optimality. We further prove that, under a certain condition, the multichannel access and transmission rate adaptation policy design can be solved separately for every channel without loss of optimality. Furthermore, pure policies are investigated and compared with mixed policies. Finally, for the case in which the traffic load parameters of the licensed user are unknown to the CR users, we provide two different methods to estimate the parameters in two different cases.
In the future, we will extend our work to finite buffers with stochastic packet arrival and also consider learning mechanisms for unknown CR environments.
The authors would like to thank Prof. Xuesong Tan for his valuable suggestions and help in preparing this paper and the two anonymous reviewers for their very helpful suggestions and comments. This work is supported in part by High-Tech Research and Development Program of China under Grant no. 2007AA01Z209, 2009AA011801, and 2009AA012002, National Fundamental Research Program of China under Grant A1420080150, and National Basic Research Program (973 Program) of China under Grant no. 2009CB320405, Nation Grand Special Science and Technology Project of China under Grant no. 2008ZX03005-001, National Natural Science Foundation of China under Grant no. 60702073, 60972029, and Special Project on Broadband Wireless Access sponsored by Huawei co., LTD.
- Cabric D, O'Donnell ID, Chen MS-W, Brodersen RW: Spectrum sharing radios. IEEE Circuits and Systems Magazine 2006, 6(2):30-45. 10.1109/MCAS.2006.1648988
- Devroye N, Mitran P, Tarokh V: Limits on communications in a cognitive radio channel. IEEE Communications Magazine 2006, 44(6):44-49.
- Akyildiz IF, Lee W-Y, Vuran MC, Mohanty S: NeXt generation/dynamic spectrum access/cognitive radio wireless networks: a survey. Computer Networks 2006, 50(13):2127-2159. 10.1016/j.comnet.2006.05.001
- Mitola J III, Maguire GQ Jr.: Cognitive radio: making software radios more personal. IEEE Personal Communications 1999, 6(4):13-18. 10.1109/98.788210
- Haykin S: Cognitive radio: brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications 2005, 23(2):201-220.
- Zhao Q, Sadler BM: A survey of dynamic spectrum access. IEEE Signal Processing Magazine 2007, 24(3):79-89.
- Zhao Q, Tong L, Swami A, Chen Y: Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework. IEEE Journal on Selected Areas in Communications 2007, 25(3):589-600.
- Chen Y, Zhao Q, Swami A: Distributed spectrum sensing and access in cognitive radio networks with energy constraint. IEEE Transactions on Signal Processing 2009, 57(2):783-797.
- Kim H, Shin KG: Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks. IEEE Transactions on Mobile Computing 2008, 7(5):533-545.
- Chen Y, Zhao Q, Swami A: Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors. IEEE Transactions on Information Theory 2008, 54(5):2053-2071.
- Pei Y, Hoang AT, Liang Y-C: Sensing-throughput tradeoff in cognitive radio networks: how frequently should spectrum sensing be carried out? Proceedings of the 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '07), September 2007, Athens, Greece 1-5.
- Liang Y-C, Zeng Y, Peh ECY, Hoang AT: Sensing-throughput tradeoff for cognitive radio networks. IEEE Transactions on Wireless Communications 2008, 7(4):1326-1337.
- Ghasemi A, Sousa ES: Optimization of spectrum sensing for opportunistic spectrum access in cognitive radio networks. Proceedings of the 4th Annual IEEE Consumer Communications and Networking Conference (CCNC '07), January 2007, Las Vegas, Nev, USA 1022-1026.
- Liu H, Krishnamachari B, Zhao Q: Cooperation and learning in multiuser opportunistic spectrum access. Proceedings of the IEEE International Conference on Communications (ICC '08), May 2008, Beijing, China 487-492.
- Liang Z, Liu W, Zhou P, Gao F: Randomized multi-user strategy for spectrum sharing in opportunistic spectrum access network. Proceedings of IEEE International Conference on Communications (ICC '08), May 2008, Beijing, China 477-481.
- Liu K, Zhao Q, Chen Y: Distributed sensing and access in cognitive radio networks. Proceedings of the 10th IEEE International Symposium on Spread Spectrum Techniques and Applications (ISSSTA '08), August 2008, Bologna, Italy 23-27.
- Liu H, Krishnamachari B: Randomized strategies for multi-user multi-channel opportunity sensing. Proceedings of IEEE Workshop on Cognitive Radio Networks (CCNC '08), January 2008.
- Altman E, Avratchenkov K, Bonneau N, Debbah M, El-Azouzi R, Menasché DS: Constrained stochastic games in wireless networks. Proceedings of the 50th Annual IEEE Global Telecommunications Conference (GLOBECOM '07), November 2007, Washington, DC, USA 315-320.
- Altman E, Avrachenkov K, Miller G, Prabhu B: Discrete power control: cooperative and non-cooperative optimization. Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM '07), May 2007, Anchorage, Alaska, USA 37-45.
- Chung ST, Goldsmith AJ: Degrees of freedom in adaptive modulation: a unified view. IEEE Transactions on Communications 2001, 49(9):1561-1571. 10.1109/26.950343
- Barlow RE, Hunter LC: Reliability analysis of a one-unit system. Operations Research 1961, 9(2):200-208. 10.1287/opre.9.2.200
- Baxter LA: Availability measures for a two-state system. Journal of Applied Probability 1981, 18: 227-235. 10.2307/3213182
- Wang HS, Moayeri N: Finite-state Markov channel - a useful model for radio communication channels. IEEE Transactions on Vehicular Technology 1995, 44(1):163-171. 10.1109/25.350282
- Hossain MdJ, Djonin DV, Bhargava VK: Delay limited optimal and suboptimal power and bit loading algorithms for OFDM systems over correlated fading channels. Proceedings of IEEE Global Telecommunications Conference (GLOBECOM '05), November-December 2005, St. Louis, Mo, USA 5: 2787-2792.
- Karmokar AK, Djonin DV, Bhargava VK: Optimal and suboptimal packet scheduling over time-varying flat fading channels. IEEE Transactions on Wireless Communications 2006, 5(2):446-457.
- Altman E: Constrained Markov Decision Process: Stochastic Modeling. Chapman & Hall/CRC, London, UK; 1999.
- Rosen JB: Existence and uniqueness of equilibrium points for concave N-person games. Econometrica 1965, 33(3):520-534. 10.2307/1911749
- Mandi P: Estimation and control in Markov chains. Advances in Applied Probability 1974, 6: 40-60. 10.2307/1426206
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.