Opportunistic spectrum access in self similar primary traffic

We take a stochastic control approach to opportunity tracking and access in self-similar primary traffic. Based on a multiple time scale hierarchical Markovian model, we formulate the problem as a Partially Observable Markov Decision Process. We show that for independent and stochastically identical channels under certain conditions, the myopic sensing policy has a simple and robust structure that obviates the need to know the channel parameters. Furthermore, the myopic policy achieves performance comparable to that of the optimal policy, which requires exponential complexity and assumes full knowledge of the channel model.


I. INTRODUCTION

A. Opportunistic Spectrum Access
The "spectrum paradox" is by now widely recognized. On the one hand, the projected spectrum need for wireless devices and services continues to grow, and virtually all usable radio frequencies have already been allocated. Such an imbalance in supply and demand threatens one of the most explosive economic and technological growths in the past decades. On the other hand, extensive measurements conducted in recent years reveal that much of the prized spectrum lies unused at any given time and location [1]. For example, in a recent measurement study of wireless LAN traffic [2], a typical active FTP session has about 75% idle time, and voice-over-IP applications such as Skype have up to 90% idle time.
These measurements of actual spectrum usage highlight the drawbacks of the current static spectrum allotment policy that is at the root of this spectrum paradox. They also form the key rationale for Opportunistic Spectrum Access (OSA) envisioned by the DARPA XG program and currently being considered by the FCC [3]. The idea of OSA is to exploit instantaneous spectrum opportunities by opening the licensed spectrum to secondary users. This would allow secondary users to identify available spectrum resources and communicate non-intrusively by limiting interference to primary users. Even for unlicensed bands, OSA may be of considerable value for spectrum efficiency by adopting a hierarchical pricing structure to support both subscribers and opportunistic users.

This work was supported by the Army Research Laboratory CTA on Communication and Networks under Grant DAAD19-01-2-0011 and by the National Science Foundation under Grants CNS-0627090 and CCF-0830685.
978-1-4244-2677-5/08/$25.00 ©2008 IEEE

B. Opportunistic Spectrum Access in Self Similar Primary Traffic
Since the seminal work of Leland, Taqqu, Willinger, and Wilson [4], extensive studies have shown that self-similarity manifests in communications traffic in diverse contexts, from local area networks to wide area networks, from wired to wireless applications [5]. In this paper, we consider opportunistic spectrum access in self similar primary traffic processes with long range dependency. We adopt a multiple time scale hierarchical Markovian model for self similar traffic processes proposed in [6], [7]. A decision theoretic framework is developed based on the theory of Partially Observable Markov Decision Processes (POMDP).
Unfortunately, solving a general POMDP is often intractable due to its exponential complexity. A simple approach is to implement the myopic policy, which focuses only on maximizing the immediate reward and ignores the impact of the current action on the future reward. We show in this paper that the myopic policy has a simple and robust structure under certain conditions. This simple structure obviates the need to know the transition probabilities of the underlying multiple time scale Markovian model and allows automatic tracking of variations in the primary traffic model. The strong performance of the myopic policy with such a simple and robust structure is demonstrated through simulation examples.

C. Related Work
This paper is perhaps the first that addresses OSA in self similar primary traffic. It builds upon our prior work on a POMDP framework for the joint design of opportunistic spectrum access that adopts a first-order Markovian model for the primary traffic. Specifically, in [8], [9], a decision-theoretic framework for tracking and exploiting spectrum opportunities is developed using a first-order Markovian model for the primary traffic. A fundamental result on the principle of separation for OSA [10] and structural opportunity tracking policies [11] have been established, leading to simple, robust, and optimal solutions.
The first-order Markovian model of the primary traffic, however, has its limitations. It cannot capture the long range dependency exhibited in a wide range of communications traffic. In this paper, we extend the decision-theoretic framework developed in [8], [10] to incorporate self similar primary traffic with long range dependency. We show that the structure and optimality of the myopic sensing policy established in [11] under a first-order Markovian model are preserved under certain conditions in self similar primary traffic modeled by a multiple time scale hierarchical Markovian process.

II. A MULTIPLE TIME SCALE HIERARCHICAL MARKOVIAN MODEL FOR SELF SIMILAR TRAFFIC
A fundamental property of a self similar process is its "scale-invariant behavior": the process is stochastically unchanged when it is zoomed out by stretching the time domain [12]. Specifically, $\{X(t) : t \in \mathbb{R}\}$ is a self-similar process if for any $k \ge 1$, $t_1, \dots, t_k \in \mathbb{R}$, and $a, H \in \mathbb{R}^+$,

$$\bigl(X(a t_1), \dots, X(a t_k)\bigr) \stackrel{d}{=} \bigl(a^H X(t_1), \dots, a^H X(t_k)\bigr), \quad (1)$$

where $\stackrel{d}{=}$ represents equivalence in distribution. It has been shown that for $\frac{1}{2} < H < 1$, the autocorrelation of a self-similar process decreases to zero polynomially, leading to a long range dependency behavior.
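As a worked illustration of the long range dependency claim, the following sketch makes two additional assumptions that are not stated in the text: $X$ has stationary increments with $X(0) = 0$, and finite variance $\sigma^2 = \operatorname{Var}(X(1))$. Here $Y(j) = X(j) - X(j-1)$ denotes the unit-lag increment process, a symbol introduced only for this example.

```latex
\begin{align*}
\operatorname{Var}\bigl(X(t)\bigr) &= t^{2H}\sigma^{2}
  && \text{(apply (1) with } k = 1,\ a = t,\ t_1 = 1\text{)} \\
r(k) = \operatorname{Cov}\bigl(Y(1), Y(1+k)\bigr)
  &= \frac{\sigma^{2}}{2}\Bigl(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\Bigr) \\
r(k) &\sim \sigma^{2} H(2H-1)\,k^{2H-2}
  && (k \to \infty)
\end{align*}
```

For $\frac{1}{2} < H < 1$ the exponent $2H - 2$ lies in $(-1, 0)$, so $r(k)$ decays polynomially and $\sum_k r(k)$ diverges, which is one common way of stating long range dependency.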
Based on traffic traces from physical networks, several models for self similar traffic have been developed, among which is a multiple time scale hierarchical Markovian model proposed in [6], [7]. Under this model, traffic is an aggregation of hierarchical Markovian on-off processes with disparate time scales. Illustrated in Fig. 1 is a two-level hierarchical on-off process. The higher level process has a much slower transition rate than the lower one. The resulting traffic process is "on" (busy) when both Markovian processes are in state 0 and "off" (idle) otherwise. This hierarchical model with two to three levels has been shown to approximate a self similar process and fit well with measured traffic traces. It is motivated by the physical process of traffic generation [6], [7]. Specifically, for a packet to appear in the physical channel, several events at different time scales have to occur, including, for example, establishing a session, releasing a message to the network by a transport protocol like TCP, then releasing a packet to the channel by the MAC and physical layers [6], [7]. This hierarchical on-off process can be described by a Markov process with augmented state. For example, the above two-level hierarchical on-off process can be treated as a Markov process with 4 states. The resulting traffic model is thus a hidden Markov model: the state (0,0) is directly observable and mapped to "on", and the remaining 3 states are mapped to a single state "off". This hidden Markovian interpretation is the key to our POMDP formulation of opportunity tracking and exploitation in self-similar primary traffic as shown in the next section.
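The two-level construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming the levels evolve independently; the function names and the numerical transition probabilities in `LEVELS` are hypothetical and are not taken from [6], [7]. The sketch builds the 4-state augmented chain as a Kronecker product of the per-level on-off matrices and simulates a busy/idle trace in which the channel is busy only in state (0,0).

```python
import random


def on_off_matrix(p01, p11):
    """Two-state transition matrix; row i holds P(next = j | current = i)."""
    return [[1.0 - p01, p01], [1.0 - p11, p11]]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def augmented_matrix(levels):
    """Transition matrix of the augmented chain, assuming independent levels.

    State (s_1, ..., s_L) is indexed by its binary digits, so index 0 is the
    all-zero state, the only one mapped to "on" (busy).
    """
    matrix = [[1.0]]
    for p01, p11 in levels:
        matrix = kron(matrix, on_off_matrix(p01, p11))
    return matrix


def simulate_busy_trace(levels, num_slots, seed=0):
    """Return a list of booleans: True when the channel is busy in that slot."""
    rng = random.Random(seed)
    state = [0] * len(levels)          # start with every level in state 0
    trace = []
    for _ in range(num_slots):
        trace.append(not any(state))   # busy only when all levels are 0
        state = [int(rng.random() < (p11 if s == 1 else p01))
                 for s, (p01, p11) in zip(state, levels)]
    return trace


# Hypothetical parameters (p01, p11): level 1 is slow, level 2 is fast.
LEVELS = [(0.02, 0.98), (0.3, 0.6)]
P = augmented_matrix(LEVELS)
trace = simulate_busy_trace(LEVELS, 1000)
```

With two levels, `P` is a 4-by-4 stochastic matrix, and only index 0 of the augmented state corresponds to an observed "on" slot; the other three indices are indistinguishable to an observer of the trace.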

III. A POMDP FRAMEWORK
In this section, we show that under the multiple time scale hierarchical Markovian model, opportunity tracking can still be formulated as a POMDP similar to that developed in [8]-[10] under a first-order Markovian model.

A. Network Model
Consider a spectrum consisting of $N$ channels, each with transmission rate $B_n$ ($n = 1, \dots, N$). These $N$ channels are allocated to a primary network with slotted transmissions. The primary traffic in each channel is a self-similar process following the hierarchical Markovian model with $L$ levels. Each channel can thus be represented by an augmented Markov chain with $2^L$ states (see Fig. 1, where $L = 2$). The availability (idle or busy) of a channel, i.e., the primary traffic trace, is determined by the state of the corresponding augmented Markov chain.

Let $\{p_{i,j}^{(n,k)}\}_{i,j=0,1}$ denote the transition probabilities of the $k$-th level Markov process for channel $n$ ($1 \le k \le L$). The levels are indexed from the slowest to the fastest time scale: the $k$-th level Markov process varies much more slowly than the $m$-th level Markov process for $m > k$. It can be shown that the $k$-th level Markov process for channel $n$ is positively correlated when $p_{11}^{(n,k)} > p_{01}^{(n,k)}$, and negatively correlated when $p_{11}^{(n,k)} < p_{01}^{(n,k)}$. We notice that the Markov processes at higher levels (i.e., with small level indexes) can be considered as positively correlated due to their slow transition rates.

Consider next a secondary transmitter-receiver pair seeking spectrum opportunities in these $N$ channels. In each slot, they choose a channel to sense. If the channel is idle, the transmitter sends packets to the receiver through this channel, and a reward $R(t)$ (the number of bits delivered) is accrued in this slot. Our goal is to develop the optimal sensing policy that maximizes the throughput of the secondary user over a desired period of $T$ slots.

B. POMDP Formulation
The sequential decision-making process described above can be modeled as a POMDP.
Specifically, the underlying system state is given by the state of the augmented Markov chain at the beginning of each slot. Let $S_n(t) = (S_n^{(1)}(t), S_n^{(2)}(t), \dots, S_n^{(L)}(t))$ denote the state of channel $n$ in slot $t$, where $S_n^{(k)}(t) \in \{0, 1\}$ represents the state of the $k$-th level Markov process for channel $n$ in slot $t$. The transition probabilities of this augmented Markov chain can be easily obtained from $\{p_{i,j}^{(n,k)}\}_{i,j=0,1}$ ($1 \le k \le L$). Let $O_n(t) \in \{0, 1\}$ denote the availability of channel $n$ in slot $t$, i.e., $O_n(t) = 0$ (busy) when $S_n^{(k)}(t) = 0$ for all $1 \le k \le L$ and $O_n(t) = 1$ (idle/opportunity) otherwise.

The reward in each slot is the number of bits that can be delivered by the secondary user. Given sensing action $a(t)$, the immediate reward $R_{a(t)}(t)$ is given by

$$R_{a(t)}(t) = O_{a(t)}(t)\, B_{a(t)}. \quad (3)$$

Due to the hidden Markovian model of channel availability and partial sensing, the state $S_n(t)$ of the augmented Markov chain representing each channel cannot be fully observed. The statistical information on $S_n(t)$ provided by the entire decision and observation history can be encapsulated in a belief vector $\Lambda_n(t) = [\lambda_n^{(1)}(t), \dots, \lambda_n^{(L)}(t)]$, where $\lambda_n^{(k)}(t)$ represents the conditional probability (given the decision and observation history) that the state of the $k$-th level Markov process for channel $n$ is 1 in slot $t$. Notice that $\Lambda_n(t)$ is sufficient to represent the conditional probability distribution of $S_n(t)$ due to the independence across the Markov processes at different levels. The whole system state is given by the concatenation of each channel's belief vector:

$$\Lambda(t) = [\Lambda_1(t), \Lambda_2(t), \dots, \Lambda_N(t)].$$

This system belief vector $\Lambda(t)$ is a sufficient statistic for making the optimal action in each slot $t$. Furthermore, $\Lambda(t+1)$ for slot $t+1$ can be obtained from $\Lambda(t)$, $a(t)$, and the sensing outcome via Bayes' rule; this update is referred to as (4).

Let $\pi = \{\pi_t\}_{t=1}^{T}$ be a series of mappings from $\Lambda(t)$ to $a(t)$ for each $1 \le t \le T$, which denotes the sensing policy for channel selection. We then arrive at the following stochastic control problem:

$$\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\Bigl[\sum_{t=1}^{T} R_{\pi_t(\Lambda(t))}(t) \,\Big|\, \Lambda(1)\Bigr], \quad (5)$$

where $\mathbb{E}_{\pi}$ represents the expectation given that the sensing policy $\pi$ is employed, $\pi_t(\Lambda(t))$ is the sensing action in slot $t$ under policy $\pi$, and $\Lambda(1)$ is the initial belief vector. When no information is available to the secondary user at the beginning of the first slot, $\Lambda(1)$ is given by the stationary distributions of the on-off Markov processes at all levels of these $N$ channels. Specifically, $\lambda_n^{(k)}(1)$ is given by

$$\lambda_n^{(k)}(1) = \frac{p_{01}^{(n,k)}}{p_{01}^{(n,k)} + p_{10}^{(n,k)}}. \quad (6)$$
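The stationary initial belief and the expected immediate reward can be sketched as follows. This is a minimal illustration under the independence-across-levels assumption stated above; the function names and the numerical parameters are hypothetical. The idle probability is computed as one minus the probability that every level is in state 0, which is the only busy configuration.

```python
from math import prod


def stationary_belief(p01, p11):
    """Stationary probability of state 1 for an on-off chain: p01 / (p01 + p10)."""
    p10 = 1.0 - p11
    return p01 / (p01 + p10)


def idle_probability(belief):
    """P(channel idle) = 1 - P(all levels are 0), assuming independent levels."""
    return 1.0 - prod(1.0 - lam for lam in belief)


def expected_reward(belief, rate):
    """Expected number of bits delivered if this channel is sensed."""
    return rate * idle_probability(belief)


# Hypothetical two-level channel given as (p01, p11) per level.
levels = [(0.02, 0.98), (0.3, 0.6)]
initial_belief = [stationary_belief(p01, p11) for p01, p11 in levels]
reward = expected_reward(initial_belief, rate=1.0)
```

For these illustrative numbers the per-level stationary beliefs are 1/2 and 3/7, so the channel is idle with probability 1 - (1/2)(4/7) = 5/7.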

IV. THE MYOPIC POLICY AND ITS STRUCTURE
The myopic policy ignores the impact of the current action on the future reward, focusing solely on maximizing the expected immediate reward $\mathbb{E}[R_{a(t)}(t)]$. The myopic action $\hat{a}(t)$ in slot $t$ given the current belief vector $\Lambda(t)$ is thus given by

$$\hat{a}(t) = \arg\max_{a = 1, \dots, N} \Bigl(1 - \prod_{k=1}^{L} \bigl(1 - \lambda_a^{(k)}(t)\bigr)\Bigr) B_a, \quad (7)$$

where $\lambda_a^{(k)}(t)$ is the belief that the $k$-th level Markov process of channel $a$ is in state 1 in slot $t$.

In general, obtaining the myopic action in each slot requires the recursive update of the belief vector $\Lambda(t)$ as given in (4). Next we show that under certain conditions the myopic policy for stochastically identical channels has a semi-universal structure that does not need the update of the belief vector or the knowledge of the transition probabilities.

Consider stochastically identical channels. Let $\{p_{i,j}^{(k)}\}_{i,j=0,1}$ denote the transition probabilities of the $k$-th level Markov process for all channels, and $B$ the transmission rate of all channels. We establish a simple and robust structure of the myopic policy under certain conditions as shown in Theorem 1 below.

Theorem 1: Suppose that the Markov processes at all levels are positively correlated. Furthermore, the initial system belief vector $\Lambda(1)$ satisfies the following condition: there exists a channel ordering $(n_1, n_2, \dots, n_N)$ such that $\lambda_{n_1}^{(k)}(1) \ge \lambda_{n_2}^{(k)}(1) \ge \dots \ge \lambda_{n_N}^{(k)}(1)$ for all $1 \le k \le L$, i.e., the channel ordering by the initial belief values at all levels is the same. The myopic policy has a round robin structure based on the circular channel ordering $(n_1, n_2, \dots, n_N)$: starting from sensing channel $n_1$ in slot 1, the myopic action is to stay in the same channel when it is idle and switch to the next channel in the circular ordering when it is busy.
Proof: Omitted due to space limit.

Fig. 2 shows an example of the round robin structure of the myopic policy when $N = 3$ with a circular channel order of (1, 2, 3). The myopic action is to sense the three channels in turn with random switching times (when the current channel is busy).

[Fig. 2. Round robin structure of the myopic policy for $N = 3$: the user moves to the next channel in the circular order when it observes 0 (busy).]

We notice that the secondary user usually has no initial information about the channel availability. In this case, the initial system belief vector is given by the stationary distributions of the underlying Markov processes as given in (6). The condition on the initial system belief vector in Theorem 1 is thus satisfied since stochastically identical channels have the same stationary distribution at the same level. The circular channel ordering in the round-robin structure can be set arbitrarily.

For two-level hierarchical Markovian channel models ($L = 2$), we can relax the conditions in Theorem 1 without affecting the round robin structure of the myopic policy. Recall that the higher level Markov process for a two-level hierarchical Markovian channel can be considered as positively correlated. By considering both positively and negatively correlated lower level Markov processes, we have the following two propositions which provide, respectively, the relaxed condition on the initial system belief vector and that on the positive correlation of the lower level Markov process.

Proposition 1 (Relaxation of Initial Condition): The round robin structure of the myopic policy given in Theorem 1 remains unchanged when for any two channels $i$ and $j$ with $\lambda_i^{(1)}(1) \ge \lambda_j^{(1)}(1)$, the following two equations hold: [the two equations are not recoverable from the source].

For positively correlated lower level Markov processes, Proposition 1 only requires that the initial belief vectors of all channels are properly ordered. We notice that if the initial belief values have the same channel ordering at all levels as required in Theorem 1, then the conditions in Proposition 1 are trivially satisfied.

When the initial system belief vector is given by the stationary distribution of the underlying Markov processes, Proposition 2 allows scenarios in which the lower level Markov processes are negatively correlated. We notice that the correlation of the lower level Markov processes cannot be too negative, i.e., $p_{01}^{(2)} - p_{11}^{(2)}$ must be suitably bounded from above.
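The round robin rule of Theorem 1 can be sketched as follows. This is a minimal illustration; the function name, the `sense` callback, and the small availability table are hypothetical stand-ins for the spectrum sensor. Note that the rule uses only the circular ordering and the sensing outcomes: no belief update and no transition probabilities are involved.

```python
def round_robin_sensing(order, sense, num_slots):
    """Run the stay-on-idle, switch-on-busy rule for num_slots slots.

    order: circular channel ordering (n_1, ..., n_N); sensing starts at order[0].
    sense: callable (channel, slot) -> 1 if idle, 0 if busy.
    Returns the list of (sensed channel, observation) pairs.
    """
    position = 0
    history = []
    for slot in range(num_slots):
        channel = order[position]
        observation = sense(channel, slot)
        history.append((channel, observation))
        if observation == 0:
            # Busy: move to the next channel in the circular ordering.
            position = (position + 1) % len(order)
        # Idle: stay on the same channel in the next slot.
    return history


# Hypothetical availability table: idle_table[channel][slot] is 1 when idle.
idle_table = {
    1: [1, 1, 0, 0, 1],
    2: [0, 0, 1, 0, 1],
    3: [1, 1, 1, 1, 0],
}
history = round_robin_sensing((1, 2, 3), lambda ch, t: idle_table[ch][t], 5)
```

In this toy run the user stays on channel 1 for the first two (idle) slots, observes it busy in the third slot, and then moves through channels 2 and 3 in turn as each is found busy.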
Theorem 1, Proposition 1, and Proposition 2 show that the myopic policy is a round-robin scheme (see Fig. 2, where $N = 3$) for stochastically identical channels under certain conditions. This semi-universal structure leads to robustness against model mismatch and variations.

V. SIMULATION EXAMPLES
In this section, we illustrate the performance and robustness of the myopic policy for stochastically identical channels.
In Fig. 3, the system belief vector starts from the stationary distributions of the underlying Markov processes. For this example, the conditions in Theorem 1 are satisfied and the myopic policy obeys a round robin structure. We observe that the myopic policy achieves the same performance as the optimal policy, which requires exponential complexity and assumes full knowledge of the transition probabilities at all levels of the hierarchical channel model.
Fig. 4 shows an example in which the myopic policy automatically tracks model variations. The transition probabilities in this example change abruptly at the fifth slot, which corresponds to a drop in the primary traffic load. It can be shown that these variations will not affect the round robin structure of the myopic policy as long as the conditions in Theorem 1 are satisfied. The change in the rate at which the throughput increases shows that the myopic policy effectively tracks the traffic model variations in the primary system.

VI. CONCLUSION
We have shown that for independent and stochastically identical channels under certain conditions, the myopic policy has a simple and robust structure which yields a strong performance. Future work includes investigating the optimality and throughput limits of the myopic policy for stochastically identical channels, and extending the policy design to the hierarchical Markovian channel model in general network scenarios.

[Fig. 4, caption fragment: for $t > 4$, the two listed transition probabilities equal 0.9 and 0.8.]

Fig. 1. A multiple time scale hierarchical Markovian model for self similar primary traffic.