From: Transfer restless multi-armed bandit policy for energy-efficient heterogeneous cellular network
Symbol | Meaning |
---|---|
\(\mathcal {Y}\) | Set of BS \(\mathcal {Y} = \{1,\cdots,Y\}\) |
\(\mathcal {Y}^{on}_{n}\) | Set of active (ON) BS at the nth iteration |
\(\mathcal {I}_{k}(n)\) | Cell coverage of BS k at time n |
x_{k} | Locations of the MS in the coverage \(\mathcal {I}_{k}(n)\) of the kth BS at time n |
Λ(x_{k},n) | Traffic arrival rate at location x_{k} in BS k following a Poisson point process at the nth iteration |
1/h(x_{k},n) | Average call duration (or file size) at location x_{k} at the nth iteration |
L_{k}(n) | Instantaneous traffic load served by the BS \(k \in \mathcal {Y}_{n}^{on}\) |
Θ_{k}(x_{k},n) | Service rate at location x_{k} from BS k at the nth iteration |
SINR_{k}(x_{k},n) | Received SINR at active MS location x_{k} from BS k at the nth iteration |
ρ_{k}(n) | System load of BS k at the nth iteration |
ρ_{th} | System load threshold |
\(P^{tx}_{k}, P_{f}^{k}\) and \(P_{T}^{k}\) | Transmit, fixed and total operational power of BS k |
EE(n) | Network energy efficiency (EE) in bits per joule |
\(\Theta ^{\min }\) | Prescribed minimum data rate to continue data transmission |
\(\mathcal {M}=<\mathcal {S},\mathcal {K},\mathcal {P},R>\) | MDP Tuple: state space, action space, transition probability and reward |
\(\mathcal {P}^{i} = \{P^{i}_{k,l}, k,l \in \mathcal {S}\}\) | State transition probabilities of the ith action |
\(\mathcal {A}(n)\) | Action \(\mathbf {a}^{i}(n) = \left [a^{i}_{1}(n), \cdots, a^{i}_{Y}(n)\right ]\) chosen by the controller, which sets each BS to ON or OFF |
T^{i}(n) | Total number of times action i has been selected up to iteration n |
b(n) | Total number of completed blocks up to iteration n |
n_{2} | Total number of iterations in SB2 blocks up to block b(n) |
\(T_{2}^{i}\) | Total number of times action i has been selected during SB2 blocks up to iteration n_{2} |
h_{2} | Total number of iterations in the historic period |
\(H^{i}_{2}\) | Total number of times action i has been selected in SB2 blocks in the historic period |
S^{i}(n) and S^{i,h}(n) | State observed due to action i at the nth iteration in the current and source tasks, respectively |
\(R^{i}_{S}(n)\) and \(R^{i,h}_{S}(n)\) | Immediate EE reward with action i in state S at the nth iteration in the current and source tasks, respectively |
\(G_{S}^{i}(T^{i}(n))\) | \(\frac {1}{T^{i}(n)}\sum _{k=1}^{T^{i}(n)} R^{i}_{S}(k)\), the empirical mean of EE rewards |
\(G^{S}_{\max }(n)\) | \(\max _{i \in \mathcal {K}} G^{i}_{S}(T^{i}(n))\), maximum expected average EE |
M^{i}(n,T^{i}(n)) | \(G^{S}_{\max }(n) - G^{i}_{S}(T^{i}(n))\) |
B^{i}(n,T^{i}(n)) | EEM-UCB policy index giving the BS configuration status to activate |
α and β | Exploration coefficients with respect to state and reward, respectively |
ζ^{i} | State that determines regenerative cycles for action i |
\(\pi ^{i}_{S}\) | Stationary distribution for state S of the Markov chain associated with action i |
μ^{i} | \(\sum _{S \in \mathcal {S}} S^{i} G^{i}_{S} \pi ^{i}_{S}\), the global mean reward of action i, which accounts for both the reward and the state |
Δμ^{i} | μ^{∗}−μ^{i} |
\(\hat \pi _{S}^{i}\), \(\hat \pi _{\max }\), \(\pi ^{i}_{\min }\), \(\pi _{\min }\) | \(\max \left \{\pi ^{i}_{S}, 1 - \pi ^{i}_{S}\right \}\), \(\max _{S \in \mathcal {S},i\in \mathcal {K}} \hat \pi ^{i}_{S}\), \(\min _{S \in \mathcal {S}} \pi ^{i}_{S}\), \(\min _{i\in \mathcal {K}} \pi ^{i}_{\min }\) |
\(r_{\max }\) | \(\max _{S \in \mathcal {S}, i \in \mathcal {K}} r^{i}_{S}\) |
\(S_{\max }\) | \(\max _{i\in \mathcal {K}} |S^{i}| \), where \(\left | S^{i} \right |\) stands for the cardinality of the state space of action i |
\(M_{\min (\max)}\) | \(\min (\max \text { resp.})_{i \in \mathcal {K}} M^{i}\left (n_{2}, T_{2}^{i}(n_{2}) \right)\) |
ε^{i}, \(\varepsilon _{\min }\) | \(1 - \lambda _{2}^{i}\), the eigenvalue gap of the ith action; \(\min _{i\in \mathcal {K}} \varepsilon ^{i}\) |
\(\Omega ^{i}_{k,l}\) | Mean hitting time of state l starting from an initial state k for the ith action |
\(\Omega ^{i}_{\max }\), \(\Omega _{\max }\) | \(\max _{k,l \in \mathcal {S}, k \neq l} \Omega ^{i}_{k,l}\), \(\max _{i \in \mathcal {K}} \Omega ^{i}_{\max }\) |
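
The definitions of \(G_{S}^{i}(T^{i}(n))\), \(G^{S}_{\max }(n)\) and \(M^{i}(n,T^{i}(n))\) in the table can be illustrated with a minimal Python sketch for one fixed state S. The function names, the action labels and the reward values are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
from typing import Dict, List


def empirical_mean(rewards: List[float]) -> float:
    """G^i_S(T^i(n)): mean of the T^i(n) EE rewards observed for action i in state S."""
    if not rewards:
        raise ValueError("action has not been selected yet")
    return sum(rewards) / len(rewards)


def reward_gaps(history: Dict[str, List[float]]) -> Dict[str, float]:
    """M^i = G^S_max - G^i_S for every action i, for one fixed state S."""
    means = {action: empirical_mean(r) for action, r in history.items()}
    g_max = max(means.values())  # G^S_max(n): best empirical mean over all actions
    return {action: g_max - g for action, g in means.items()}


# Hypothetical EE rewards (bits per joule) for three BS ON/OFF configurations.
history = {
    "config_a": [2.0, 4.0],
    "config_b": [1.0, 2.0, 3.0],
    "config_c": [5.0],
}
gaps = reward_gaps(history)
```

With these made-up rewards, the action with the highest empirical mean has a gap of zero, and every other action has a positive gap equal to its shortfall from that maximum.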