
Table 1 List of the main symbols in the paper

From: Transfer restless multi-armed bandit policy for energy-efficient heterogeneous cellular network

Symbol Meaning
\(\mathcal {Y}\) Set of BSs, \(\mathcal {Y} = \{1,\cdots,Y\}\)
\(\mathcal {Y}^{on}_{n}\) Set of active (ON) BSs at the nth iteration
\(\mathcal {I}_{k}(n)\) Cell coverage of BS k at time n
\(x_{k}\) Location of an MS in the coverage area \(\mathcal {I}_{k}(n)\) of the kth BS at time n
\(\Lambda(x_{k},n)\) Traffic arrival rate at location \(x_{k}\) in BS k at the nth iteration, with arrivals following a Poisson point process
\(1/h(x_{k},n)\) Average call duration (or file size) at location \(x_{k}\) at the nth iteration
\(L_{k}(n)\) Instantaneous traffic load served by BS \(k \in \mathcal {Y}_{n}^{on}\)
\(\Theta_{k}(x_{k},n)\) Service rate at location \(x_{k}\) from BS k at the nth iteration
\(\mathrm{SINR}_{k}(x_{k},n)\) Received SINR at the active MS location \(x_{k}\) from BS k at the nth iteration
\(\rho_{k}(n)\) System load of BS k at the nth iteration
\(\rho_{th}\) System load threshold
\(P^{tx}_{k}, P_{f}^{k}\) and \(P_{T}^{k}\) Transmit, fixed and total operational power of BS k
EE(n) Network energy efficiency (EE) in bits per joule
\(\Theta ^{\min }\) Prescribed minimum data rate to continue data transmission
\(\mathcal {M}=\langle \mathcal {S},\mathcal {K},\mathcal {P},R\rangle\) MDP tuple: state space, action space, transition probabilities and reward
\(\mathcal {P}^{i} = \{P^{i}_{k,l}, k,l \in \mathcal {S}\}\) State transition probabilities under the ith action
\(\mathcal {A}(n)\) \(\mathbf {a}^{i}(n) = \left [a^{i}_{1}(n), \cdots, a^{i}_{Y}(n)\right ]\), the action the controller decides for all BSs, i.e., the ON or OFF status of each BS
\(T^{i}(n)\) Total number of times action i has been selected up to iteration n
\(b(n)\) Total number of completed blocks up to iteration n
\(n_{2}\) Total number of iterations in SB2 blocks up to block b(n)
\(T_{2}^{i}\) Total number of times action i has been selected during SB2 blocks up to iteration \(n_{2}\)
\(h_{2}\) Total number of iterations in the historic period
\(H^{i}_{2}\) Total number of times action i has been selected in SB2 blocks during the historic period
\(S^{i}(n)\) and \(S^{i,h}(n)\) State observed under action i at the nth iteration in the current and source tasks, respectively
\(R^{i}_{S}(n)\) and \(R^{i,h}_{S}(n)\) Immediate EE reward of action i in state S at the nth iteration in the current and source tasks, respectively
\(G_{S}^{i}(T^{i}(n))\) \(\frac {1}{T^{i}(n)}\sum _{k=1}^{T^{i}(n)} R^{i}_{S}(k)\), the empirical mean of EE rewards
\(G^{S}_{\max }(n)\) \(\max _{i \in \mathcal {K}} G^{i}_{S}(T^{i}(n))\), the maximum empirical mean EE over all actions
\(M^{i}(n,T^{i}(n))\) \(G^{S}_{\max }(n) - G^{i}_{S}(T^{i}(n))\), the gap between the maximum and the ith empirical mean EE
\(B^{i}(n,T^{i}(n))\) EEM-UCB policy index that determines which BS configuration to activate
α and β Exploration coefficients with respect to state and reward, respectively
\(\zeta^{i}\) State that determines the regenerative cycles for action i
\(\pi ^{i}_{S}\) Stationary distribution for state S of the Markov chain associated with action i
\(\mu^{i}\) \(\sum _{S \in \mathcal {S}} S^{i} G^{i}_{S} \pi ^{i}_{S}\), the global mean reward of action i, i.e., accounting for both the reward and the state of action i
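\(\Delta \mu^{i}\) \(\mu^{*} - \mu^{i}\), the gap between the global mean reward of the best action and that of action i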
\(\hat \pi _{S}^{i}\), \(\hat \pi _{\max }\), \(\pi ^{i}_{\min }\), \(\pi _{\min }\) \(\max \left \{\pi ^{i}_{S}, 1 - \pi ^{i}_{S}\right \}\), \(\max _{S \in \mathcal {S},i\in \mathcal {K}} \hat \pi ^{i}_{S}\), \(\min _{S \in \mathcal {S}} \pi ^{i}_{S}\), \(\min _{i\in \mathcal {K}} \pi ^{i}_{\min }\)
\(r_{\max }\) \(\max _{S \in \mathcal {S}, i \in \mathcal {K}} r^{i}_{S}\)
\(S_{\max }\) \(\max _{i\in \mathcal {K}} |S^{i}| \), where \(\left | S^{i} \right |\) stands for the cardinality of the state space of action i
\(M_{\min (\max)}\) \(\min (\max \text { resp.})_{i \in \mathcal {K}} M^{i}\left (n_{2}, T_{2}^{i}(n_{2}) \right)\)
\(\varepsilon^{i}\), \(\varepsilon _{\min }\) \(1 - \lambda _{2}^{i}\), the eigenvalue gap of the ith action, and \(\min _{i\in \mathcal {K}} \varepsilon ^{i}\), respectively
\(\Omega ^{i}_{k,l}\) Mean hitting time of state l starting from an initial state k for the ith action
\(\Omega ^{i}_{\max }\), \(\Omega _{\max }\) \(\max _{k,l \in \mathcal {S}, k \neq l} \Omega ^{i}_{k,l}\), \(\max _{i \in \mathcal {K}} \Omega ^{i}_{\max }\)
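The empirical statistics \(G^{i}_{S}(T^{i}(n))\), \(G^{S}_{\max }(n)\) and \(M^{i}(n,T^{i}(n))\) follow directly from their definitions in the table. The following is a minimal sketch under a few assumptions:

- The EE rewards observed in one state S are logged as (action, reward) pairs.
- Each action \(\mathbf {a}^{i}(n)\) is stored as an ON/OFF tuple.
- The function names and the reward log are hypothetical.
- The EEM-UCB index \(B^{i}\) is not reproduced, because its formula is not part of this table.

```python
from collections import defaultdict


def empirical_means(history):
    """Return ({action: G^i_S}, {action: T^i}) from (action, reward) pairs seen in one state S.

    Each action is a hypothetical ON/OFF tuple (a_1, ..., a_Y); rewards are EE values
    in bits per joule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in history:
        totals[action] += reward
        counts[action] += 1
    # G^i_S(T^i(n)) = (1 / T^i(n)) * sum of the T^i(n) rewards of action i
    means = {a: totals[a] / counts[a] for a in counts}
    return means, dict(counts)


def gaps(means):
    """Return G^S_max and {action: M^i = G^S_max - G^i_S}."""
    g_max = max(means.values())
    return g_max, {a: g_max - g for a, g in means.items()}


# Hypothetical log for three BSs: (ON/OFF vector, observed EE reward)
history = [((1, 1, 0), 2.0), ((1, 0, 1), 1.0), ((1, 1, 0), 4.0),
           ((1, 0, 1), 3.0), ((1, 0, 1), 2.0)]
means, counts = empirical_means(history)
g_max, m = gaps(means)
print(means, counts, g_max, m)
```

In this log, configuration (1, 1, 0) is selected twice with mean EE 3.0, and (1, 0, 1) three times with mean EE 2.0. This gives \(G^{S}_{\max } = 3.0\), with gap M equal to 0.0 for (1, 1, 0) and 1.0 for (1, 0, 1).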
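The Markov-chain constants in the table (\(\pi ^{i}_{S}\), \(\hat \pi _{\max }\), \(\pi _{\min }\), \(\Omega ^{i}_{k,l}\), \(\Omega _{\max }\) and \(\varepsilon ^{i}\)) can be computed when a transition matrix \(\mathcal {P}^{i}\) is known. In a learning setting, these matrices are typically unknown to the controller. The sketch below is for illustration only and rests on these assumptions:

- Each action's chain is finite and irreducible, with states indexed from 0.
- All helper names are hypothetical.
- The eigenvalue gap is computed only for a two-state chain, taking \(\lambda _{2}^{i}\) to be the second eigenvalue of \(\mathcal {P}^{i}\) itself, which equals \(P_{00}+P_{11}-1\). For larger or non-reversible chains, check the paper's definition of \(\lambda _{2}^{i}\).

```python
"""Markov-chain quantities from the symbol table, for a known transition matrix.

Hypothetical helper names; each chain is assumed finite and irreducible so the
linear systems below are nonsingular.
"""


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary_distribution(p):
    """pi^i_S: solve pi P = pi with sum(pi) = 1 (last balance equation replaced)."""
    n = len(p)
    a = [[p[k][l] - (1.0 if k == l else 0.0) for k in range(n)] for l in range(n - 1)]
    a.append([1.0] * n)
    return solve(a, [0.0] * (n - 1) + [1.0])


def mean_hitting_times(p, target):
    """Omega^i_{k,target}: expected steps to first reach `target` from each k != target.

    Uses h_target = 0 and h_k = 1 + sum_j P[k][j] h_j for k != target.
    """
    others = [k for k in range(len(p)) if k != target]
    a = [[(1.0 if k == j else 0.0) - p[k][j] for j in others] for k in others]
    return dict(zip(others, solve(a, [1.0] * len(others))))


def omega_max(p):
    """Omega^i_max: largest mean hitting time over all pairs k != l."""
    return max(t for l in range(len(p)) for t in mean_hitting_times(p, l).values())


def eigenvalue_gap_two_state(p):
    """epsilon^i = 1 - lambda_2 for a 2-state chain, whose eigenvalues are 1 and trace - 1."""
    assert len(p) == 2, "closed form only valid for two-state chains"
    return 1.0 - (p[0][0] + p[1][1] - 1.0)


def summarize(chains):
    """Cross-action constants for {action: P}: (hat pi_max, pi_min, Omega_max)."""
    pis = {i: stationary_distribution(p) for i, p in chains.items()}
    pi_hat_max = max(max(x, 1.0 - x) for pi in pis.values() for x in pi)
    pi_min = min(min(pi) for pi in pis.values())
    om_max = max(omega_max(p) for p in chains.values())
    return pi_hat_max, pi_min, om_max
```

Consider a hypothetical chain with \(P = [[0.9, 0.1], [0.4, 0.6]]\):

- The stationary distribution is \((0.8, 0.2)\).
- Leaving state 0 happens with probability 0.1 per step, so \(\Omega_{0,1} = 10\).
- Similarly, \(\Omega_{1,0} = 2.5\).
- The eigenvalue gap is \(1 - (0.9 + 0.6 - 1) = 0.5\).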