
DQN-based resource allocation for NOMA-MEC-aided multi-source data stream


This paper investigates a non-orthogonal multiple access (NOMA)-aided mobile edge computing (MEC) network with multiple sources and one computing access point (CAP), in which NOMA is applied to transmit multi-source data streams to the CAP for computing. To measure the performance of the considered network, we first define the system cost as a linear weighted function of energy consumption and delay. We then propose a deep Q network (DQN)-based offloading strategy that minimizes the system cost by jointly optimizing the offloading ratio and the transmission power allocation. Finally, we design experiments to demonstrate the effectiveness of the proposed strategy. Specifically, the designed strategy decreases the system cost by about 15% compared with local computing when the number of sources is 5.

1 Introduction

With the advancement of wireless communication technology [1,2,3,4], the number of mobile devices has skyrocketed, which results in exponential growth of the data to be handled [5,6,7]. However, the local computing capability of a device is often overwhelmed by the huge computing data stream, which slows the processing of data streams. To deal with this issue, cloud servers are used to assist devices in computing data streams [8,9,10], since they offer much more computing capacity than the local device. However, offloading too many data streams to the cloud may also impose a serious workload on the cloud server [11]. In addition, the wireless channel is vulnerable, which prolongs the communication delay and degrades the system’s performance.

To address the above limitations of local computing and cloud servers, mobile edge computing (MEC) has been designed to help compute data streams [12,13,14]. In the MEC network, multi-source data streams can be partially offloaded to the computing access point (CAP) to be computed [15]. Because the local device also has computing power, the local device and the CAP can compute data streams at the same time. Therefore, the offloading ratio of data streams becomes a key factor affecting the computing time. The authors in [16] presented an intelligent particle swarm optimization (PSO)-based offloading policy for MEC networks with a cache mechanism, which employed the PSO algorithm to search for a suitable offloading ratio to achieve partial offloading. The PSO algorithm was simple and converged quickly, but if the objective function had multiple local extrema, it was easily trapped in one of them and could not obtain the optimal solution. The authors of [9, 17] studied a multi-user multi-CAP MEC network with task offloading, where the network environment was time-varying and the system cost was mainly determined by energy consumption and delay. Besides, a dynamic offloading policy based on DQN was devised, with which users could dynamically adjust the offloading ratio to optimize the system cost and ensure the performance of the MEC network.

Despite the above foundation, the MEC network with dynamic offloading still faces inherent limitations. Limited communication resources can hardly support orthogonal multiple access (OMA) for massive numbers of users. To deal with this issue, non-orthogonal multiple access (NOMA), which has emerged as a new access technology, can help support massive users. NOMA is a promising technology for reducing delay and energy consumption in MEC networks. The technology uses non-orthogonal transmission at the transmitter with the allocated source transmission power, which deliberately introduces interference information, and then removes the interference at the CAP through successive interference cancellation (SIC) to achieve correct demodulation [18]. Multiple sources share the same bandwidth to send their data streams simultaneously, and the CAP receives the transmitted information and then decodes it. This has a clear advantage in terms of increasing the transmission rate of the data stream [19, 20]. According to this principle, multiple sources can use the same bandwidth to offload data streams to the CAP simultaneously to decrease system energy consumption and delay.

So far, there have been a large number of investigations on resource allocation in NOMA-MEC systems. For example, the authors in [21] used reinforcement learning to optimize the computation and caching of a multi-server NOMA-MEC system. The authors of [22] studied a computation offloading system aided by NOMA and dual connection (DC) and employed a deep learning-based intelligent offloading method to reduce the total system consumption. The authors in [23] designed a secure communication strategy for the NOMA-assisted UAV-MEC system with large-scale access users. The authors of [24] considered a NOMA-MEC network and optimized the total system energy consumption through convex theory and an iterative algorithm. Many current studies focus on optimizing the offloading ratio of the MEC network, but when NOMA is used to assist the offloading, the transmission power in the offloading stage also has a huge impact on the energy consumption and delay of the system.

On this basis, we study multi-source MEC networks where CAPs are deployed at the edge. In such networks, computational data can be offloaded to adjacent CAPs through favorable data stream division and offloading to achieve low energy consumption and low delay [5, 25, 26]. Meanwhile, an optimization strategy based on DQN is proposed for data stream offloading. DQN is a deep learning-based Q-learning algorithm that integrates neural network techniques and value function approximation [27]. The neural network is trained by using a target network and experience replay. The system cost is designed as a linear weighted function of delay and energy consumption. The offloading decision in MEC networks is modeled as a Markov decision process so that reinforcement learning methods can be employed for resource allocation to improve network performance and reduce system cost [3, 7, 10]. Finally, simulation experiments verify that the designed scheme outperforms the compared schemes. The main contributions of the paper are as follows:

  • We consider a NOMA-aided MEC network with S sources and one CAP. For this network, we define the system cost as a linear combination of energy consumption and delay to measure the network performance. The offloading ratio and the transmission power ratio are then jointly optimized to lower the system cost.

  • We propose a DQN-based data stream offloading optimization policy. In practical applications, the network environment is dynamic, which increases the optimization difficulty. Therefore, we use this policy to dynamically obtain the offloading ratio and the transmission power ratio of the system.

  • We design experiments to compare different schemes, and the simulation results indicate that the designed DQN-based strategy has a lower total system cost than the other methods.

The rest of the paper is organized as follows. Following the introduction, we discuss the offloading model of the considered NOMA-aided MEC network in Sect. 2 and give the relevant formulas and the optimization problem of the system. The devised DQN-based method is then presented in Sect. 3. Section 4 presents the results of the simulation experiments. Finally, Sect. 5 concludes the whole work.

2 System model

As Fig. 1 shows, we consider an MEC network with S sources and one CAP, where NOMA is applied to assist the transmission of multi-source data streams. Part of each data stream is processed at the local source, which has limited computing capability, and the other part is offloaded to the CAP to be computed. To improve the performance of the MEC network, we use NOMA to offload data streams over wireless links to the CAP, which has sufficient computing capability, to accelerate computing. Concretely, the set of data streams of the sources in the network is denoted by \(\left\{ D_s| 1 \le s \le S \right\}\). Each source has a different data stream \(D_s\) of \(q_s\) bits and offloads a part of it to the CAP to be computed. After computing the offloaded data stream, the CAP returns the results to the source via dedicated feedback links. The following subsections present the data stream offloading model, the local computing model, and the CAP computing model, respectively.

Fig. 1

A NOMA-aided MEC Network with multiple sources and a CAP

2.1 Data stream offloading model

In this part, we describe the data stream offloading model. When part of the data stream is offloaded to the CAP to be computed, multiple sources need to transmit the offloaded stream over the radio link using NOMA. The transmission rate of source \(D_s\) can be described as

$$\begin{aligned} \begin{array}{l} r_s = B\log _2(1+\frac{P_s |h_s|^2}{\sum _{n=1}^{s-1}P_n|h_n|^2 +\sigma ^2}), \end{array} \end{aligned}$$

where B is the bandwidth of the wireless channel from \(D_s\) to the CAP, \(P_s\) is the transmission power of source \(D_s\), and \(|h_{s}|^2\) is the channel gain of the wireless channel from source \(D_s\) to the CAP. The symbol \(\sigma ^2\) stands for the power of the additive white Gaussian noise (AWGN) [28,29,30,31]. We assume that the sources are indexed such that \(P_{1}|h_1|^2 \le P_{2}|h_2|^2 \le \cdots \le P_{S}|h_S|^2\).
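As an illustration, the rate expression above can be evaluated with the following minimal Python sketch. The function name `noma_uplink_rates` and all numerical values are assumptions made for this example and are not taken from the paper; the sources are assumed to be already sorted according to the received-power ordering stated above.

```python
import math

def noma_uplink_rates(powers, gains, bandwidth_hz, noise_power):
    """Evaluate r_s = B*log2(1 + P_s|h_s|^2 / (sum_{n<s} P_n|h_n|^2 + sigma^2)).

    `gains` holds the channel gains |h_s|^2. Sources must be indexed so that
    P_1|h_1|^2 <= ... <= P_S|h_S|^2, as assumed in the text.
    """
    received = [p * g for p, g in zip(powers, gains)]
    if any(a > b for a, b in zip(received, received[1:])):
        raise ValueError("sources must be sorted by ascending received power")
    rates = []
    interference = 0.0  # running sum over n < s of P_n |h_n|^2
    for rx in received:
        sinr = rx / (interference + noise_power)
        rates.append(bandwidth_hz * math.log2(1.0 + sinr))
        interference += rx
    return rates

# Hypothetical values, chosen only for illustration.
rates = noma_uplink_rates(powers=[0.2, 0.5, 1.0], gains=[1.0, 1.0, 1.0],
                          bandwidth_hz=6e6, noise_power=0.1)
```

In this sketch the first source sees only noise in the denominator, while each later source also sees the summed received power of all lower-indexed sources, exactly as in the rate formula.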

At each time slot, source \(D_s\) has \(q_s\) bits of data stream to be processed and offloads a part of it to the CAP through the wireless link. The transmission delay of the offloaded data stream of source \(D_s\) is [17, 32]

$$\begin{aligned} t_s =\frac{\beta _s q_s}{r_s}, \end{aligned}$$

where \(\beta _s \in [0,1]\) is the fraction of the data stream of source \(D_s\) that is offloaded to the CAP, and \(\varvec{\beta } = [\beta _1,\beta _2,\ldots ,\beta _S]\) collects the offloading ratios of all sources. Since multi-source data streams are offloaded in parallel, the total delay in the offloading phase is

$$\begin{aligned} T_1 = \max \left\{ {t_1,t_2,\ldots ,t_S} \right\}. \end{aligned}$$

In addition, the system energy consumption in the unloading phase can be obtained by

$$\begin{aligned} E_1 = \sum _{s = 1}^{S} t_s P_s. \end{aligned}$$

2.2 Local computing model

As mentioned above, part of each data stream is computed locally. The local computing delay of source \(D_s\) can be expressed as

$$\begin{aligned} t_{\rm local} ^s = \frac{(1-\beta _s)c_s}{f_s}, \end{aligned}$$

where \(f_s\) refers to the local computing capability of source \(D_s\) and \(c_s\) represents the number of CPU cycles required for processing the data stream on source \(D_s\). The total local computing delay is

$$\begin{aligned} T_2 = \max \left\{ {t_{\rm local}^1,t_{\rm local}^2,\ldots ,t_{\rm local}^S} \right\}. \end{aligned}$$

With \(P_{\rm local}^s\) denoting the computing power consumption of source \(D_s\), the total local computing energy consumption is

$$\begin{aligned} E_2 = \sum _{s = 1}^{S} t_{\rm local} ^s P_{\rm local} ^s. \end{aligned}$$

2.3 CAP computing model

After part of the data stream on source \(D_s\) is successfully offloaded to the CAP through wireless links, the offloaded data stream is computed at the CAP. The computation delay at the CAP for source \(D_s\) is

$$\begin{aligned} t_{\rm MEC} ^s = \frac{\beta _s c_s}{F_s}, \end{aligned}$$

where \(F_s\) denotes the computing capability that the CAP allocates to source \(D_s\). Different offloaded data streams are computed in parallel at the CAP, so the total computing delay at the CAP is

$$\begin{aligned} T_3 = \max \left\{ {t_{\rm MEC}^1,t_{\rm MEC}^2,\ldots ,t_{\rm MEC}^S} \right\}. \end{aligned}$$

Meanwhile, with \(P_{\rm MEC}^s\) denoting the computing power consumption at the CAP for source \(D_s\), the energy consumption produced at the CAP is

$$\begin{aligned} E_3 = \sum _{s = 1}^{S} t_{\rm MEC} ^s P_{\rm MEC} ^s. \end{aligned}$$

Since local computing and offloading can be performed at the same time, the total delay of multi-source data streams is

$$\begin{aligned} T_{\rm total} = \max \left\{ {T_2,T_1+T_3} \right\}. \end{aligned}$$

Moreover, the total energy consumption equation is

$$\begin{aligned} E_{\rm total} = E_1 + E_2 + E_3. \end{aligned}$$

The total system energy consumption is the sum of energy consumption produced at each stage which is related to the delay at this stage. However, it is hard to measure the system behavior only by the total system energy consumption, because the total delay is the maximum of the local delay and the sum of delays produced at other stages. To comprehensively measure system performance, we devise the total system cost as a linear weighted function of total delay and total energy consumption [33], which can be described as

$$\begin{aligned} \Theta _{s} = \mu T_{\rm total} + (1-\mu )E_{\rm total},\end{aligned}$$

where \(\mu \in [0,1]\) is a weight factor. Notice that, when \(\mu = 0\), the system cost consists only of the system energy consumption, and we focus on the impact of energy consumption on the considered system. When \(\mu = 1\), the system cost consists only of the system delay, and we pay attention to the impact of delay on the considered MEC network. In the considered system, we can change the value of \(\mu\) to meet the requirements of different scenarios on energy consumption and delay.
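The delay, energy, and cost expressions of this section can be combined as in the following minimal Python sketch. The function name `system_cost`, its argument names, and the numbers in the example call are assumptions made for illustration; they are not values from the paper.

```python
def system_cost(beta, tx_power, rates, q_bits, cycles, f_local, f_cap,
                p_local, p_cap, mu):
    """Return (Theta, T_total, E_total) for one time slot.

    All list arguments have one entry per source s.
    """
    # Offloading phase: t_s = beta_s * q_s / r_s
    t_tx = [b * q / r for b, q, r in zip(beta, q_bits, rates)]
    # Local computing: t_local^s = (1 - beta_s) * c_s / f_s
    t_loc = [(1.0 - b) * c / f for b, c, f in zip(beta, cycles, f_local)]
    # CAP computing: t_MEC^s = beta_s * c_s / F_s
    t_cap = [b * c / F for b, c, F in zip(beta, cycles, f_cap)]

    T1, T2, T3 = max(t_tx), max(t_loc), max(t_cap)
    E1 = sum(t * p for t, p in zip(t_tx, tx_power))
    E2 = sum(t * p for t, p in zip(t_loc, p_local))
    E3 = sum(t * p for t, p in zip(t_cap, p_cap))

    T_total = max(T2, T1 + T3)   # local branch runs in parallel with offloading
    E_total = E1 + E2 + E3
    theta = mu * T_total + (1.0 - mu) * E_total
    return theta, T_total, E_total

# Hypothetical two-source example with round numbers.
theta, T_total, E_total = system_cost(
    beta=[0.5, 0.0], tx_power=[1.0, 1.0], rates=[10.0, 10.0],
    q_bits=[20.0, 20.0], cycles=[40.0, 40.0], f_local=[10.0, 10.0],
    f_cap=[20.0, 20.0], p_local=[2.0, 2.0], p_cap=[3.0, 3.0], mu=0.5)
```

In this toy call the second source computes everything locally and therefore dominates the total delay, which shows how a single slow local branch can set \(T_{\rm total}\) regardless of the offloading branch.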

2.4 Problem formulation

As mentioned before, the system cost can be used to measure the system’s performance. To improve the system performance, the data stream needs to be processed with the minimum system cost. Therefore, we formulate the problem as minimizing the system cost by optimizing the offloading ratio and the transmission power allocation, which can be expressed as

$$\begin{aligned} \underset{\left\{ \beta _{s}, \alpha _{s}\right\} }{\min } \Theta _{s} \\ \text{ s.t. }&C_{1}: \beta _{s} \in [0,1], \forall s \in [1, S], \\&C_{2}: \alpha _{s} \in (0, 1], P_{s} = \alpha _{s}P_{\max }^{s}, \end{aligned}$$

where \(C_1\) is a constraint on the offloading ratio, which limits the portion of the data stream offloaded to the CAP. Constraint \(C_2\) restricts the transmission power allocation, where \(P_{\max }^{s}\) represents the maximum transmission power of the sth source and \(\alpha _{s}\) is its transmission power allocation ratio. Due to the complexity of this problem, we employ DQN to solve it, as introduced in the next section. We summarize the notations mentioned in this section in Table 1.

Table 1 Symbol notations

3 DQN-based offloading policy

The offloading policy determines the part of the data stream offloaded to the CAP and the transmission power allocation of the sources, which significantly affects the performance of the system. In this section, we first formulate the optimization problem as a Markov decision process (MDP). Then we investigate a DQN-based optimization scheme to obtain the offloading ratio and the transmission power ratio that minimize the system cost.

In the MDP, at time slot \(\tau\), the state of the environment is \(s_\tau\). The agents first obtain the action \(a_\tau\) from the policy \(\pi _\tau\) according to \(s_\tau\). Then the agents perform action \(a_\tau\) in the environment, which causes the environment to shift from \(s_\tau\) to \(s_{\tau +1}\) and returns the reward \(r_\tau\). Concretely, the state space is

$$\begin{aligned} {\varvec{s}}=\left\{ \varvec{\beta },\varvec{\alpha } \right\}, \end{aligned}$$

where \(\varvec{\beta }=\left\{ \beta _1,\beta _2,\beta _3,\ldots ,\beta _S \right\}\) is the set of offloading ratios of the multi-source data streams and \(\varvec{\alpha }=\left\{ \alpha _1,\alpha _2,\alpha _3,\ldots ,\alpha _S \right\}\) is the set of transmission power allocation ratios of the sources. Besides, the action space is \({\varvec{A}}=\left\{ \delta _1,\delta _1^*,\delta _2,\delta _2^*,\delta _3,\delta _3^*,\ldots ,\delta _S,\delta _S^*, \varrho _1,\varrho _1^*,\varrho _2,\varrho _2^*,\varrho _3,\varrho _3^*,\ldots ,\varrho _S,\varrho _S^* \right\}\), where \(\delta _s=-\theta\) and \(\delta _s^*=+\theta\) are the actions that decrease or increase the offloading ratio \(\beta _s\) by a step \(\theta\) under constraint \(C_1\), and \(\varrho _s=-\theta\) and \(\varrho _s^*=+\theta\) are the actions that adjust the transmission power allocation ratio \(\alpha _s\) under constraint \(C_2\). To minimize the system cost of the considered NOMA-aided MEC system, the reward is designed as

$$\begin{aligned} r_{\tau } = \left\{ \begin{array}{ll} \gamma _{1} &{} \text{if } \Theta (\tau ) > \Theta (\tau +1),\\ -\gamma _{2} &{} \text{if } \Theta (\tau ) = \Theta (\tau +1),\\ -\gamma _{1} &{} \text{if } \Theta (\tau ) < \Theta (\tau +1), \end{array}\right. \end{aligned}$$

where \(\gamma _1> \gamma _2 >0\). Notice that, if the execution of \(a_\tau\) reduces the system cost of the environment in time slot \(\tau +1\), the agents obtain a positive reward. Otherwise, the reward is negative. Moreover, the Q function, which measures the performance of an action in the current environment, is used to obtain the best policy. It can be expressed as

$$\begin{aligned} \pi ^{*}=\arg \max _{\pi } Q_{\pi }(s, a).\end{aligned}$$
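The state transition and reward described above can be sketched in Python as follows. The action indexing (the first 2S indices adjust \(\beta\), the next 2S adjust \(\alpha\), even index for \(-\theta\) and odd index for \(+\theta\)), the lower clip `alpha_min` used to keep \(\alpha _s\) strictly positive, and the default values of `theta`, `gamma1`, and `gamma2` are assumptions of this sketch, not settings from the paper.

```python
def apply_action(beta, alpha, action, theta=0.1, alpha_min=0.01):
    """Apply one discrete action to the state (beta, alpha) and clip.

    Assumed layout for S sources: actions 0..2S-1 change beta,
    actions 2S..4S-1 change alpha; even index = -theta, odd index = +theta.
    """
    S = len(beta)
    beta, alpha = list(beta), list(alpha)   # copy so inputs are not mutated
    if action < 2 * S:
        target, k, low = beta, action, 0.0            # C1: beta_s in [0, 1]
    else:
        target, k, low = alpha, action - 2 * S, alpha_min  # C2: alpha_s in (0, 1]
    s, up = divmod(k, 2)
    step = theta if up else -theta
    target[s] = min(1.0, max(low, target[s] + step))
    return beta, alpha

def reward(cost_before, cost_after, gamma1=1.0, gamma2=0.5):
    """Positive reward when the cost drops, negative otherwise (gamma1 > gamma2 > 0)."""
    if cost_after < cost_before:
        return gamma1
    if cost_after == cost_before:
        return -gamma2
    return -gamma1
```

For example, with two sources, `apply_action([0.5, 0.5], [0.5, 0.5], 1)` raises the first offloading ratio by one step, while action index 6 lowers the second power ratio.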

As mentioned above, the optimization problem of the considered NOMA-aided MEC network can be modeled as an MDP. Therefore, we employ a deep reinforcement learning method, DQN, to solve this problem.

As shown in Fig. 2, the DQN consists of two networks and one replay memory. The replay memory is used to store transition samples (\(s_\tau , a_\tau , r_\tau ,s_{\tau +1}\)). The evaluation network outputs the action \(a _ \tau \in {\varvec{A}}\) for the input state \(s _ \tau \in {\varvec{s}}\). When the replay memory has enough data, the evaluation network begins training and updates its weights \(\vartheta\) at every step. The target network, which helps train the evaluation network, is initialized as a copy of the evaluation network and is synchronized with the evaluation network every fixed number of steps.

Fig. 2

DQN-based offload policy framework

To avoid the network falling into a local optimum, the \(\epsilon\)-greedy strategy is used to help the agents explore. It can be expressed as

$$\begin{aligned} a_{\tau }=\left\{ \begin{array}{ll} \arg \max _{a \in {\varvec{A}}} Q\left( s_{\tau }, a;\vartheta \right) , &{} \text{ with } \text{ probability } 1-\epsilon ,\\ \text{ randomly } \text{ choose } , &{} {\text{ otherwise }} , \end{array}\right.\end{aligned}$$

where \(\vartheta\) is the weight of the evaluation network. We employ the temporal difference (TD) approach to train the DQN, with the TD-target computed by the target network [33] as

$$\begin{aligned} \begin{aligned} y_{\tau }=r_{\tau }+\varphi \max _{a \in {\varvec{A}}}\left( Q\left( s_{\tau +1}, a; \hat{\vartheta }\right) \right) , \end{aligned} \end{aligned}$$

where \(\varphi\) is the discount factor and \(\hat{\vartheta }\) is the weight of the target network.

Moreover, we give the loss function [2, 34] based on TD-target as

$$\begin{aligned} \begin{aligned} L_{\tau }=\left( \left( r_{\tau }+\varphi \max _{a \in {\varvec{A}}}\left( Q\left( s_{\tau +1}, a; \hat{\vartheta }\right) \right) \right) -Q\left( s_{\tau }, a_{\tau }; \vartheta \right) \right) ^{2}. \end{aligned} \end{aligned}$$
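The action selection rule, the TD-target, and the loss above can be written compactly as in the following Python sketch. The function names are hypothetical, and the Q-values are passed in as plain lists (one entry per action) rather than produced by a neural network, which is a simplification made for this example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick argmax_a Q(s, a) with probability 1 - epsilon, else a random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def td_target(reward, next_q_target, discount):
    """r + phi * max_a Q(s', a; target weights)."""
    return reward + discount * max(next_q_target)

def td_loss(q_eval_sa, reward, next_q_target, discount):
    """Squared difference between the TD-target and Q(s, a; evaluation weights)."""
    return (td_target(reward, next_q_target, discount) - q_eval_sa) ** 2
```

Here `next_q_target` stands for the target network's outputs at \(s_{\tau +1}\) and `q_eval_sa` for the evaluation network's output for the action actually taken, so only the latter would receive gradients in a full implementation.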

Based on the above, we summarize the DQN-based unloading policy in Algorithm 1

figure a
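For orientation, the training procedure described in this section (replay memory, \(\epsilon\)-greedy exploration, an evaluation network updated at every step, and a periodically synchronized target network) can be sketched as below. This is not the authors' Algorithm 1: to stay self-contained it replaces the neural network with a linear Q-function and uses a hypothetical placeholder cost `toy_cost` instead of the system cost \(\Theta\); all names and hyperparameter values are assumptions.

```python
import random
from collections import deque

def features(state):
    """Feature vector phi(s): a bias term followed by the state entries."""
    return [1.0] + list(state)

def q_values(weights, state):
    """Linear stand-in for the Q network: one weight row per action."""
    phi = features(state)
    return [sum(w * x for w, x in zip(row, phi)) for row in weights]

def env_step(state, action, theta=0.1):
    """Action 2*i lowers entry i by theta, action 2*i+1 raises it; clip to [0, 1]."""
    nxt = list(state)
    i, up = divmod(action, 2)
    nxt[i] = min(1.0, max(0.0, nxt[i] + (theta if up else -theta)))
    return nxt

def toy_cost(state):
    """Hypothetical placeholder for the system cost (not the paper's model)."""
    return sum((x - 0.6) ** 2 for x in state)

def train(state_dim=2, episodes=20, steps=30, epsilon=0.1, discount=0.9,
          lr=0.05, batch_size=16, sync_every=20, seed=0):
    rng = random.Random(seed)
    n_actions = 2 * state_dim
    w_eval = [[0.0] * (state_dim + 1) for _ in range(n_actions)]
    w_target = [row[:] for row in w_eval]        # target starts as a copy
    memory = deque(maxlen=1000)                  # replay memory
    updates = 0
    for _ in range(episodes):
        state = [rng.random() for _ in range(state_dim)]
        for _ in range(steps):
            # epsilon-greedy action from the evaluation weights
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                q = q_values(w_eval, state)
                action = max(range(n_actions), key=q.__getitem__)
            nxt = env_step(state, action)
            before, after = toy_cost(state), toy_cost(nxt)
            reward = 1.0 if after < before else (-0.5 if after == before else -1.0)
            memory.append((state, action, reward, nxt))
            state = nxt
            if len(memory) < batch_size:
                continue                         # wait until enough samples exist
            for s, a, r, s2 in rng.sample(list(memory), batch_size):
                target = r + discount * max(q_values(w_target, s2))
                td_error = target - q_values(w_eval, s)[a]
                for j, x in enumerate(features(s)):
                    w_eval[a][j] += lr * td_error * x   # semi-gradient step
            updates += 1
            if updates % sync_every == 0:
                w_target = [row[:] for row in w_eval]   # periodic target sync
    return w_eval, w_target, updates
```

Calling `train()` returns the evaluation weights, the target weights, and the number of update steps performed. Swapping `toy_cost` for a real system cost and the linear model for a neural network would bring the sketch closer to the setting of this paper.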

4 Results and discussion

This section demonstrates the advantage of the designed DQN-based offloading strategy in the considered NOMA-aided MEC network through simulations. In the considered MEC network, all channels experience Rayleigh flat fading [35,36,37]. We set the number of sources to 5 in our experiment. The sizes of the multi-source data streams are set to 14 Mb, 3 Mb, 16 Mb, 8 Mb, and 18 Mb, and the computing capability of each source is set to \(5 \times 10^7\) cycle/s. Besides, the maximum transmission power and the computing power consumption of each source are 1 W and 1.5 W, respectively. The computing capability allocated to each source at the CAP is \(8 \times 10^7\) cycle/s, and the computing power consumption at the CAP is 1.5 W. In addition, the total bandwidth is set to 6 MHz. The detailed network parameter settings are shown in Table 2.
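For readers who want to reproduce the setup, the parameters quoted in the paragraph above can be collected in a small configuration such as the following. The dictionary name and key names are hypothetical; the values are those stated in the text, and parameters not stated there (for example the CPU cycles per data stream) are deliberately left out.

```python
# Simulation parameters quoted from the text; key names are hypothetical.
SIM_PARAMS = {
    "num_sources": 5,                        # S
    "data_size_mb": [14, 3, 16, 8, 18],      # q_s, in Mb as stated in the text
    "local_capability_cps": 5e7,             # f_s, cycle/s
    "max_tx_power_w": 1.0,                   # P_max^s
    "local_compute_power_w": 1.5,            # P_local^s
    "cap_capability_cps": 8e7,               # F_s, cycle/s per source
    "cap_compute_power_w": 1.5,              # P_MEC^s
    "bandwidth_hz": 6e6,                     # B
}

# Basic consistency check: one data stream size per source.
assert len(SIM_PARAMS["data_size_mb"]) == SIM_PARAMS["num_sources"]
```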

Table 2 Parameter setting

Figure 3 shows the reward of each episode during the training of the proposed DQN-based offloading policy, where \(\mu = 0.5\) and the DQN is trained for 300 episodes in total. From this figure, we can see that the reward grows rapidly in the first 20 episodes. After about 50 episodes, the reward fluctuates slightly around 1950. This shows that the training converges and helps to verify the efficacy of the designed DQN-based policy.

Fig. 3

The total reward of each episode during training of designed policy

Figure 4 plots the system cost versus the number of training episodes under three schemes, where the number of sources is set to 5 and we train for 150 episodes in total. For comparison with the designed strategy, we also plot the cost of two other schemes. One is the local computation scheme, where the data stream of each source is computed locally; the other is the full offloading scheme, where the data streams of all sources are entirely offloaded to the CAP for computing and the computing capability allocation at the CAP is obtained by DQN. From this plot, we can see that the system cost of the designed strategy declines sharply during the first 100 episodes and converges to about 28.30 afterwards. The system cost of the full offloading scheme decreases gradually during the first 20 episodes and converges to about 31.6 afterwards. In contrast, the system cost of the local computation scheme remains at 33.30 throughout training. This result illustrates that the designed strategy achieves the lowest system cost among the three schemes, a reduction of about 15% relative to local computation, and that it can give a good offloading and resource allocation strategy for the considered NOMA-aided MEC network.

Fig. 4

The convergence of the designed strategy versus episode

Figure 5 shows the relationship between the weight factor \(\mu\) and the system cost \(\Theta\), where \(\mu\) varies from 0.1 to 0.9 and the number of sources is set to 5. We can see from Fig. 5 that the system cost obtained by the designed strategy is lower than that of the local computation and full offloading schemes for all values of \(\mu\). This indicates that the proposed scheme can efficiently improve the performance of the considered NOMA-aided MEC network by reasonably allocating resources and the offloading ratio. Moreover, the total system cost of all three schemes decreases as \(\mu\) increases from 0.1 to 0.9. The main reason is that the system cost is a linearly weighted function of delay and energy consumption, in which the energy consumption term is numerically larger than the delay term. Increasing \(\mu\) reduces the weight \(1-\mu\) on energy consumption, thus making the total system cost significantly lower.
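The dependence of the cost on \(\mu\) can be made explicit with a short derivation. As a simplifying assumption for this illustration, the allocation \((\varvec{\beta }, \varvec{\alpha })\) is held fixed, so that \(T_{\rm total}\) and \(E_{\rm total}\) do not change with \(\mu\):

$$\begin{aligned} \frac{\partial \Theta }{\partial \mu } = \frac{\partial }{\partial \mu }\left[ \mu T_{\rm total} + (1-\mu )E_{\rm total}\right] = T_{\rm total} - E_{\rm total}. \end{aligned}$$

Under this assumption, the cost decreases with \(\mu\) exactly when \(E_{\rm total} > T_{\rm total}\) numerically, which is consistent with the downward trend reported for Fig. 5.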

Fig. 5

Impact of the weight factor \(\mu\) on the system cost

Figure 6 presents the influence of the wireless bandwidth B on the system cost, where \(S=5\) and the bandwidth varies from 5 to 9 MHz. As shown in Fig. 6, the system cost of the designed strategy is lower than that of the other schemes for all bandwidth values. This result indicates that the designed strategy outperforms the other schemes under different communication environments. Moreover, as the bandwidth increases, the system cost of the designed scheme and of the full offloading scheme decreases, while that of the local computation scheme stays the same. The reason is that a larger bandwidth reduces the offloading cost of the designed policy and the full offloading scheme, whereas the local computation scheme computes the data stream locally and incurs no offloading cost.

Fig. 6

System cost comparison of three schemes under the different values of bandwidth

Figure 7 presents the impact of the computing capability allocated to each source at the CAP on the system cost \(\Theta\) for the three schemes, where the number of sources is 5 and the computing capability allocated to each source at the CAP changes from \(6 \times 10^7\) cycle/s to \(10 \times 10^7\) cycle/s. As shown in this figure, the system cost of the designed strategy and of the full offloading scheme gradually decreases as this computing capability increases. The reason is that a CAP with more computing capability can compute data streams faster, which reduces the system energy consumption and delay. Moreover, the system cost of the full offloading scheme tends to decrease faster than that of the designed strategy. This is because the full offloading scheme computes all data at the CAP and is therefore more affected by the computing capability of the CAP, whereas the designed strategy is more robust across MEC environments. This implies that the designed strategy is superior to the full offloading and local computation schemes.

Fig. 7

The influence of the computing power allocated to each source of the CAP on system cost

Fig. 8

The influence of the number of sources on the system cost

Figure 8 plots the system cost of the designed strategy versus the scale of the MEC network, where S changes from 2 to 6 and the local computing capability is set to \(5 \times 10^7\) cycle/s. From this plot, we can see that the system cost of the designed strategy is always lower than that of the local computing and full offloading schemes as S rises. This indicates that the designed strategy can improve the performance of the considered NOMA-aided MEC networks at different scales. Moreover, the system cost of the three schemes rises as S varies from 2 to 6. This is because a larger number of sources increases the amount of data to be computed locally and at the CAP, which increases the system delay.

5 Conclusion

This paper has investigated a NOMA-aided MEC network with multiple sources and one CAP, in which multi-source data streams were partially offloaded to the CAP to accelerate computing. In the considered network, we defined the system cost as a linear weighted function of the energy consumption and delay produced in the offloading process. To reduce this cost, we proposed a DQN-based offloading strategy that minimizes the system cost by optimizing the transmission power ratio and the offloading ratio. We compared the proposed strategy with other methods through experiments. The experimental results showed that the proposed method achieved a lower system cost than the other methods across NOMA-aided MEC networks with different scales, bandwidths, and computing capabilities. Specifically, the designed strategy decreased the system cost by about \(15\%\) compared with local computing when the number of sources is 5.

Availability of data and materials

The authors state that the data are available in this manuscript.



Abbreviations

MEC: Mobile edge computing

NOMA: Non-orthogonal multiple access

DQN: Deep Q network

CAP: Computing access point

PSO: Particle swarm optimization

SIC: Successive interference cancellation

DC: Dual connection

UAV: Unmanned aerial vehicle

MDP: Markov decision process

TD: Temporal difference

AWGN: Additive white Gaussian noise


  1. W. Wu, F. Zhou, R.Q. Hu, B. Wang, Energy-efficient resource allocation for secure noma-enabled mobile edge computing networks. IEEE Trans. Commun. 68(1), 493–505 (2020)

    Article  Google Scholar 

  2. L. Chen, X. Lei, Relay-assisted federated edge learning: performance analysis and system optimization. IEEE Trans. Commun. PP(99), 1–12 (2022)

    Google Scholar 

  3. R. Zhao, M. Tang, Profit maximization in cache-aided intelligent computing networks. Phys. Commun. PP(99), 1–10 (2022)

    Google Scholar 

  4. J. Ren, X. Lei, Z. Peng, X. Tang, O.A. Dobre, Ris-assisted cooperative NOMA with SWIPT. IEEE Wirel. Commun. Lett. (2023)

  5. X. Liu, C. Sun, M. Zhou, C. Wu, B. Peng, P. Li, Reinforcement learning-based multislot double-threshold spectrum sensing with Bayesian fusion for industrial big spectrum data. IEEE Trans. Ind. Inform. 17(5), 3391–3400 (2021)

    Article  Google Scholar 

  6. Z. Na, B. Li, X. Liu, J. Wan, M. Zhang, Y. Liu, B. Mao, Uav-based wide-area internet of things: An integrated deployment architecture. IEEE Netw. 35(5), 122–128 (2021)

    Article  Google Scholar 

  7. W. Zhou, F. Zhou, Profit maximization for cache-enabled vehicular mobile edge computing networks. IEEE Trans. Veh. Technol. PP(99), 1–6 (2023)

    Google Scholar 

  8. W. Xu, Z. Yang, D.W.K. Ng, M. Levorato, Y.C. Eldar, M. Debbah, Edge learning for B5G networks with distributed signal processing: Semantic communication, edge computing, and wireless sensing. IEEE J. Sel. Top. Signal Process. arXiv:2206.00422 (2023)

  9. X. Zheng, C. Gao, Intelligent computing for WPT-MEC aided multi-source data stream. to appear in EURASIP J. Adv. Signal Process. 2023(1) (2023)

  10. S. Tang, L. Chen, Computational intelligence and deep learning for next-generation edge-enabled industrial IoT. IEEE Trans. Netw. Sci. Eng. 9(3), 105–117 (2022)

    Google Scholar 

  11. W. Wu, F. Zhou, B. Wang, Q. Wu, C. Dong, R.Q. Hu, Unmanned aerial vehicle swarm-enabled edge computing: potentials, promising technologies, and challenges. IEEE Wirel. Commun. 29(4), 78–85 (2022)

    Article  Google Scholar 

  12. W. Zhou, X. Lei, Priority-aware resource scheduling for uav-mounted mobile edge computing networks. IEEE Trans. Veh. Technol. PP(99), 1–6 (2023)

    Google Scholar 

  13. L. Zhang, C. Gao, Deep reinforcement learning based IRS-assisted mobile edge computing under physical-layer security. Phys. Commun. 55, 101896 (2022)

    Article  Google Scholar 

  14. L. Chen, Physical-layer security on mobile edge computing for emerging cyber physical systems. Comput. Commun. 194(1), 180–188 (2022)

    Article  Google Scholar 

  15. Y. Wu, C. Gao, Task offloading for vehicular edge computing with imperfect CSI: a deep reinforcement approach. Phys. Commun. 55, 101867 (2022)

    Article  Google Scholar 

  16. W. Zhou, L. Chen, S. Tang, L. Lai, J. Xia, F. Zhou, L. Fan, Offloading strategy with PSO for mobile edge computing based on cache mechanism. Clust. Comput. 25(4), 2389–2401 (2022)

    Article  Google Scholar 

  17. R. Zhao, C. Fan, J. Ou, D. Fan, J. Ou, M. Tang, Impact of direct links on intelligent reflect surface-aided mec networks. Phys. Commun. 55, 101905 (2022)

    Article  Google Scholar 

  18. Z. Ding, D.W.K. Ng, R. Schober, H.V. Poor, Delay minimization for NOMA-MEC offloading. IEEE Signal Process. Lett. 25(12), 1875–1879 (2018)

    Article  Google Scholar 

  19. X. Liu, Q. Sun, W. Lu, C. Wu, H. Ding, Big-data-based intelligent spectrum sensing for heterogeneous spectrum communications in 5g. IEEE Wirel. Commun. 27(5), 67–73 (2020)

    Article  Google Scholar 

  20. Z. Na, Y. Liu, J. Shi, C. Liu, Z. Gao, Uav-supported clustered NOMA for 6g-enabled internet of things: Trajectory planning and resource allocation. IEEE Internet Things J. 8(20), 15041–15048 (2021)

    Article  Google Scholar 

  21. S. Li, B. Li, W. Zhao, Joint optimization of caching and computation in multi-server NOMA-MEC system via reinforcement learning. IEEE Access 8, 112762–112771 (2020)

    Article  Google Scholar 

  22. C. Li, H. Wang, R. Song, Intelligent offloading for noma-assisted MEC via dual connectivity. IEEE Internet Things J. 8(4), 2802–2813 (2021)

    Article  Google Scholar 

  23. W. Lu, Y. Ding, Y. Gao, Y. Chen, N. Zhao, Z. Ding, A. Nallanathan, Secure noma-based UAV-MEC network towards a flying eavesdropper. IEEE Trans. Commun. 70(5), 3364–3376 (2022)

  24. L. Shi, Y. Ye, X. Chu, G. Lu, Computation energy efficiency maximization for a NOMA-based WPT-MEC network. IEEE Internet Things J. 8(13), 10731–10744 (2021)

  25. X. Liu, H. Ding, S. Hu, Uplink resource allocation for NOMA-based hybrid spectrum access in 6G-enabled cognitive internet of things. IEEE Internet Things J. 8(20), 15049–15058 (2021)

  26. X. Liu, C. Sun, W. Yu, M. Zhou, Reinforcement-learning-based dynamic spectrum access for software-defined cognitive industrial internet of things. IEEE Trans. Ind. Inform. 18(6), 4244–4253 (2022)

  27. B. Li, Z. Fei, J. Shen, X. Jiang, X. Zhong, Dynamic offloading for energy harvesting mobile edge computing: architecture, case studies, and future directions. IEEE Access 7, 79877–79886 (2019)

  28. L. He, X. Tang, Learning-based MIMO detection with dynamic spatial modulation. IEEE Trans. Cogn. Commun. Netw. PP(99), 1–12 (2023)

  29. L. Zhang, S. Tang, Scoring aided federated learning on long-tailed data for wireless IoMT based healthcare system. IEEE J. Biomed. Health Inform. PP(99), 1–12 (2023)

  30. J. Li, S. Dang, Y. Huang, Composite multiple-mode orthogonal frequency division multiplexing with index modulation. IEEE Trans. Wirel. Commun. (2023)

  31. S. Tang, X. Lei, Collaborative cache-aided relaying networks: performance evaluation and system optimization. IEEE J. Sel. Areas Commun. 41(3), 706–719 (2023)

  32. J. Lu, M. Tang, Performance analysis for IRS-assisted MEC networks with unit selection. Phys. Commun. 55, 101869 (2022)

  33. C. Li, J. Xia, F. Liu, D. Li, L. Fan, G.K. Karagiannidis, A. Nallanathan, Dynamic offloading for multiuser muti-CAP MEC networks: a deep reinforcement learning approach. IEEE Trans. Veh. Technol. 70(3), 2922–2927 (2021)

  34. Y. Wu, C. Gao, Intelligent resource allocation scheme for cloud-edge-end framework aided multi-source data stream. EURASIP J. Adv. Signal Process. 2023(1) (2023, to appear)

  35. W. Zhou, C. Li, M. Hua, Worst-case robust MIMO transmission based on subgradient projection. IEEE Commun. Lett. 25(1), 239–243 (2021)

  36. J. Li, S. Dang, M. Wen, Index modulation multiple access for 6G communications: principles, applications, and challenges. IEEE Netw. (2023)

  37. S. Tang, Dilated convolution based CSI feedback compression for massive MIMO systems. IEEE Trans. Veh. Technol. 71(5), 211–216 (2022)

This work was supported by the Key-Area Research and Development Program of Guangdong Province, China (No. 2019B090904014), the Science and Technology Projects in Guangzhou (No. 202102010412), the Yangcheng Scholars Research Project of Guangzhou (No. 202032832), and the Science and Technology Program of Guangzhou (No. 202201010047).

Author information



JL designed the proposed framework and conducted the simulations. JX assisted in revising the manuscript's structure and checking its grammar. VB helped refine the optimization method. FZ assisted in optimizing the design of the deep neural network. CG aided in conducting the simulations, and SL helped interpret the simulation results. JX, FZ, and CG are the corresponding authors of this paper. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Junjuan Xia, Fusheng Zhu or Chongzhi Gao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Ling, J., Xia, J., Zhu, F. et al. DQN-based resource allocation for NOMA-MEC-aided multi-source data stream. EURASIP J. Adv. Signal Process. 2023, 44 (2023).
