Energy efficiency performance in RIS-based integrated satellite–aerial–terrestrial relay networks with deep reinforcement learning
EURASIP Journal on Advances in Signal Processing volume 2023, Article number: 121 (2023)
Abstract
Integrated satellite–aerial–terrestrial relay networks (ISATRNs) play a vital role in next-gen networks, particularly those with high-altitude platforms (HAPs). This study introduces a new model for hybrid optical/RF-based HAP-enabled ISATRNs, incorporating reconfigurable intelligent surfaces (RISs) on unmanned aerial vehicles (UAVs) to optimize access in dense urban areas. Non-orthogonal multiple access is employed for improved spectrum efficiency. The objective is to jointly optimize the UAV trajectory, RIS phase shift, and active transmit beamforming while considering energy consumption. A deep reinforcement learning approach using an LSTM-DDQN framework is proposed. Numerical results show the effectiveness of our algorithm over traditional DDQN, with higher single-step exploration reward and evaluation metrics.
1 Introduction
With the development of the tourism industry and aerial communication, aerial technology has brought great innovation to tourism: it offers tourists unique and unprecedented perspectives and experiences and creates new business opportunities for the industry [1]. However, existing networks cannot effectively support the development of aerial technology. Against this background, integrated satellite–aerial–terrestrial relay networks (ISATRNs) have gained significant attention from academia and industry as a potential infrastructure to meet the increasing demands for capacity and reliability in urban life and the tourism industry [2, 3]. These networks utilize high-altitude platforms (HAPs) and unmanned aerial vehicles (UAVs) to extend service coverage in various broadband wireless communication applications [4,5,6]. HAPs, including airships and aircraft, operate at altitudes of 20–25 km in the stratosphere, providing increased maneuverability. In contrast, UAVs, or drones, operate at altitudes ranging from a few tens of meters to approximately 100 m above the ground. UAVs offer a favorable air-to-ground channel for direct line-of-sight (LoS) communication with ground cellular equipment. This enables reliable wireless connectivity, especially during emergencies when terrestrial networks are overloaded, incapacitated, or completely destroyed. By acting as signal amplification relays, UAVs can swiftly establish communication links. Together, HAPs and UAVs bridge communication gaps and maintain connectivity in challenging environments, which contributes to efficient emergency response and strengthens the overall communication system.
Free-space optical (FSO) communication is expected to fulfill the requirements of next-generation ISATRNs that demand high-speed and highly secure connections. FSO technology offers several advantages, such as high-bandwidth capacity, no spectrum license requirements, high immunity to interference, compatibility with radio frequency (RF) communication, and easy installation. However, FSO transmission is vulnerable to atmospheric turbulence, which can significantly reduce system performance. Additionally, because FSO communication depends on free space for signal transmission, obstacles can block the optical signals, leading to non-line-of-sight transmission challenges [7]. To overcome the connectivity challenges in the “last mile,” an asymmetric hybrid multi-hop RF/FSO transmission infrastructure is considered a promising solution within existing wireless systems. By leveraging the strengths of both RF and FSO technologies, this system can provide extensive coverage and high-bandwidth connectivity while mitigating the limitations of FSO communications. The hybrid infrastructure can enable seamless switching between RF and FSO technologies, maximizing the benefits of each and reducing dependence on a single type of communication technology. In [8], the authors discussed the effect of different system parameters on the system outage probability and average bit error rate in UAV-enabled multi-hop FSO-based transmission systems.
The downlink of ISATRNs can be severely degraded in urban environments because the channel suffers from severe penetration loss. Meanwhile, to facilitate emerging services and industrial advancement, future wireless communication networks will offer converged services encompassing communication, perception, and computation, commonly referred to as communication-sensing-computing. As a significant technology option for 6G, reconfigurable intelligent surfaces (RISs) demonstrate capabilities in all three domains and hold the potential to establish an integrated system that combines them, which has garnered substantial attention in recent years. RIS, as an extension of metasurface materials, has been proposed for communication, where its overall purpose is to intelligently reconfigure the wireless propagation environment [9,10,11]. Besides, non-orthogonal multiple access (NOMA) technology makes efficient use of the available spectrum by using non-orthogonal transmission with allocated user transmit power at the transmitter side and applying successive interference cancelation (SIC) at the receiver side [12, 13]. The authors in [14] analyzed the performance of downlink MIMO-NOMA systems supported by distributed RISs with discrete phase shifts. A comprehensive framework was developed to facilitate three access modes, aiming to mitigate intra-cluster interference effectively. The outage probability of the involved cascade channel was derived theoretically under various scenarios, and simulations were performed to validate the efficacy of the proposed scheme. Inspired by the strategy of fixed multiple RISs working together, researchers have gradually shifted their attention to mobile RISs for higher degrees of freedom.
A notable approach, discussed in [15], deploys the RIS on mobile UAVs. The authors provided empirical evidence that UAVs equipped with RISs considerably enhance signal reflection coverage. According to the experimental results, the mobile RIS scheme provides more flexible reflection strategies and generalizes to more application scenarios. In mobile multiplexed schemes that combine RIS with hybrid FSO/RF transmission, integrating NOMA within the ISATRN requires careful attention to several key factors in the signaling process [16]: the design of active beamforming for transmission, RIS phase-shift control, and the UAV trajectory. Accounting for these factors is crucial for optimal signal transmission and coverage, but because the resulting problems are high-dimensional and non-convex, they are difficult to solve with traditional iterative optimization algorithms or require significant computational resources. Meanwhile, deep reinforcement learning (DRL) techniques have demonstrated remarkable capabilities in various domains, owing to their real-time interaction with the environment, and researchers are increasingly leveraging DRL to optimize communication performance. The authors in [17] addressed performance optimization in a scenario where UAVs move within RIS-assisted ISATRNs. They formulated a multi-objective optimization problem and designed a multi-dimensional reward function to optimize the phase configuration. The experimental results indicate that this approach can be readily extended to other scenarios. To the best of our knowledge, there are no existing studies on accurate signal transmission from satellites to ground devices using RIS-aided ISATRNs with hybrid FSO/RF modes.
Furthermore, joint beamforming design using an extended DRL algorithmic framework has not been explored. These gaps motivate the research presented in this paper.
In the context of multiple mobile RISs and hybrid FSO/RF scenarios, optimizing the signal transmission of ISATRNs using NOMA requires careful consideration of several factors [16]. These include the design of active beamforming for transmission, RIS phase-shift control, and UAV trajectory planning. Taking these factors into account is crucial to achieve optimal system performance in terms of signal transmission and coverage. DRL enables the discovery of optimal policies by learning from environmental interactions through a trial-and-error process. Rather than relying on labeled training data, it learns from the rewards and penalties received during its interactions with the environment. The authors in [17] tackled multi-objective optimization by designing multi-dimensional vectors that corresponded to the reward function. These vectors aimed to optimize various factors, including active beamforming, passive beamforming, and UAV trajectory constraints. The authors in [18] used a dual deep neural network architecture in a supervised learning setting to optimize the RIS phase-shift matrix, and simulations verified its effectiveness. In [19], the authors addressed RIS phase-shift optimization using a twin delayed deep deterministic policy gradient (TD3)-based method. Numerical results showed that the transmit power achieved by this algorithm is essentially the same as the lower bound of the transmit power of the streaming optimization algorithm, while the computational delay is significantly reduced. In [20], the authors investigated the use of DRL algorithms in RIS-assisted multi-user full-duplex secure communication systems with the objective of maximizing the total secrecy rate. From the existing literature, we can conclude that DRL can obtain satisfactory solutions through iterative interaction with, and learning from, dynamic environments.
In addition, an appropriate neural network architecture can greatly improve the performance of DRL-based methods and speed up their convergence.
This paper focuses on modeling downlink transmission in ISATRNs using a hybrid FSO/RF transmission approach. To improve the line-of-sight link and enhance the service quality for ground users, we assume that the RIS is mounted on the UAV to provide accurate reflection services. The proposed system model involves joint optimization of the transmit beamforming and the RIS phase-shift matrix. We propose a modified DRL-based LSTM-DDQN algorithm to solve this optimization problem. Specifically, the main contributions of this paper are as follows.

First, we present a system that enables high-capacity wireless communication from a satellite to the ground by employing a hybrid FSO/RF-based transmission mode. To address the challenges of operating in an ultra-dense environment, we deploy a RIS array on a UAV. This RIS array reflects the signal from the HAP to the ground equipment, effectively mitigating the adverse effects of the high-density environment. Additionally, the HAP uses NOMA for transmission to the ground, further enhancing the system’s efficiency.

Second, to tackle the challenges of joint active and passive beamforming in RIS-assisted ISATRNs, we propose an advanced DRL framework that overcomes the limitations of conventional approaches. Specifically, we introduce an enhanced LSTM-DDQN algorithm that enables the optimization of problems in both discrete and continuous environments. This algorithm provides a comprehensive solution for the joint optimization, allowing for efficient system performance.

Finally, we conduct a thorough validation analysis to demonstrate the superiority of our proposed optimization algorithm in the given system model. Through extensive numerical experiments, we show that the DRL-based solution outperforms various benchmark solutions. Additionally, we compare the DDQN algorithm with the LSTM-DDQN algorithm and observe an 11% improvement in the reward value when using LSTM-DDQN instead of the traditional DDQN algorithm.
The remainder of this paper is organized as follows. Section 2 presents the NOMA-based system model under consideration, including a detailed description and formulation of the optimization problem. Section 3 addresses the energy efficiency problem using the DRL-based algorithm. Section 4 presents simulation results that highlight the advantages of the extended LSTM-DDQN framework. Finally, Sect. 5 provides concluding remarks summarizing the key findings and potential future research directions. Table 1 lists the abbreviations used in this paper and their meanings.
2 Modeling of proposed system and optimization problem
As shown in Fig. 1, a new downlink mixed FSO/RF-enabled ISATRN system is presented for reliable services from satellites to ground user equipments (UEs). This mixed system uses FSO technology for satellite-to-HAP transmission and RF for HAP-to-UE communication. The HAP collects optical signals from the satellite through \(N_a\) receive apertures and uses NOMA-enabled beamforming with a uniform linear array (ULA) composed of \(N_H\) antennas. The UAV, in turn, is equipped with a single RIS and serves K \(\left( K \le N_H \right)\) mobile UEs [21]. Based on the three-dimensional (3D) Cartesian coordinate system, the HAP position is fixed at \(\mathbf{L}_H = \left[ x_H, y_H, z_H \right]^T\), while the position of each target UE is denoted by \(\mathbf{b}_k = \left[ x_k, y_k \right]\). The fixed-wing UAV needs to move from its starting point to its destination within a total time of \(T_t\), divided into N time slots of duration \(\varsigma_t = T_t / N\). Under the assumptions of block fading and constant channel conditions within each time slot, the 3D position of the UAV in slot n can be represented as \(\mathbf{q}[n] = \left[ x[n], y[n], z[n] \right]^T, n \in \mathcal{N} = \left\{ 0, \ldots, N \right\}\). Additionally, to simplify the analysis, we assume ideal hardware, that is, hardware impairments and other related errors are neglected.
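The time discretization described above can be illustrated with a short Python sketch. It only shows the bookkeeping of the slot duration \(T_t/N\) and the slot index set \(\{0,\ldots,N\}\). The straight-line path and the function names are hypothetical placeholders, not the trajectory that the learning agent produces.

```python
def slot_duration(total_time_s, num_slots):
    """Duration of one time slot, T_t / N."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return total_time_s / num_slots


def straight_line_waypoints(start, end, num_slots):
    """Hypothetical placeholder path: N + 1 equally spaced 3D points q[0..N]."""
    return [
        tuple(s + (e - s) * n / num_slots for s, e in zip(start, end))
        for n in range(num_slots + 1)
    ]


if __name__ == "__main__":
    # Assumed example values: 100 s mission, 50 slots, constant 100 m altitude.
    q = straight_line_waypoints((0.0, 0.0, 100.0), (400.0, 300.0, 100.0), 50)
    print(slot_duration(100.0, 50), q[0], q[-1])
```

In the optimization itself, each \(\mathbf{q}[n]\) would be chosen by the agent rather than interpolated.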
2.1 Signal transmission model
The transmission process in the proposed system can be analyzed as two separate phases. In the first phase, the satellite uses optical communication terminals and telescopes to transmit optical signals to the HAP. The HAP receives these signals through its optical receiving apertures and converts them into electrical RF signals. This conversion enables RF communication in the second phase, in which the HAP uses the ULA and NOMA-enabled beamforming to transmit RF signals to the UEs on the ground. Thus, the converted RF signal in the \(n_{a}\)th aperture of the HAP can be denoted as
where \(P_S\) denotes the satellite transmit power, \(\eta_{\text{oe}}\) is the optical-to-electrical conversion coefficient, \(x_S\) represents the intensity-modulated optical signal emitted by the satellite, and \(n_{n_a}\) is additive white Gaussian noise (AWGN) satisfying \(n_{n_a} \sim \mathcal{CN}\left( 0, \sigma_a^2 \right)\). In addition, \(h_{\text{SH}}\) is the scalar channel fading coefficient from the satellite to the HAP, which models atmospheric turbulence and pointing errors in the FSO channel. Based on these parameters, the output electrical SNR of the combined signal at the HAP can be expressed as
where \(h_{\text{EGC}} = \sum\nolimits_{n_a=1}^{N_a} h_{\text{SH}}\) represents the scalar channel fading coefficient of the receive aperture ensemble under equal-gain combining (EGC) and \(\bar{\gamma}_H = \left[ \left( P_S \eta_{\text{oe}}^{2} \right) / N_a n_a \right]\) is the average SNR. The SER for the FSO link can be expressed as
where \({B_F}\) denotes the FSO transmission link bandwidth.
In the second phase, the HAP uses the NOMA-enabled transmission technique to transmit the RF signal \(x_k(t)\) with \(E\left[ \left| x_k(t) \right|^2 \right] = 1\) to the kth UE at the tth time slot [22]. Recognizing that most users are in remote areas, we utilize a UAV carrying a RIS as a movable reflective platform for service provision instead of traditional ground-fixed base stations [23]. We define the phase-shift matrix applied at the RIS as \(\mathbf{\Phi} \triangleq \left[ \beta_1 e^{j\theta_1}, \beta_2 e^{j\theta_2}, \ldots, \beta_m e^{j\theta_m}, \ldots, \beta_M e^{j\theta_M} \right]^T\), where \(\beta_m \in \mathcal{A}\) and \(\theta_m \in \Theta\) represent the reflection amplitude and phase shift, respectively, of the mth of the M reflective elements [24, 25]. Let \(\mathbf{h}_{\text{HR}} \in \mathbb{C}^{M \times N_H}\) and \(\mathbf{g}_{\text{RG}} \in \mathbb{C}^{1 \times M}\) denote the channel gains from the HAP to the RIS and from the RIS to the kth UE, respectively. The gain of the direct transmission link from the HAP to the kth UE is denoted by \(\mathbf{G}_k^H \in \mathbb{C}^{N_H \times 1}\). Let all involved channel gains, including the direct and reflected channels, be denoted as \(\mathbf{h}_1^H, \ldots, \mathbf{h}_K^H\), where the combined channel coefficient experienced by the kth UE is \(\mathbf{h}_k^H = \mathbf{h}_{\text{HR}}^H \mathbf{\Phi} \mathbf{g}_{\text{RG}} + \mathbf{G}_k^H\). Thus, the received signal can be formulated as
where \({{{\textbf{w}}}_{k}}\in {{{\mathbb {C}}}^{{{N}_{H}}\times 1}}\) denotes the transmit beamforming vector at the HAP and \({{n}_{k}}\) represents AWGN satisfying \({{{n}}_k}\sim \mathcal{C}\mathcal{N}\left( 0,\sigma _{k}^{2} \right)\). To maintain the transmit power at the HAP, the following constraint must also be imposed:
where \(P_t\) represents the total transmit power. The signal-to-interference-plus-noise ratio (SINR) \(\tilde{\gamma}_{k,i}\) of the signal for the kth UE decoded at the ith UE \(\left( i = 1, \ldots, k, \ldots, K, k \le i \right)\) can be expressed as
Without loss of optimality, SIC can be performed in this model only if the ith UE is able to successfully decode the signal for the kth UE \(\left( k \le i \le K \right)\). The SINR \(\tilde{\gamma}_k\) of the kth UE can therefore be represented as \(\tilde{\gamma}_k = \min_{k \le i \le K} \left\{ \tilde{\gamma}_{k,i} \right\}\). In other words, because \(\tilde{\gamma}_k\) is the minimum of the set \(\left\{ \tilde{\gamma}_{k,i} \right\}\), a rate based on it guarantees that the signal for the kth UE can be decoded at every ith UE with \(k \le i \le K\), which ensures successful implementation of SIC. Thus, the SER for the RF transmission link can be obtained as
where \(B_R\) denotes the bandwidth of the RF-based transmission mode. In this setup, the HAP employs the decode-and-forward (DF) protocol, under which the HAP decodes the received signal, re-encodes it, and then forwards it. Thus, the SER can be defined as
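To make the RF-phase quantities of this subsection concrete, the following Python sketch evaluates the combined channel \(\mathbf{h}_k^H = \mathbf{h}_{\text{HR}}^H \mathbf{\Phi} \mathbf{g}_{\text{RG}} + \mathbf{G}_k^H\) and the SIC-limited SINR \(\tilde{\gamma}_k = \min_i \tilde{\gamma}_{k,i}\). Several points are assumptions rather than statements from the paper: \(\mathbf{\Phi}\) is applied as a diagonal matrix, \(\mathbf{g}_{\text{RG}}\) is treated as a length-M vector, the SINR uses the common downlink NOMA form in which the signals of UEs with larger index act as interference, and all function names are hypothetical.

```python
import cmath


def effective_channel(h_hr, amplitudes, phases, g_rg, direct):
    """Row vector h_k^H = h_HR^H diag(phi) g_RG + G_k^H (length N_H).

    h_hr is an M x N_H nested list, g_rg has length M, direct has length N_H.
    """
    phi = [b * cmath.exp(1j * t) for b, t in zip(amplitudes, phases)]
    return [
        sum(h_hr[m][n].conjugate() * phi[m] * g_rg[m] for m in range(len(h_hr)))
        + direct[n]
        for n in range(len(direct))
    ]


def beam_gain(h_eff, w):
    """|h^H w|^2 for an effective channel row vector and a beamforming vector."""
    return abs(sum(h * x for h, x in zip(h_eff, w))) ** 2


def sic_sinr(channels, beams, noise_power, k):
    """min over i >= k of the SINR of UE k's signal when decoded at UE i.

    Assumed interference model: beams of UEs j > k are not yet cancelled.
    """
    values = []
    for i in range(k, len(channels)):
        interference = sum(
            beam_gain(channels[i], beams[j]) for j in range(k + 1, len(beams))
        )
        values.append(beam_gain(channels[i], beams[k]) / (interference + noise_power))
    return min(values)


if __name__ == "__main__":
    # Toy case: one RIS element, one HAP antenna, two UEs.
    h1 = effective_channel([[1 + 0j]], [1.0], [0.0], [0.5 + 0j], [0.5 + 0j])
    h2 = effective_channel([[1 + 0j]], [1.0], [0.0], [1.5 + 0j], [0.5 + 0j])
    print(sic_sinr([h1, h2], [[1 + 0j], [1 + 0j]], 1.0, 0))
```

Taking the minimum over all UEs that must decode the kth signal is what makes the resulting rate safe for SIC.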
2.2 Channel model
1) FSO-based satellite-to-HAP channel model: FSO communication technology is a potential solution for downlink transmission from the satellite to the HAP, as it provides ultra-high-capacity and secure transmission over long distances. In contrast to traditional optical receivers that rely on a single aperture, the HAP considered in this paper is equipped with \(N_a\) receive apertures. The \(N_a\) FSO channels can be modeled using parameters including the channel attenuation coefficient, pointing error, and link distance, and can be expressed as
where \(h_l = \frac{1}{2}\left( G_T + G_R - A_{\text{FS}} - A_{\text{ATM}} - L_{\text{loss}} - M_S \right)\), with \(G_T\), \(G_R\), \(A_{\text{FS}}\), \(A_{\text{ATM}}\), \(L_{\text{loss}}\), and \(M_S\) being the transmit antenna gain, receive antenna gain, free-space loss, atmospheric attenuation, lens loss, and system margin, respectively. The Gamma–Gamma distribution is commonly used to represent the fading parameter \(h_a\) in an FSO link affected by atmospheric turbulence. This statistical model is preferred over alternatives such as the log-normal and exponential distributions [26] because it incorporates both large-scale and small-scale fading factors, which are correlated with atmospheric channel characteristics. It describes the intensity of atmospheric turbulence and predicts fluctuations in optical intensity under both weak and strong turbulence, which makes it a suitable choice for modeling the atmospheric channel. Additionally, estimating the parameters of the Gamma–Gamma distribution readily yields information about the atmospheric channel state.
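As a rough illustration of the Gamma–Gamma model, the sketch below draws turbulence samples as the product of two independent unit-mean Gamma variables, one for the large-scale and one for the small-scale factor, which is the usual construction of this distribution. The shape parameter names `alpha` and `beta`, the example values, and the helper names are assumptions, since the text does not give them. The link-budget helper simply evaluates the expression for \(h_l\) stated above.

```python
import random


def link_budget_term(g_t, g_r, a_fs, a_atm, l_loss, m_s):
    """h_l = (G_T + G_R - A_FS - A_ATM - L_loss - M_S) / 2, as in the text."""
    return 0.5 * (g_t + g_r - a_fs - a_atm - l_loss - m_s)


def sample_gamma_gamma(alpha, beta, rng):
    """One Gamma-Gamma sample with unit mean.

    alpha, beta: assumed shape parameters of the large- and small-scale factors.
    random.gammavariate takes (shape, scale), so scale = 1/shape gives mean 1.
    """
    large_scale = rng.gammavariate(alpha, 1.0 / alpha)
    small_scale = rng.gammavariate(beta, 1.0 / beta)
    return large_scale * small_scale


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_gamma_gamma(4.0, 2.0, rng) for _ in range(10000)]
    print(sum(samples) / len(samples))  # sample mean of the unit-mean fading
```

Smaller shape parameters correspond to stronger turbulence, because the variance of each unit-mean Gamma factor is the reciprocal of its shape.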
2) RF-based RIS-assisted HAP-to-UE channel model: Optimizing the performance of multi-user communication systems that employ RISs requires integrating several techniques, including beamforming design, optimal resource allocation, and user grouping and scheduling, all of which rely on accurate channel state information (CSI). However, the reflective elements of the RIS complicate channel estimation, because several channels must be estimated simultaneously: the direct link between the HAP and each ground UE, the HAP-to-RIS channel, and the RIS-to-UE channel. Accurate channel estimation in multi-user scenarios is challenging but crucial. Given a suitable channel estimation technique, we assume that the global CSI is known, which makes it easier to implement effective optimization strategies for resource allocation and user scheduling. The HAP-to-UE direct channel vector follows the Nakagami-m distribution and can be expressed as
where \(g_{H,k}\) is a random variable whose distribution reflects the channel fading severity level, and \(C_{H,k}\) represents the loss component between the HAP and the kth UE, which can be computed as
where \(G_H\) and \(R_k\) denote the HAP transmit antenna gain and the UE receive antenna gain, respectively. \(\varpi\) denotes the path loss coefficient, \(\lambda_F\) denotes the FSO channel wavelength, and \(D_{H,k}\) represents the signal transmission distance from the HAP to the kth UE.
In addition, \(\mathbf{A}\left( \varphi_h, \theta_h \right)\) denotes the HAP-to-UE array steering matrix, and \(\varphi_h \in \left[ 0, \pi/2 \right)\) and \(\theta_h \in \left[ 0, 2\pi \right)\) are the elevation and azimuth angles, respectively. \(\mathbf{A}\left( \varphi_h, \theta_h \right)\) can be expressed as
where \({{{\textbf {a}}}_x}\left( {{\varphi _h},{\theta _h}} \right)\) and \({{{\textbf {a}}}_y}\left( {{\varphi _h},{\theta _h}} \right)\) denote the horizontal and vertical components of the antenna steering vector, respectively, which can be expressed as
where \(d_v\) and \(d_h\) represent the physical spacings between adjacent elements of the antenna array in the x- and y-axis directions, respectively, and \(N_1\) and \(N_2\) denote the numbers of transmit antennas in the horizontal and vertical directions, respectively, which satisfy \(N_1 \times N_2 = N_H\). By substituting Eqs. (13) and (14) into Eq. (12), we can derive the ith component as follows:
where \(i_1 = \left( 2\pi d_v / \lambda \right)\left( n_1 - \left( N_1 + 1 \right)/2 \right)\), \(n_1 = i / N_1\), \(i_2 = \left( 2\pi d_h / \lambda \right)\left( n_2 - \left( N_2 + 1 \right)/2 \right)\), and \(n_2 = i - \left( n_1 - 1 \right) N_1\).
Similarly, the HAP-to-RIS channel gain vector can be computed as
where \(\chi_{\text{RD}}\) denotes the path loss value at a reference distance, and the distance between the HAP and the RIS is \(d_{\text{HR}} = \sqrt{\left\| \mathbf{L}_H - \mathbf{q}[n] \right\|^2}\). \(\bar{\mathbf{h}}_{\text{HR}}\) denotes the deterministic LoS link component, and \(\tilde{\mathbf{h}}_{\text{HR}}\) denotes the random fast-fading non-line-of-sight (NLoS) component, whose entries are independent and identically distributed. \(K_1\) is the Rician factor denoting the power ratio between \(\bar{\mathbf{h}}_{\text{HR}}\) and \(\tilde{\mathbf{h}}_{\text{HR}}\). Based on the corresponding antenna array response, \(\bar{\mathbf{h}}_{\text{HR}}\) can be written as
where \(d_r\) denotes the spacing between adjacent reflective elements and \(\cos\phi = \frac{x[n] - x_H}{\left\| \mathbf{q}[n] - \mathbf{L}_H \right\|}\) denotes the cosine of the angle of arrival (AoA) from the HAP to the RIS [27]. The channel gain vector from the RIS to the kth UE can be written as
where \(\beta_{\text{RG}}\) denotes the associated path loss index and \(d_{\text{RG},k} = \sqrt{\left\| \mathbf{q}[n] - \mathbf{b}_k \right\|^2}\) represents the distance from the RIS to the kth UE [28]. Then, \(\bar{\mathbf{g}}_{\text{RG},k}\) can be computed as
where \(\cos\phi_{\text{RG},k} = \frac{x[n] - x_k}{\left\| \mathbf{q}[n] - \mathbf{b}_k \right\|}\) denotes the cosine of the angle of departure (AoD) from the RIS to the kth UE.
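The Rician structure described for the HAP-to-RIS link, a deterministic LoS component mixed with an i.i.d. NLoS component according to the Rician factor, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the channel is reduced to a length-M vector for a single HAP antenna, the LoS response uses a uniform-linear-array phase progression with an assumed sign convention, the NLoS entries are drawn as \(\mathcal{CN}(0,1)\), and the reference loss and path loss exponent in the usage lines are made-up values. `math.dist` requires Python 3.8 or later.

```python
import cmath
import math
import random


def cos_arrival_angle(uav_pos, hap_pos):
    """cos(phi) = (x[n] - x_H) / ||q[n] - L_H||, as defined in the text."""
    return (uav_pos[0] - hap_pos[0]) / math.dist(uav_pos, hap_pos)


def los_response(num_elements, spacing, wavelength, cos_phi):
    """Assumed ULA response: element m has phase -2*pi*(d_r/lambda)*m*cos(phi)."""
    step = 2.0 * math.pi * spacing / wavelength * cos_phi
    return [cmath.exp(-1j * step * m) for m in range(num_elements)]


def rician_channel(los, k_factor, path_gain, rng):
    """Mix the LoS part with i.i.d. CN(0, 1) NLoS entries using Rician factor K."""
    w_los = math.sqrt(k_factor / (k_factor + 1.0))
    w_nlos = math.sqrt(1.0 / (k_factor + 1.0))
    sigma = math.sqrt(0.5)  # per real dimension, so each NLoS entry has unit power
    channel = []
    for a in los:
        nlos = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        channel.append(math.sqrt(path_gain) * (w_los * a + w_nlos * nlos))
    return channel


if __name__ == "__main__":
    hap, uav = (0.0, 0.0, 20000.0), (3000.0, 0.0, 16000.0)
    d_hr = math.dist(uav, hap)
    gain = 1e-3 * d_hr ** (-2.2)  # hypothetical reference loss and exponent
    los = los_response(8, 0.5, 1.0, cos_arrival_angle(uav, hap))
    print(rician_channel(los, 10.0, gain, random.Random(1))[:2])
```

A large Rician factor makes the channel nearly deterministic, which is the regime a UAV-mounted RIS is intended to create.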
2.3 Energy consumption model
In the designed system model, the energy consumption model mainly considers the UAV, because communication-related energy consumption is significantly lower than the UAV’s propulsion energy. From [29], the flight energy of the UAV can be written as
According to [30, 31], the blade profile power consumption \({P_B}\) and induced power consumption \({P_C}\) can be expressed respectively as
where \(P_h\) denotes the blade profile power consumption in the hovering state, \(P_S\) is the consumption in the induced state, and \(P_D\) denotes the parasitic power consumption. Table 2 summarizes the physical interpretations of the parameters referred to in Eqs. (21)–(23). The overall energy consumption model can be computed as
where \(t_{{\text{fly}}}\) denotes the entire operating flight time of the UAV.
2.4 Problem formulation
The aim of this paper is to jointly optimize the active beamforming matrix \(\mathbf{w}\) of the HAP, the passive beamforming reflection matrix \(\mathbf{\Phi}\) of the RIS, and the trajectory points \(\mathbf{Q}\) of the UAV to maximize the SER. Considering the signal transmission stage and the RIS phase modulation constraints, the problem of maximizing the long-term system efficiency can be expressed as follows:
where C1 represents the transmit power constraint at the HAP; C2 and C3 correspond to the requirement for the UAV to complete its mission within a fixed area; C4 guarantees that the UAV has only one path into and out of each hovering position; and C5 represents the phase-shift modulation constraints of the RIS elements [32].
The formulation above is a constrained combinatorial optimization problem. As the number of UEs increases, the complexity of the optimization also escalates, which makes it difficult to find feasible solutions with traditional alternating optimization algorithms. To tackle this issue, this paper incorporates the long short-term memory (LSTM) architecture into the double deep Q-network (DDQN) algorithm to obtain tractable and efficient solutions to the optimization problem.
3 Joint optimization algorithm approach
In this section, the methodologies employed to address the joint optimization problem are presented. First, we introduce the fundamental principles of LSTM neural network computation, which serve as a foundation for the solution steps in our study. Next, we propose a novel approach based on the LSTM-DDQN framework. This approach leverages the information retention capability of LSTM networks to enhance the decision-making process of the DDQN algorithm, with the aim of making the optimization process more effective and the decisions more accurate.
3.1 Preliminaries of LSTM
The LSTM network is a type of recurrent neural network known for its ability to retain and retrieve information from both short-term memory and long-term memory. This characteristic makes it highly suitable for tasks involving sequential data processing. Central to the LSTM architecture is the LSTM cell, which comprises four gates. These gates are governed by sigmoid activation functions and regulate the flow of information into the cell. The operations performed by these gates can be mathematically expressed as
where \(f^{(t)}\) represents the forget gate, which determines which information in the cell state to retain or discard. The input gate, denoted by \(i^{(t)}\), decides which new information should be incorporated into the cell state based on the current input and the previous hidden state. The output gate, \(o^{(t)}\), controls the extent to which the information in the cell state is used to generate the current output. The weights associated with each gate are represented by \(W_f\), \(W_i\), and \(W_o\), while \(b_f\), \(b_i\), and \(b_o\) are the corresponding bias terms. The cell state and hidden state at the previous time step are denoted as \(C^{(t-1)}\) and \(h^{(t-1)}\). The activation functions \(\beta\left( \cdot \right)\) and \(\tanh\left( \cdot \right)\) are applied to certain intermediate results.
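For reference, the widely used LSTM formulation can be written with the symbols defined above, taking \(\beta(\cdot)\) to be the sigmoid function. The candidate-state parameters \(W_C\) and \(b_C\), the input symbol \(x^{(t)}\), and the concatenation notation \([h^{(t-1)}, x^{(t)}]\) are assumptions introduced here, and the authors' exact expressions may differ.

```latex
\begin{aligned}
f^{(t)} &= \beta\!\left(W_f\,[h^{(t-1)}, x^{(t)}] + b_f\right),\\
i^{(t)} &= \beta\!\left(W_i\,[h^{(t-1)}, x^{(t)}] + b_i\right),\\
o^{(t)} &= \beta\!\left(W_o\,[h^{(t-1)}, x^{(t)}] + b_o\right),\\
\tilde{C}^{(t)} &= \tanh\!\left(W_C\,[h^{(t-1)}, x^{(t)}] + b_C\right),\\
C^{(t)} &= f^{(t)} \odot C^{(t-1)} + i^{(t)} \odot \tilde{C}^{(t)},\\
h^{(t)} &= o^{(t)} \odot \tanh\!\left(C^{(t)}\right).
\end{aligned}
```

Here \(\odot\) denotes element-wise multiplication. The forget and input gates weight the old cell state against the new candidate, and the output gate scales what is exposed as the hidden state.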
While the LSTM gates play a crucial role in regulating information flow, the LSTM cell also includes control units that enhance its adaptability and flexibility for various tasks. These control units enable the LSTM network to adjust the flow of information to the requirements of the task at hand, which makes LSTM networks particularly useful for processing and forecasting time series data, including significant events. After passing through the gates and undergoing the associated operations, the input information is modified, resulting in the updated cell state \(C^{(t)}\) and hidden state \(h^{(t)}\). These states are continuously updated by the gating units, which selectively control the flow of information into and out of the LSTM cell. By incorporating an LSTM network, a UAV can learn from its previous experiences and make informed decisions based on the accumulated knowledge. In our approach, we incorporate the LSTM architecture into deep convolutional neural network models to enable long-term memory of the explored environmental state information. Prior to integrating the LSTM architecture, the network model can be represented as follows:
where \(\text {EncoderProcess}\left( \cdot \right)\) denotes the encoding function. On the other hand, the decoding process is performed by multilayer fully connected network denoted as \(\text {DecoderProcess}\left( \cdot \right)\). Besides, we use \(\left( {{x^{\left( t \right) }}} \right)\) to represent the input observation data, and \({X^{\left( t \right) }}\) is an implicit representation of the observed data encoded using a neural network. This formulation allows us to capture and utilize the underlying patterns and features of the observed data in an efficient manner. By incorporating both the encoding and decoding processes, our model can effectively learn and generate meaningful outputs based on the input observations.
3.2 Algorithm process
This article proposes an improved DDQN algorithm that integrates the LSTM and DDQN architectures. The network structure is shown in Fig. 2. The state historical information and channel information obtained by the UAV during exploration are used as inputs. The state features are extracted through three convolutional layers and then passed to an LSTM-DNN layer, which is used for long-term memory and storage of the explored environmental states. The LSTM-DNN layer is followed by a second fully connected layer, which outputs Q-values as a basis for choosing among the possible actions.
With the introduction of an LSTM layer into the neural network structure, useful historical state information can be stored in long-term memory, allowing the agent to make informed decisions based on past experiences when exploring unknown environments. With this neural network, the newly transformed state space \(s_n^{\left( t \right) }\) can be written as a function of the previous state and the current observation. The agent gradually improves its policy by adapting its behavior based on the feedback received from the environment. By leveraging the MDP framework, the DRL approach allows the agent to learn from its interactions with the environment over time and maximize its cumulative reward [33].
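As a rough illustration of how the Q-values produced by the network could be used during training, the sketch below computes the standard double DQN target, in which the online network selects the next action and the target network evaluates it. The discount factor `gamma`, the `done` flag, and the function name `ddqn_target` are assumptions for illustration and are not taken from the paper.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for a single transition.

    next_q_online / next_q_target: Q-values of the next state, one entry per
    action, from the online and the target network respectively.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation is what distinguishes DDQN from a plain DQN, which would take the maximum of the target network's Q-values directly and tends to overestimate them.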
1) State space: In deep reinforcement learning (DRL), the state space refers to the collection of all possible states that describe the environment. A state represents the environment at a specific moment and encompasses all pertinent information required for an intelligent agent to make decisions based on the current circumstances. In this paper, we focus on a discrete state space, where the set of possible states is finite, allowing the intelligent agent to choose from a limited number of discrete states. The design of the state space plays a crucial role in the success of reinforcement learning. It should effectively capture the essential characteristics of the environment, encompassing relevant input and target variables. This enables the intelligent agent to accurately perceive and adapt to the dynamic changes within the environment, optimizing the effectiveness of its strategies. In this paper, the state space consists of the current location of the UAV, the channel characteristics, the energy consumed by the UAV, and the action taken in the \((t-1)\)th time step.
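The four state components listed above can be gathered into a single record, as in the following minimal sketch. The class name `UAVState`, the field names, and the grid-cell encoding of the position are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UAVState:
    position: Tuple[int, int, int]     # discretised UAV location (grid cell)
    channel_gains: Tuple[float, ...]   # channel characteristics
    energy_used: float                 # energy consumed by the UAV so far
    prev_action: int                   # index of the action taken at step t-1

    def to_vector(self) -> List[float]:
        """Flatten the state into a feature vector for the network input."""
        return (
            [float(p) for p in self.position]
            + list(self.channel_gains)
            + [self.energy_used, float(self.prev_action)]
        )
```

For example, `UAVState((1, 2, 3), (0.5, 0.25), 10.0, 4).to_vector()` yields a seven-element vector that a network could consume directly.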
2) Action space: In a DRL-based optimization framework, the action space represents the set of all possible actions that an intelligent agent can take. These actions allow the agent to interact with the environment and influence its state. In each time step, the agent selects an action from the available action space and executes it, gradually moving closer to its optimization goal. Using techniques such as deep neural networks, the agent can learn the mapping between states and actions, enabling it to make optimal decisions. The design of the action space is closely connected to the agent's strategy and decision-making process and significantly affects whether the reinforcement learning objectives are achieved. The action space consists of three parts: the change in the phase shift of each reflective element, the change in the values of the transmit beamforming matrix, and the change in the UAV's position. Given the existing conditions, the agent selects the action that best aligns with its goals and objectives.
3) Reward: The reward function refers to the spectrum of feedback signals received by an intelligent agent based on its actions. Rewards serve as a measure of the quality of the agent's behavior and act as a feedback mechanism indicating whether it is progressing toward its optimization goal. They can be represented as real numbers reflecting the desirability of the agent's actions. Positive rewards typically signify favorable feedback, encouraging the agent to reinforce and repeat such behavior. Conversely, negative rewards indicate unfavorable feedback, prompting the agent to avoid or reduce those actions. Zero rewards may represent neutral feedback without significant influence on the agent's decision-making process. In this paper, we consider a reward function with respect to the SER [34], \(R\left( t \right)\), which depends on the transmit beamforming matrix, the RIS phase shifts, the UAV position, and the UAV's direction of motion, denoted as
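One simple way to obtain a discrete action space from the three parts described above is to enumerate all combinations of a few candidate increments, as sketched here. The specific increment values, the use of one shared phase increment rather than one per reflective element, and the names `PHASE_DELTAS`, `BEAM_DELTAS`, `MOVES`, and `decode_action` are all assumptions chosen to keep the example small.

```python
import itertools
import math

# Hypothetical candidate increments for each of the three action parts.
PHASE_DELTAS = (-math.pi / 8, 0.0, math.pi / 8)         # RIS phase-shift change
BEAM_DELTAS = (-0.1, 0.0, 0.1)                          # beamforming change
MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))      # UAV grid displacement

# Every composite action is one (phase, beam, move) combination.
ACTIONS = list(itertools.product(PHASE_DELTAS, BEAM_DELTAS, MOVES))


def decode_action(index):
    """Map a network output index back to the three action components."""
    phase_delta, beam_delta, move = ACTIONS[index]
    return {"phase_delta": phase_delta, "beam_delta": beam_delta, "move": move}
```

With these choices the network would need 3 x 3 x 5 = 45 outputs. Giving each reflective element its own increment would make the count grow exponentially with the number of elements, which is one reason a coarse discretisation is used in this sketch.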
where the environment provides feedback to the agent in the form of a reward/penalty function. In this paper, we propose a reinforcement learning framework that enhances the overall energy efficiency (EE) of the system by incentivizing the agent to select actions that improve performance. When the agent chooses actions that lead to improved EE, it receives positive rewards from the environment, which encourages it to repeat similar actions in the future. To address the optimization problem using the LSTM-DDQN algorithm, we follow several steps. First, we initialize the hyperparameters of the LSTM network and the environmental parameters of the communication scenario. The initial coordinates of the UAV and the RIS phase shifts are used to calculate the channel coefficients and determine the user rate in the current state. Next, we feed the state information into the LSTM network and apply action selection to obtain the next state. To further improve the algorithm's performance and efficiency in solving optimization problems that involve transmit beamforming, RIS phase shift, and UAV trajectory design, we integrate the prioritized experience replay strategy with the LSTM architecture and the DDQN algorithm.
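The prioritized experience replay strategy mentioned above can be sketched as follows, using the common proportional variant in which a transition is sampled with probability proportional to its absolute TD error raised to a power `alpha`. The class name, the default values of `alpha` and `eps`, and the list-based storage are assumptions for illustration; the paper does not specify its implementation, and importance-sampling weights are omitted for brevity.

```python
import random


class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (list-based sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 fully prioritised
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so that they are
        # likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        idx = rng.choices(range(len(self.items)),
                          weights=self.probabilities(), k=batch_size)
        return idx, [self.items[i] for i in idx]

    def update(self, indices, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, each sampled batch would be used to compute TD errors against the DDQN target, and `update` would then be called with those errors so that poorly predicted transitions are revisited more often.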
4 Simulation experiment and result analysis
In this section, we evaluate the performance of the proposed system with the DRL-based LSTM-DDQN framework and analyze the transmission model under various parameter settings to assess its effectiveness. The main system model and other parameters used in the simulation are provided in Table 3. To compare the performance of the proposed algorithm with other methods, we design four benchmark scenarios for the considered system model. These benchmark scenarios are selected to reflect different system setups, including varying numbers of users and RIS elements, different channel conditions, and diverse transmission distances, and are defined as follows:
Benchmark 1 RIS-NOMA LSTM-DDQN scheme: This is the proposed LSTM-DDQN algorithm with prioritized experience replay, which optimizes \({\textbf{Q}}\), \(\mathbf {\Phi }\), and \({\textbf{w}}\) while ensuring that the capacity constraint is satisfied.
Benchmark 2 RIS-NOMA DDQN scheme: Unlike Benchmark 1, this scheme does not rely on historical information and instead uses a traditional DDQN. The agent's decision-making is based solely on the current state of the system, independent of past experiences, which simplifies the decision-making process and reduces computational complexity.
Benchmark 3 RIS-OMA LSTM-DDQN scheme: This scheme applies the proposed LSTM-DDQN algorithm with an OMA-enabled transmission strategy.
Benchmark 4 RIS-OMA DDQN scheme: In contrast to Benchmark 3, this scheme combines the traditional DDQN algorithm with the OMA-enabled transmission scheme.
Figure 3 shows the relationship between the reward function and the number of reflection elements during the traversal process of the optimization schemes using different algorithms. Here, the reward function can be interpreted as the energy efficiency, which is closely related to the energy consumption during the whole communication process. In this scenario, a UAV is launched from a fixed position, and an algorithm is employed to determine the optimal position at each operational step. Throughout this process, the UAV dynamically adjusts its active transmit beamforming matrix and the phase shift of the RIS. As depicted in Fig. 3, the number of convergence steps for the LSTM-DDQN-based algorithm closely matches that of the conventional DDQN algorithm. Convergence takes only approximately 320 steps, which validates the state space and rewards designed in this study as well as the performance of the proposed algorithm. It also highlights the potential of integrating LSTM into the DDQN framework.
From Fig. 4, we can conclude that as the number of RIS reflection elements increases, the energy efficiency achieved by the LSTM-DDQN algorithm increases significantly. At \(M = 16\), the energy efficiency obtained with LSTM-DDQN is 7% higher than with the traditional DDQN algorithm, and at \(M = 128\) it is 23% higher. By incorporating LSTM networks into the DDQN framework, temporal correlations can be effectively captured [35]. This is achieved through the LSTM's ability to learn dynamic patterns in sequence data by including outdated but still useful information in the input sequences. Moreover, the LSTM-based DDQN framework benefits from memory units and gating mechanisms, which enable it to handle the long-term dependencies present in the sequence data. However, when moving from \(M = 16\) to \(M = 32\) RIS elements, the performance gain from the additional elements is not significant. The substantial performance enhancement observed with a higher number of RIS elements can be attributed to several factors. First, the RIS can enhance signal strength and quality by reflecting and manipulating the incident signals. By adjusting the phase and amplitude of the reflected signals, the RIS can optimize the channel conditions and mitigate the effects of fading and interference. Second, the RIS enables precise beamforming and waveform shaping. This increases the signal power in specific areas or toward specific users, resulting in improved coverage, reduced interference, and enhanced link quality. Furthermore, the RIS can enable enhanced spatial multiplexing and diversity gain. By exploiting the reflective properties of the RIS, it becomes possible to create multiple signal paths between the transmitter and the receiver.
Additionally, RIS can enhance diversity gain by introducing additional signal paths that mitigate the effects of fading and improve signal reliability.
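The effect of phase adjustment on signal strength can be illustrated with a small numerical sketch. It uses a simplified textbook cascaded channel with unit-modulus reflection coefficients and ignores the direct link, noise, and path loss; it is not the channel model of this paper, and the names `ris_received_power` and `aligned_phases` are hypothetical.

```python
import cmath
import math
import random


def ris_received_power(h, g, thetas):
    """Power of the sum over elements of h_m * exp(j * theta_m) * g_m."""
    total = sum(hm * cmath.exp(1j * th) * gm for hm, gm, th in zip(h, g, thetas))
    return abs(total) ** 2


def aligned_phases(h, g):
    """Phase shifts that cancel the phase of each cascaded path."""
    return [-(cmath.phase(hm) + cmath.phase(gm)) for hm, gm in zip(h, g)]


if __name__ == "__main__":
    rng = random.Random(1)
    for M in (16, 32, 64, 128):
        # Unit-magnitude channel coefficients with random phases (assumption).
        h = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(M)]
        g = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(M)]
        unaligned = ris_received_power(h, g, [0.0] * M)
        aligned = ris_received_power(h, g, aligned_phases(h, g))
        print(M, round(unaligned, 2), round(aligned, 2))
```

With aligned phases, all reflected paths add coherently, so under these unit-magnitude assumptions the received power equals \(M^2\), whereas with unadjusted phases the paths add with random signs and the power is much smaller on average.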
5 Conclusions
In this paper, we introduced a novel architecture for downlink massive access in RIS-UAV relay-assisted hybrid FSO/RF-based ISATRNs. First, a secure signal transmission model was established by defining a target optimization problem based on different transmission modes at various stages. Second, we leveraged DRL technology for its model-free nature. We then decomposed the optimization problem into subproblems, including UAV trajectory optimization, active beamforming matrix design, and RIS phase shift design. To address these subproblems under constraints, we proposed a novel DRL-based LSTM-DDQN algorithm framework that supplements the current state with historical information. With prioritized experience replay, the proposed LSTM-DDQN algorithm scaled well, to a certain extent, to the high-dimensional state space and partial observability of the problem. Finally, numerical simulation results demonstrated the superiority of the LSTM-DDQN algorithm and verified the impact of the number of RIS reflection elements and the transmit power level on SER.
Availability of data and materials
The raw/processed data required to reproduce the above findings cannot be shared at this time as the data also forms part of an ongoing study.
References
[1] N. Saeed, H. Almorad, H. Dahrouj, T.Y. Al-Naffouri, J.S. Shamma, M.S. Alouini, Point-to-point communication in integrated satellite-aerial 6G networks: state-of-the-art and future challenges. IEEE Open J. Commun. Soc. 2(2), 1505–1525 (2021)
[2] X. Zhu, C. Jiang, Integrated satellite–terrestrial networks toward 6G: architectures, applications, and challenges. IEEE Internet Things J. 9(1), 437–461 (2022)
[3] K. An, M. Lin, J. Ouyang, W.P. Zhu, Secure transmission in cognitive satellite terrestrial networks. IEEE J. Sel. Areas Commun. 34(11), 3025–3037 (2016)
[4] Z. Lin, M. Lin, T. de Cola, J.B. Wang, W.P. Zhu, J. Cheng, Supporting IoT with rate-splitting multiple access in satellite and aerial-integrated networks. IEEE Internet Things J. 8(14), 11123–11134 (2021)
[5] F. Zhou, X. Li, M. Alazab, R.H. Jhaveri, K. Guo, Secrecy performance for RIS-based integrated satellite vehicle networks with a UAV relay and MRC eavesdropping. IEEE Trans. Intell. Veh. 8(2), 1676–1685 (2023)
[6] X. Zhang, D. Guo, K. An, G. Zheng, S. Chatzinotas, B. Zhang, Auction-based multichannel cooperative spectrum sharing in hybrid satellite–terrestrial IoT networks. IEEE Internet Things J. 8(8), 7009–7023 (2021)
[7] M.A. Khalighi, M. Uysal, Survey on free space optical communication: a communication theory perspective. IEEE Commun. Surv. Tutor. 16(4), 2231–2258 (2014)
[8] G. Xu, N. Zhang, M. Xu, Z. Xu, Q. Zhang, Z. Song, Outage probability and average BER of UAV-assisted dual-hop FSO communication with amplify-and-forward relaying. IEEE Trans. Veh. Technol. 72(7), 8287–8302 (2023)
[9] G. Pan, J. Ye, J. An, M.S. Alouini, Full-duplex enabled intelligent reflecting surface systems: opportunities and challenges. IEEE Wirel. Commun. 28(3), 122–129 (2021)
[10] L. Lv, Q. Wu, Z. Li, Z. Ding, N. Al-Dhahir, J. Chen, Covert communication in intelligent reflecting surface-assisted NOMA systems: design, analysis, and optimization. IEEE Trans. Wirel. Commun. 21(3), 1735–1750 (2022)
[11] T. Wang, F. Fang, Z. Ding, An SCA and relaxation based energy efficiency optimization for multiuser RIS-assisted NOMA networks. IEEE Trans. Veh. Technol. 71(6), 6843–6847 (2022)
[12] Z. Lin, M. Lin, J.B. Wang, T. de Cola, J. Wang, Joint beamforming and power allocation for satellite–terrestrial integrated networks with non-orthogonal multiple access. IEEE J. Sel. Top. Signal Process. 13(3), 657–670 (2019)
[13] Z. Na, Y. Liu, J. Shi, C. Liu, Z. Gao, UAV-supported clustered NOMA for 6G-enabled Internet of Things: trajectory planning and resource allocation. IEEE Internet Things J. 8(20), 15041–15048 (2021)
[14] S. Yang, J. Zhang, W. Xia, Y. Ren, H. Yin, H. Zhu, A unified framework for distributed RIS-aided downlink systems between MIMO-NOMA and MIMO-SDMA. IEEE Trans. Commun. 70(9), 6310–6324 (2022)
[15] X. Pang et al., When UAV meets IRS: expanding air-ground networks via passive reflection. IEEE Wirel. Commun. 28(5), 164–170 (2021)
[16] L. Lv, Z. Ding, J. Chen, N. Al-Dhahir, Design of secure NOMA against full-duplex proactive eavesdropping. IEEE Wirel. Commun. Lett. 8(4), 1090–1094 (2019)
[17] K. Guo, M. Wu, X. Li, H. Song, N. Kumar, Deep reinforcement learning and NOMA-based multiobjective RIS-assisted IS-UAV-TNs: trajectory optimization and beamforming design. IEEE Trans. Intell. Transp. Syst. 24(9), 10197–10210 (2023)
[18] K. Li, C. Huang, Y. Gong, G. Chen, Double deep learning for joint phase-shift and beamforming based on cascaded channels in RIS-assisted MIMO networks. IEEE Wirel. Commun. Lett. 12(4), 659–663 (2023)
[19] P. Chen, W. Huang, X. Li, S. Jin, Deep reinforcement learning based power minimization for RIS-assisted MISO-OFDM systems. China Commun. 20(4), 259–269 (2023)
[20] Z. Peng, Z. Zhang, L. Kong, C. Pan, L. Li, J. Wang, Deep reinforcement learning for RIS-aided multiuser full-duplex secure communications with hardware impairments. IEEE Internet Things J. 9(21), 21121–21135 (2022)
[21] X. Li et al., Physical-layer authentication for ambient backscatter-aided NOMA symbiotic systems. IEEE Trans. Commun. 71(4), 2288–2303 (2023)
[22] H. Niu, Z. Chu, F. Zhou, P. Xiao, N. Al-Dhahir, Weighted sum rate optimization for STAR-RIS-assisted MIMO system. IEEE Trans. Veh. Technol. 71(2), 2122–2127 (2022)
[23] K. Guo et al., Physical layer security for multiuser satellite communication systems with threshold-based scheduling scheme. IEEE Trans. Veh. Technol. 69(5), 5129–5141 (2020)
[24] C. Huang, A. Zappone, G.C. Alexandropoulos, M. Debbah, C. Yuen, Reconfigurable intelligent surfaces for energy efficiency in wireless communication. IEEE Trans. Wirel. Commun. 18(8), 4157–4170 (2019)
[25] C. Gong, X. Yue, X. Wang, X. Dai, R. Zou, M. Essaaidi, Intelligent reflecting surface aided secure communications for NOMA networks. IEEE Trans. Veh. Technol. 71(3), 2761–2773 (2022)
[26] M.A. Al-Habash, Mathematical model for the irradiance probability density function of a laser beam propagating through turbulent media. Opt. Eng. 40(8), 1554–1562 (2001)
[27] J. Gao, Y. Wu, S. Shao, W. Yang, H.V. Poor, Energy efficiency of massive random access in MIMO quasi-static Rayleigh fading channels with finite blocklength. IEEE Trans. Inf. Theory 69(3), 1618–1657 (2023)
[28] Z. Jia, M. Sheng, J. Li, D. Niyato, Z. Han, LEO-satellite-assisted UAV: joint trajectory and data collection for internet of remote things in 6G aerial access networks. IEEE Internet Things J. 8(12), 9814–9826 (2021)
[29] H.T. Ye, X. Kang, J. Joung, Y.C. Liang, Optimization for full-duplex rotary-wing UAV enabled wireless-powered IoT networks. IEEE Trans. Wirel. Commun. 19(7), 5057–5072 (2020)
[30] Y. Zeng, J. Xu, R. Zhang, Energy minimization for wireless communication with rotary-wing UAV. IEEE Trans. Wirel. Commun. 18(4), 2329–2345 (2019)
[31] H. Zhang, M. Huang, H. Zhou, X. Wang, N. Wang, K. Long, Capacity maximization in RIS-UAV networks: a DDQN-based trajectory and phase shift optimization approach. IEEE Trans. Wirel. Commun. 22(4), 2583–2591 (2023)
[32] X. Pang, N. Zhao, J. Tang, C. Wu, D. Niyato, K.K. Wong, IRS-assisted secure UAV transmission via joint trajectory and beamforming design. IEEE Trans. Commun. 70(2), 1140–1152 (2022)
[33] N. Zhao, Z. Ye, Y. Pei, Y.C. Liang, D. Niyato, Multi-agent deep reinforcement learning for task offloading in UAV-assisted mobile edge computing. IEEE Trans. Wirel. Commun. 21(9), 6949–6960 (2022)
[34] X. Liu, Y. Liu, Y. Chen, H.V. Poor, RIS enhanced massive non-orthogonal multiple access networks: deployment and passive beamforming design. IEEE J. Sel. Areas Commun. 39(4), 1057–1071 (2021)
[35] J. Xu, B. Ai, Deep reinforcement learning for handover-aware MPTCP congestion control in space-ground integrated network of railways. IEEE Wirel. Commun. 28(6), 200–207 (2021)
Acknowledgements
The authors would like to extend their gratitude to the anonymous reviewers for their valuable and constructive comments, which have largely improved and clarified this paper.
Funding
This work was supported by the Soft Science Research Project of Henan Province under Grant 222400410137, and in part by the Key R&D projects in the autonomous region under Grant 2020B020182, and the Natural Science Foundation of Universities of Anhui Province under Grant KJ2020A0694.
Contributions
JL, HX, MW, FW, TG, and FZ conceived of and designed the experiments. JL and HX performed the experiments. JL, MW, and FW analyzed the data. TG contributed analysis tools; JL, HX, MW, and FZ wrote the paper. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, J., Xue, H., Wu, M. et al. Energy efficiency performance in RIS-based integrated satellite–aerial–terrestrial relay networks with deep reinforcement learning. EURASIP J. Adv. Signal Process. 2023, 121 (2023). https://doi.org/10.1186/s13634-023-01070-7
DOI: https://doi.org/10.1186/s13634-023-01070-7