A DRL-based resource allocation for IRS-enhanced semantic spectrum sharing networks

Semantic communication and spectrum sharing are pivotal technologies for addressing the persistent scarcity of spectrum resources in sixth-generation (6G) communication networks. However, little attention has been devoted to semantic resource allocation in spectrum sharing semantic communication networks, which limits the full exploitation of spectrum efficiency. Intelligent reflective surfaces (IRS) are a promising means of mitigating interference between primary users and secondary users while strengthening the legitimate signal. In this study, we investigate resource allocation for IRS-enhanced semantic spectrum sharing networks. We maximize the semantic spectral efficiency (S-SE) of the secondary semantic network while guaranteeing the minimum quality of service (QoS) of the primary semantic network. This entails jointly optimizing semantic symbol allocation, subchannel allocation, the reflective coefficients of the IRS elements, and the beamforming of the secondary base station. Because the formulated problem is non-convex, computationally demanding, and has strongly coupled variables, we propose a hybrid intelligent resource allocation approach that combines dueling double deep Q networks (D3QN) with twin-delayed deep deterministic policy gradient (TD3). Simulation results show that the proposed approach outperforms the baseline schemes and markedly enhances the S-SE of the secondary network, demonstrating its potential for semantic spectrum sharing.


Introduction
The problem of spectrum scarcity is further exacerbated by the massive numbers of smart devices and connections that characterize 6G wireless communication networks [1][2][3]. However, as revealed by the Shannon limit, it is difficult for existing conventional communication paradigms to further improve spectral efficiency [4][5][6]. Recently, the artificial intelligence (AI)-driven semantic communication paradigm has received much attention because of its promise for breaking the Shannon limit and improving spectral efficiency [7]. Semantic communication operates at the level of meaning: the communicating parties convey the intended semantics rather than the exact bit sequence. In task-oriented semantic communication, only the task-related information is transmitted, while unnecessary information, i.e., task-unrelated information, is ignored [8]. Semantic encoding networks exploit the powerful knowledge representation capabilities of machine learning to extract and encode key task-relevant information, largely avoiding information redundancy [9,10]. Under the semantic communication paradigm, joint source-channel coding techniques have been developed and have demonstrated better performance than separate source and channel coding [11].
Spectrum sharing technology has been widely used because of its high spectrum utilization efficiency [12]. Specifically, the secondary network shares the spectrum of the primary network while minimizing its impact on the primary network [13]. However, when primary and secondary users (SUs) are active on the same channel, interference between them is unavoidable and may degrade the performance of the primary network. Recently, spectrum sharing networks assisted by intelligent reflective surfaces have been widely studied [14]. Intelligent reflective surfaces (IRS), a promising future technology, can enhance the target signal strength and attenuate interfering signals with low energy consumption [15,16]. Specifically, an IRS consists of a planar array of passive reflective elements whose reflection coefficients can be programmed to adjust the phase of the signal during reflection [17].
Owing to its ability to solve large-scale complex problems in real time [18], deep reinforcement learning (DRL) has recently attracted considerable attention, reflecting growing recognition of its effectiveness in complex settings where traditional methods perform poorly. Compared with traditional convex optimization methods, DRL can handle high-dimensional, nonlinear, and complex problems, and it adapts to diverse environments and tasks by learning features and optimization strategies from experience [19]. Traditional convex optimization methods, in contrast, are typically suited to problems with low dimensionality and linear structure, which makes it difficult for them to deal effectively with complex real-world scenarios. Through end-to-end learning with neural networks, DRL learns autonomously and gradually improves its performance, and it therefore has clear advantages for variable and uncertain real-world problems [20]. Hence, it is promising to develop a DRL-based, AI-native resource allocation scheme for semantic spectrum sharing networks.

Semantic coding network
A large number of works have studied semantic communication networks. According to the data source modality, semantic coding research can be divided into text [21,22], image [23,24], and speech [25]. The authors in [21] exploited inference rules from a knowledge graph, aiming to address the lack of interpretability and flexibility of semantic communication networks. The authors in [24] investigated unmanned aerial vehicle (UAV) image-sensing-driven semantic communication for triple-based scene construction. In [25], the authors considered a speech semantic communication network in which the receiver regenerates the speech signal by feeding the recognized text and speaker information into a neural network module that synthesizes the speech. Only a few works [26][27][28] have addressed resource allocation for semantic communication networks. In [26], the authors defined the semantic spectral efficiency (S-SE) for the first time. Building on [26], the authors in [27] further considered the users' quality of experience. The authors in [28] proposed a novel semantic-bit quantization method and an adaptive resource allocation scheme for semantic communications over physical wireless channels. In [29], an IRS-assisted secure semantic communication network was investigated, but spectrum sharing was not considered as a means of further improving spectrum efficiency. Moreover, the IRS in [29] was used to counter semantic eavesdropping rather than to suppress inter-user interference and guarantee the quality of service of a primary network. To the best of the authors' knowledge, there has been little work on IRS-assisted semantic communication networks with spectrum sharing.

IRS-assisted spectrum sharing networks
Because an IRS uses passive reflective elements and performs no active signal processing on the received signals, it can reshape signals with minimal overhead [30]. A substantial body of research has explored the potential of IRS technology [31][32][33][34][35][36]. The authors in [32] considered an IRS-assisted secure spectrum sharing network in which the IRS significantly enhances the legitimate signal and suppresses eavesdropping. The authors in [34] introduced multiple IRSs to extend coverage and improve the secrecy performance of the secondary network. Nevertheless, only a limited number of studies have addressed improving the S-SE of task execution in semantic spectrum sharing networks operating at low signal-to-noise ratios. Furthermore, most of the above works [31,[33][34][35][36] rely on conventional convex optimization methods, which are less time-efficient when confronted with large numbers of connections. Consequently, there is a compelling need to develop time-efficient intelligent resource allocation schemes [37].

Intelligent resource allocation approach
DRL-based resource allocation approaches have been widely used because of their powerful computational ability [38][39][40][41][42][43]. The authors in [39] proposed a DRL-based intelligent resource allocation scheme that can rapidly solve a difficult non-convex problem. The authors in [40] proposed a hybrid intelligent resource allocation method to solve the formulated optimization problem for cognitive radio (CR) networks. In [41], the authors designed DRL-based schemes that reduce the output dimensionality, improve learning efficiency, and yield an effective resource allocation policy. Moreover, a DRL-based resource allocation scheme was used for mobile edge computing in railway Internet of Things (RIoT) networks to jointly optimize subcarrier assignment, offloading ratio, power allocation, and computation resource allocation. Notably, the computational complexity comparison in [34] showed that the DRL-based scheme achieved better time efficiency than traditional mathematical methods with close performance. Further considering semantic resource allocation in a UAV-assisted semantic communication system, a DRL-based resource allocation scheme and an intelligent trajectory planning scheme were proposed in [43].

Motivations and contributions
We investigate IRS-assisted semantic spectrum sharing networks in this paper to optimize the S-SE of the secondary network. The main contributions of this paper are as follows.
• We explore, for the first time, IRS-assisted semantic spectrum sharing communication networks. The IRS simultaneously enhances the performance of semantic tasks in the secondary network and reduces the interference from the secondary network to the primary network. The S-SE is used to evaluate the spectrum efficiency of the secondary network, and it is maximized by jointly optimizing subchannel allocation, semantic symbol allocation, the IRS reflection coefficients, and the beamforming of the secondary base station (SBS).
• To solve the resulting complex non-convex problem, we introduce an intelligent resource allocation scheme for semantic spectrum sharing networks based on dueling double deep Q networks (D3QN) and twin-delayed deep deterministic policy gradient (TD3). The discrete actions, i.e., semantic symbol allocation and subchannel allocation, are handled by D3QN, whereas TD3 handles the continuous actions, i.e., tuning the transmit beamforming and optimizing the IRS reflection coefficients. This hybrid design exploits both the accurate Q-value estimation of D3QN and the strong exploration capability of TD3 in high-dimensional continuous spaces.
• Simulation results demonstrate that, compared with a benchmark scheme without IRS and with the conventional communication scheme, the proposed IRS-assisted semantic spectrum sharing network significantly enhances the S-SE of the secondary network while guaranteeing the quality of service of the primary network. In addition, the proposed hybrid intelligent resource allocation scheme converges within a short training period, indicating its strong exploration capability.
The rest of the paper is organized as follows. Section II presents the IRS-assisted semantic spectrum sharing network model. Section III formulates the problem of maximizing the S-SE of the secondary network. Section IV presents the hybrid resource allocation scheme. Section V presents the simulation results. Finally, Section VI concludes the paper.
2 System model and optimization problem

Semantic coding framework
Here, we consider an IRS-enhanced semantic spectrum sharing network designed for a text recovery task, featuring a transmitter, a receiver, and an IRS (Fig. 1). The transmitter incorporates a semantic encoder and a channel encoder that extract features and semantically encode the text source, while the receiver includes a channel decoder and a semantic decoder that decode the semantics and recover the mission-critical information. Let $\mathbf{s} = [s_1, \ldots, s_i, \ldots, s_l]$ represent a sentence, where $l$ is the sentence length and $s_i$ is the $i$-th word. We employ DeepSCs with multi-level semantic symbol outputs for higher transmission efficiency. The received signal $\mathbf{y} \in \mathbb{C}^{N_r \times 1}$ is expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \tilde{\mathbf{H}}\mathbf{x} + \mathbf{n},$$

where $\mathbf{x}$ is the transmitted signal, $\mathbf{H} \in \mathbb{C}^{N_r \times N_l}$ denotes the channel matrix, $\tilde{\mathbf{H}} \in \mathbb{C}^{N_r \times N_l}$ denotes the IRS-reflected channel determined by the IRS reflective coefficients, and $\mathbf{n} \in \mathbb{C}^{N_r \times 1} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$ is additive white Gaussian noise (AWGN), with $\sigma^2$ the noise variance and $\mathbf{I}$ the identity matrix.

IRS-enhanced semantic spectrum sharing communication
We introduce an IRS equipped with $E$ reflective elements, strategically positioned to improve the transmission efficiency of the secondary network. Let $\boldsymbol{\Theta} = \mathrm{diag}\left(\varpi_1 e^{j\phi_1}, \varpi_2 e^{j\phi_2}, \ldots, \varpi_E e^{j\phi_E}\right) \in \mathbb{C}^{E \times E}$ denote the IRS reflection coefficient matrix, where $\varpi_e \in [0, 1]$ is the amplitude coefficient and $\phi_e \in [0, 2\pi]$ is the phase coefficient of the $e$-th element. Let $\mathcal{D} = \{1, \ldots, D\}$ be the set of primary users (PUs) and $\mathcal{K} = \{1, \ldots, K\}$ the set of SUs. Let $\mathbf{h}_{p,d} \in \mathbb{C}^{M \times 1}$, $\mathbf{G}_{p,r} \in \mathbb{C}^{E \times M}$, and $\mathbf{g}_{d} \in \mathbb{C}^{E \times 1}$ be the channel coefficients from the primary base station (PBS) to PU $d$, from the PBS to the IRS, and from the IRS to PU $d$, respectively. Similarly, let $\mathbf{h}_{s,k} \in \mathbb{C}^{M \times 1}$, $\mathbf{G}_{s,r} \in \mathbb{C}^{E \times M}$, and $\mathbf{g}_{r,k} \in \mathbb{C}^{E \times 1}$ be the channel coefficients from the SBS to SU $k$, from the SBS to the IRS, and from the IRS to SU $k$, respectively. Let $\mathcal{C} = \{1, \ldots, C\}$ be the set of subchannels. The channel allocation of the $k$-th SU is expressed as (1).
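The reflection matrix $\boldsymbol{\Theta}$ and the channels defined above can be illustrated with a short Python sketch. It assumes the standard cascaded model in which the effective SBS-to-SU row channel is $\mathbf{h}_{s,k}^{H} + \mathbf{g}_{r,k}^{H}\boldsymbol{\Theta}\mathbf{G}_{s,r}$; this form is an assumption, not an equation reproduced from the paper, and the helper names `irs_coefficients`, `effective_channel`, and `co_phase` are hypothetical. For a single-antenna SBS, `co_phase` picks phases that align every reflected path with the direct path, which shows why tuning $\phi_e$ strengthens the target signal.

```python
import cmath
import math
import random


def irs_coefficients(amplitudes, phases):
    """Diagonal entries varpi_e * exp(j * phi_e) of Theta (hypothetical helper)."""
    if len(amplitudes) != len(phases):
        raise ValueError("need one amplitude and one phase per IRS element")
    for a, p in zip(amplitudes, phases):
        if not 0.0 <= a <= 1.0:
            raise ValueError("amplitude varpi_e must lie in [0, 1]")
        if not 0.0 <= p <= 2.0 * math.pi:
            raise ValueError("phase phi_e must lie in [0, 2*pi]")
    return [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]


def effective_channel(h_direct, g_irs_user, G_bs_irs, theta_diag):
    """Row vector h^H + g^H Theta G under the assumed cascaded model.

    h_direct:   length-M list, BS -> user
    g_irs_user: length-E list, IRS -> user
    G_bs_irs:   E x M nested list, BS -> IRS
    theta_diag: length-E list of reflection coefficients
    """
    eff = [h.conjugate() for h in h_direct]
    for g_e, t_e, G_row in zip(g_irs_user, theta_diag, G_bs_irs):
        weight = g_e.conjugate() * t_e  # reflected path through element e
        for m, G_em in enumerate(G_row):
            eff[m] += weight * G_em
    return eff


def co_phase(h_direct, g_irs_user, G_bs_irs):
    """Single-antenna (M = 1) alignment: rotate each reflected path onto the direct path."""
    target = cmath.phase(h_direct[0].conjugate())
    return [(target - cmath.phase(g.conjugate() * G_row[0])) % (2.0 * math.pi)
            for g, G_row in zip(g_irs_user, G_bs_irs)]


if __name__ == "__main__":
    rng = random.Random(7)
    E = 8
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1))]
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(E)]
    G = [[complex(rng.gauss(0, 1), rng.gauss(0, 1))] for _ in range(E)]
    theta = irs_coefficients([1.0] * E, co_phase(h, g, G))
    aligned = abs(effective_channel(h, g, G, theta)[0])
    bound = abs(h[0]) + sum(abs(x) * abs(y[0]) for x, y in zip(g, G))
    print(f"aligned gain {aligned:.3f}, |h| + sum |g_e||G_e| = {bound:.3f}")
```

With unit amplitudes and co-phased elements, the effective gain equals $|h| + \sum_e |g_e||G_e|$, which by the triangle inequality is the largest value reachable for fixed channel magnitudes in this single-antenna case.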

Semantic similarity metrics
Similar to the semantic similarity metric proposed in [22], we use the Bidirectional Encoder Representations from Transformers (BERT) model. BERT is a natural language processing model based on the transformer architecture whose bidirectional attention mechanism captures contextual information from both directions of a sequence. This enables a deeper understanding of word relationships and semantic nuances within sentences and leads to strong performance on language understanding tasks such as sentiment analysis, named entity recognition, and question answering. Because BERT is pre-trained on large corpora, it captures intricate linguistic patterns and supports transfer learning to downstream applications. The BERT-based semantic similarity between a transmitted sentence $\mathbf{s}$ and the recovered sentence $\hat{\mathbf{s}}$ is obtained by

$$\xi = \frac{B(\mathbf{s})\, B(\hat{\mathbf{s}})^{T}}{\lVert B(\mathbf{s}) \rVert \, \lVert B(\hat{\mathbf{s}}) \rVert},$$

where $B(\cdot)$ is the pretrained BERT model [44] and $\xi \in [0, 1]$.
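The similarity $\xi$ above is a cosine similarity between sentence embeddings. The sketch below computes it in Python; because running BERT is out of scope here, `toy_embed` is a hypothetical bag-of-words stand-in for $B(\cdot)$, so the numbers illustrate the formula rather than BERT's behavior.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def semantic_similarity(sentence, recovered, embed):
    """xi = B(s) B(s_hat)^T / (||B(s)|| ||B(s_hat)||); `embed` stands in for B(.)."""
    return cosine_similarity(embed(sentence), embed(recovered))


# Hypothetical bag-of-words embedding, used only so the example runs without BERT.
VOCAB = ["the", "ship", "sails", "at", "dawn", "dusk"]


def toy_embed(sentence):
    words = sentence.lower().split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    xi = semantic_similarity("the ship sails at dawn", "the ship sails at dusk", toy_embed)
    print(round(xi, 3))  # 4 shared words out of 5 in each sentence -> 0.8
```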

Semantic Spectral Efficiency
Following the principles outlined in [26], the semantic unit (sut) is introduced to represent semantic information. The unit of the semantic transmission rate (S-R) is suts/s, and the unit of the S-SE is suts/s/Hz. We introduce a text dataset $\mathcal{D} = \{s_d\}$, where $s_d$ is the $d$-th sentence, and assume that the probability $p(s_d)$ with which each sentence occurs is known a priori. Let

$$P = \sum_{d=1}^{D} P_d\, p(s_d)$$

be the expected semantic information per sentence, and let

$$L = \sum_{d=1}^{D} L_d\, p(s_d)$$

be the expected number of words per sentence. For long-term semantic communication, $P$ and $L$ are fixed; in this paper they are set randomly. Let $\nu_k$ be the number of semantic symbols per word used by the $k$-th SU, so that the number of semantic symbols used to represent a sentence is $\Gamma_k = \nu_k L$. Consistent with [26], the semantic symbol rate equals the channel bandwidth $W$, so the S-R of the $k$-th SU over the $c$-th subchannel is

$$\mathcal{R}_{k,c} = \frac{W P}{\nu_k L}\, \xi_{k,c},$$

where $\xi_{k,c}$ is the achieved semantic similarity. The value of $\xi_{k,c}$ is determined by the symbol allocation, the channel allocation, the transmit beamforming of the SBS, and the coefficients of the IRS reflective elements. Therefore, the achievable S-SE of the $k$-th SU over the $c$-th subchannel is

$$\Phi_{k,c} = \frac{\mathcal{R}_{k,c}}{W} = \frac{P}{\nu_k L}\, \xi_{k,c}.$$
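As a worked example of the S-R and S-SE expressions, the sketch below computes $P$, $L$, and $\Phi_{k,c}$ for a hypothetical two-sentence dataset. The per-sentence values, $\nu_k = 4$, $\xi_{k,c} = 0.9$, and the 180 kHz bandwidth are illustrative assumptions, not parameters from the paper.

```python
def expected_value(per_sentence, probabilities):
    """P = sum_d P_d p(s_d), or L = sum_d L_d p(s_d)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("sentence probabilities must sum to 1")
    return sum(x * p for x, p in zip(per_sentence, probabilities))


def semantic_rate(bandwidth_hz, P, L, nu_k, xi_kc):
    """S-R in suts/s: W symbols/s, Gamma_k = nu_k * L symbols per sentence, P * xi suts per sentence."""
    gamma_k = nu_k * L
    return bandwidth_hz * P * xi_kc / gamma_k


def semantic_spectral_efficiency(P, L, nu_k, xi_kc):
    """S-SE in suts/s/Hz: the S-R divided by the bandwidth W."""
    return P * xi_kc / (nu_k * L)


if __name__ == "__main__":
    probs = [0.5, 0.5]                        # hypothetical two-sentence dataset
    P = expected_value([10.0, 20.0], probs)   # suts per sentence (hypothetical)
    L = expected_value([5.0, 15.0], probs)    # words per sentence (hypothetical)
    print(P, L)                                         # 15.0 10.0
    print(semantic_spectral_efficiency(P, L, 4, 0.9))   # about 0.3375 suts/s/Hz
    print(semantic_rate(180e3, P, L, 4, 0.9))           # about 60750 suts/s
```

Note that the bandwidth cancels in the S-SE, so for fixed $P$ and $L$ the S-SE depends only on the symbol allocation $\nu_k$ and the achieved similarity $\xi_{k,c}$.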

Problem formulation
The S-SE of the secondary network is maximized by jointly optimizing semantic symbol allocation, subchannel allocation, the beamforming of the SBS, and the coefficients of the IRS reflective elements. The resulting optimization problem is referred to as problem (8), where $\xi^{\mathrm{th}}_{d}$ is the minimum task requirement of the $d$-th PU and $TP^{\mathrm{th}}_{s}$ is the upper bound of the SBS transmit power. Constraint (8b) limits the subchannel assignment so that each subchannel can be occupied by only one SU. Constraints (8d) and (8e) limit the IRS reflective elements. Constraint (8f) imposes the upper bound on the SBS transmit power. Constraint (8g) maintains the QoS requirements of the PUs.
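Problem (8) itself is not reproduced in this section, but the constraints described above can be checked mechanically. The following hypothetical helper `check_constraints` flags violations of (8b), (8d), (8e), (8f), and (8g) for a candidate allocation. Two points are assumptions: that (8d) bounds the amplitudes $\varpi_e$ and (8e) the phases $\phi_e$, and that (8f) limits the total SBS power $\sum_k \lVert \mathbf{f}^{s}_{k} \rVert^2$.

```python
import math


def check_constraints(zeta, amplitudes, phases, beam_powers, power_budget,
                      pu_similarity, xi_th):
    """Return labels of violated constraints; an empty list means the allocation is feasible.

    zeta:          K x C 0/1 matrix, zeta[k][c] = 1 if SU k occupies subchannel c
    amplitudes:    IRS amplitude coefficients varpi_e
    phases:        IRS phase coefficients phi_e (radians)
    beam_powers:   ||f_k||^2 for each SU beamformer (W)
    power_budget:  TP_s^th (W)
    pu_similarity: achieved semantic similarity xi_d of each PU
    xi_th:         per-PU thresholds xi_d^th
    """
    violations = []
    for c in range(len(zeta[0])):
        if sum(row[c] for row in zeta) > 1:
            violations.append(f"(8b) subchannel {c} is occupied by more than one SU")
    if any(not 0.0 <= a <= 1.0 for a in amplitudes):
        violations.append("(8d) IRS amplitude outside [0, 1]")
    if any(not 0.0 <= p <= 2.0 * math.pi for p in phases):
        violations.append("(8e) IRS phase outside [0, 2*pi]")
    if sum(beam_powers) > power_budget:
        violations.append("(8f) SBS transmit power exceeds TP_s^th")
    for d, (xi, th) in enumerate(zip(pu_similarity, xi_th)):
        if xi < th:
            violations.append(f"(8g) PU {d} falls below its semantic similarity threshold")
    return violations


if __name__ == "__main__":
    budget = 10 ** (20 / 10) / 1000  # 20 dBm -> 0.1 W
    ok = check_constraints([[1, 0], [0, 1]], [1.0, 0.5], [0.0, math.pi],
                           [0.04, 0.05], budget, [0.95], [0.9])
    bad = check_constraints([[1, 0], [1, 0]], [1.0, 0.5], [0.0, math.pi],
                            [0.04, 0.08], budget, [0.85], [0.9])
    print(ok)   # []
    print(bad)  # (8b), (8f) and (8g) violations
```

A check of this kind can be used to penalize or mask infeasible actions during training.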

Proposed DRL-based resource allocation scheme for semantic spectrum sharing networks
We now present the design of a DRL-based resource allocation method for the semantic spectrum sharing network. The proposed hybrid intelligent resource allocation method, based on D3QN-TD3, efficiently handles both the discrete action space, including subchannel assignment and semantic symbol assignment, and the continuous action space, including the IRS reflective elements and the transmit beamforming of the SBS, as detailed below.
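The split between discrete and continuous decisions can be sketched as follows. D3QN outputs one index over the joint set of (symbol option, subchannel) pairs, and the TD3 actor outputs values in $[-1, 1]$ that are mapped to IRS phases in $[0, 2\pi]$ and to an SBS beamformer scaled to respect the power budget. The candidate symbol set `SYMBOL_OPTIONS`, the flat index encoding, the output layout, the unit IRS amplitudes, and the single-SU beamformer are all assumptions made for illustration.

```python
import cmath
import math

# Hypothetical candidate numbers of semantic symbols per word (nu_k) offered to D3QN.
SYMBOL_OPTIONS = [4, 8, 12, 16]


def decode_discrete(action_index, num_subchannels):
    """Split a flat D3QN action index into (semantic symbols per word, subchannel index)."""
    if not 0 <= action_index < len(SYMBOL_OPTIONS) * num_subchannels:
        raise ValueError("action index out of range")
    symbol_idx, subchannel = divmod(action_index, num_subchannels)
    return SYMBOL_OPTIONS[symbol_idx], subchannel


def decode_continuous(raw, num_elements, num_antennas, power_budget):
    """Map TD3 actor outputs in [-1, 1] to IRS reflection coefficients and an SBS beamformer.

    Assumed layout: E phase entries, then M real parts, then M imaginary parts.
    IRS amplitudes are fixed to 1 in this sketch.
    """
    if len(raw) != num_elements + 2 * num_antennas:
        raise ValueError("unexpected actor output length")
    phases = [(x + 1.0) * math.pi for x in raw[:num_elements]]  # [-1, 1] -> [0, 2*pi]
    theta = [cmath.exp(1j * p) for p in phases]
    re = raw[num_elements:num_elements + num_antennas]
    im = raw[num_elements + num_antennas:]
    f = [complex(a, b) for a, b in zip(re, im)]
    norm_sq = sum(abs(x) ** 2 for x in f)
    if norm_sq > power_budget:  # scale back so that ||f||^2 equals the power budget
        f = [x * math.sqrt(power_budget / norm_sq) for x in f]
    return phases, theta, f


if __name__ == "__main__":
    print(decode_discrete(5, num_subchannels=4))  # (8, 1)
    phases, theta, f = decode_continuous([0.0, 1.0, 1.0, 1.0, 1.0, 1.0], 2, 2, 0.1)
    print(phases, sum(abs(x) ** 2 for x in f))     # [pi, 2*pi] and about 0.1
```

Scaling only when the budget is exceeded keeps the power constraint satisfied by construction, so the agent never has to learn it from penalties alone.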

MDP formulation
The Markov decision process (MDP) is a cornerstone of reinforcement learning theory, providing a structured framework for modeling decision-making problems. In this paper, the optimization problem in (8) is first transformed into an MDP, which lays the groundwork for applying reinforcement learning algorithms. Within this MDP, the environment is the IRS-enhanced semantic spectrum sharing network, and the intelligent agent resides in the control unit of the SBS. Formulating the RL problem for the IRS-assisted semantic spectrum sharing network requires defining the state space, action space, reward function, and transition probability, which allows the agent to navigate the dynamic environment and make informed decisions that improve system performance. Let $\mathcal{S}$ denote the state space. At time step $t$, the state $s_t \in \mathcal{S}$ contains the subchannel information, the selected actions, the S-SE $\Phi_t$, and the obtained rewards; in particular, $s_t$ is characterized by the S-SE $\Phi_t$ at time step $t$. This representation captures the relevant aspects of the system's history, so the agent can base its decisions on the accumulated knowledge of actions, subchannel dynamics, S-SE, and rewards.
The action space of the semantic spectrum sharing network is denoted by $\mathcal{A}$. At time step $t$, the semantic symbol allocation, subchannel allocation, SBS beamforming, and IRS reflective coefficients are represented by $\rho_t$, $\zeta_t$, $\mathbf{F}_t$, and $\boldsymbol{\Theta}_t$, respectively. This multi-faceted action space reflects the different configurations that determine the behavior and performance of the IRS-assisted semantic spectrum sharing network at each time step. The action is expressed by

$$a_t = \{\rho_t, \zeta_t, \mathbf{F}_t, \boldsymbol{\Theta}_t\}.$$

The reward function plays a pivotal role in the agent's action selection strategy: it encodes the goal of maximizing the S-SE and assesses the agent's performance after each step. The reward design is therefore of paramount importance for the S-SE that the agent ultimately attains, and the reward function is designed on this basis.

The policy is a crucial component of reinforcement learning: it gives the probability with which the agent chooses a specific action $a$ in the current state $s$, and it encapsulates the agent's decision-making strategy as it interacts with its surroundings. The overarching goal of the agent is to learn an optimal resource allocation strategy. To this end, the long-term reward, which measures the agent's success in achieving its goals and shapes the learning process, depends on the action selection guided by the policy. The long-term performance is represented by the discounted return

$$R_t = \sum_{i=0}^{\infty} \gamma^{i}\, r_{t+i+1},$$

where $\gamma \in [0, 1)$ is the discount factor, which governs how strongly future rewards are weighted relative to immediate ones. A larger $\gamma$ places more weight on rewards further in the future, whereas a smaller $\gamma$ makes the agent emphasize immediate rewards. The choice of the discount factor therefore shapes the agent's temporal perspective and its adaptability to different scenarios.

Let $\pi$ represent the policy. The optimal policy can then be represented by

$$\pi^{*} = \arg\max_{\pi}\, \mathbb{E}_{\pi}\left[R_t\right].$$
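The discounted return and the effect of $\gamma$ can be checked numerically. This minimal Python sketch evaluates $R_t$ for a finite reward sequence (the infinite sum truncated at the end of an episode, an assumption for illustration); the delayed-reward example shows that a larger $\gamma$ retains more of a reward that arrives later.

```python
def discounted_return(rewards, gamma):
    """R_t = sum_i gamma^i * r_{t+i+1} for a finite sequence r_{t+1}, r_{t+2}, ..."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    ret = 0.0
    for r in reversed(rewards):  # accumulate backwards: R = r + gamma * R_next
        ret = r + gamma * ret
    return ret


if __name__ == "__main__":
    delayed = [0.0, 0.0, 10.0]  # reward arrives two steps later
    for gamma in (0.5, 0.9):
        print(gamma, discounted_return(delayed, gamma))  # 0.5 -> 2.5, 0.9 -> about 8.1
```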

Intelligent resource allocation approach based on D3QN-TD3
We consider a DRL-based resource allocation approach using D3QN-TD3 for semantic spectrum sharing communication networks. The proposed scheme is illustrated in Fig. 2.

D3QN algorithm for semantic symbol allocation and subchannel assignment
Dueling DQN adds an advantage function to the DQN framework, which improves the precision of Q-value estimates and the sample efficiency because the state value is learned once and shared across all discrete actions. Double DQN, in turn, tackles Q-value overestimation by using two sets of Q networks: one selects the next action and the other evaluates it. D3QN integrates dueling DQN and double DQN and thus combines the strengths of both. Here, D3QN is employed for DeepSC (semantic symbol) allocation and subchannel allocation. The Q-value is calculated by

$$Q(s, a; \upsilon, \tau, \eta) = V(s; \upsilon, \eta) + \left( A(s, a; \upsilon, \tau) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \upsilon, \tau) \right),$$

where $|\mathcal{A}|$ here denotes the number of discrete actions, and $\upsilon$, $\tau$, and $\eta$ are the parameters of the hidden layers, the action (advantage) network, and the value network, respectively.
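The dueling aggregation above can be written in a few lines. In this sketch the outputs of the value and advantage streams are passed in directly as numbers, standing in for the network layers with parameters $\upsilon$, $\eta$, and $\tau$; subtracting the mean advantage makes $V$ and $A$ identifiable, so the average of the Q-values equals $V(s)$.

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_{a'} A(s, a'), the dueling aggregation."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def greedy_action(q_values):
    """Index of the discrete action with the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    q = dueling_q_values(2.0, [1.0, -1.0, 3.0])
    print(q, greedy_action(q))  # [2.0, 0.0, 4.0] 2
```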
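A mini-batch of $N$ tuples $(s_t, a_t, r_{t+1}, s_{t+1})$ is randomly sampled from the replay buffer. The target Q-value in D3QN is

$$y_t = r_{t+1} + \gamma\, Q\!\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \upsilon, \tau, \eta); \upsilon', \tau', \eta'\right),$$

where $\upsilon'$ represents the parameters of the hidden layers in the target networks, and $\tau'$ and $\eta'$ are the corresponding target action and value network parameters. The loss function is

$$\mathcal{L}(\upsilon, \tau, \eta) = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - Q(s_n, a_n; \upsilon, \tau, \eta) \right)^2.$$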
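The double-DQN target decouples action selection from evaluation. The sketch below computes $y_t$ and the mini-batch loss from precomputed Q-value lists for the online and target networks (hypothetical inputs standing in for network forward passes); the example shows how it avoids the larger estimate that a plain max over the target network would produce.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, gamma):
    """y = r + gamma * Q_target(s', argmax_a' Q_online(s', a')) for each sampled transition."""
    targets = []
    for r, q_on, q_tgt in zip(rewards, next_q_online, next_q_target):
        best = max(range(len(q_on)), key=q_on.__getitem__)  # action chosen by the online network
        targets.append(r + gamma * q_tgt[best])              # evaluated by the target network
    return targets


def mse_loss(targets, predictions):
    """(1/N) * sum_n (y_n - Q(s_n, a_n))^2 over the mini-batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


if __name__ == "__main__":
    rewards = [1.0, 0.5]
    next_q_online = [[1.0, 3.0], [2.0, 0.0]]
    next_q_target = [[10.0, 4.0], [7.0, 9.0]]
    y = double_dqn_targets(rewards, next_q_online, next_q_target, gamma=0.9)
    print(y)                        # about [4.6, 6.8]; a plain max over the target net gives 10.0 and 8.6
    print(mse_loss(y, [4.0, 7.0]))  # about 0.2
```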

TD3 algorithm for IRS reflective elements and transmit beamforming
The TD3 algorithm addresses the instability and sample inefficiency of earlier actor-critic methods. Its twin critics reduce overestimation bias by using the smaller of two value estimates; its delayed policy updates let the critics settle before the actor is updated, which reduces the variance of the policy updates; and its target policy smoothing regularization adds clipped noise to the target action so that the policy cannot exploit sharp, erroneous peaks in the critic. These features make TD3 a robust choice for continuous control, so we adopt it to handle the continuous actions, namely the IRS reflective elements and the transmit beamforming of the SBS.
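Let $\vartheta$ and $\vartheta^{-}$ be the parameters of the actor network and the target actor network, respectively. Let $\epsilon_1$ and $\epsilon_2$ be the parameters of the two critic networks, and let $\epsilon_1^{-}$ and $\epsilon_2^{-}$ be the parameters of the corresponding target critic networks. The target Q-value in TD3 is obtained by

$$y = r_{t+1} + \gamma \min_{i=1,2} Q\!\left(s_{t+1}, \tilde{a}; \epsilon_i^{-}\right), \qquad \tilde{a} = \mu\!\left(s_{t+1}; \vartheta^{-}\right) + \mathrm{clip}\!\left(\mathcal{N}(0, \tilde{\sigma}^2), -c, c\right),$$

where $\mu(\cdot; \vartheta^{-})$ is the target actor, and $\tilde{\sigma}$ and $c$ are the standard deviation and clipping bound of the target policy smoothing noise. The weights $\{\epsilon_i\}$ of the critic networks are updated by

$$\epsilon_i \leftarrow \arg\min_{\epsilon_i} \frac{1}{N} \sum_{n=1}^{N} \left( y_n - Q(s_n, a_n; \epsilon_i) \right)^2, \quad i = 1, 2.$$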
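The TD3 target combines target policy smoothing with the twin-critic minimum. In this Python sketch, `target_actor` and `target_critics` are hypothetical callables standing in for $\mu(\cdot; \vartheta^{-})$ and $Q(\cdot, \cdot; \epsilon_i^{-})$, the noise parameters (0.2 and 0.5) are common TD3 defaults rather than values from this paper, and `actor_update_due` illustrates the delayed policy update with an assumed delay of 2.

```python
import random


def td3_target(reward, next_state, target_actor, target_critics, gamma,
               noise_std=0.2, noise_clip=0.5, action_low=-1.0, action_high=1.0, rng=None):
    """y = r + gamma * min_i Q_i^-(s', a~), with a~ = mu^-(s') + clip(N(0, sigma~^2), -c, c)."""
    rng = rng or random.Random()
    smoothed = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))  # clipped smoothing noise
        smoothed.append(max(action_low, min(action_high, a + eps)))          # keep action in range
    q_min = min(critic(next_state, smoothed) for critic in target_critics)  # twin-critic minimum
    return reward + gamma * q_min


def actor_update_due(step, policy_delay=2):
    """Delayed policy update: refresh the actor and targets every `policy_delay` critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    actor = lambda s: [0.5]                  # hypothetical target actor mu^-(s')
    critics = [lambda s, a: 3.0 + a[0],      # hypothetical target critics Q_1^-, Q_2^-
               lambda s, a: 2.0 + a[0]]
    print(td3_target(1.0, None, actor, critics, 0.9, noise_std=0.0))  # 1 + 0.9 * 2.5 = 3.25
    print([actor_update_due(t) for t in range(1, 5)])                 # [False, True, False, True]
```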

Training process for semantic communication networks
In summary, the proposed algorithm uses the current system state as the input for deciding the next action, which comprises semantic symbol allocation, channel allocation, transmit beamforming, and IRS reflective coefficients. This decision-making process is carried out jointly by the D3QN and TD3 networks. First, we introduce a semantic similarity estimation algorithm based on DeepSC [22] to estimate the semantic similarity and then compute the S-SE; this involves a domain transfer of the network to the semantic task domain considered here. The experience accumulated at each training step is stored in the replay buffer $\mathcal{E}$. At predefined intervals $I$, the parameters of the target network are synchronized with those of the evaluation network. The main steps of the proposed D3QN-TD3 approach are summarized in Algorithm 1.
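The training loop described above (store each transition in the replay buffer, learn from random mini-batches, and copy the evaluation parameters into the target network every $I$ steps) can be outlined as follows. This is a generic sketch: `env_step`, `select_action`, and `update` are hypothetical callables that would wrap the semantic spectrum sharing environment and the D3QN and TD3 learners, and the default capacity mirrors the buffer size $U = 20{,}000$ used in the simulations.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are discarded first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def run_training(initial_state, env_step, select_action, update, eval_params,
                 num_steps, sync_interval, capacity=20_000, batch_size=32, seed=0):
    """Act, store (s, a, r, s'), learn from a random mini-batch, hard-sync the target every I steps."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity)
    target_params = copy.deepcopy(eval_params)
    num_syncs = 0
    state = initial_state
    for t in range(1, num_steps + 1):
        action = select_action(state, eval_params)
        reward, next_state = env_step(state, action)
        buffer.push((state, action, reward, next_state))
        state = next_state
        if len(buffer) >= batch_size:
            update(buffer.sample(batch_size, rng), eval_params, target_params)
        if t % sync_interval == 0:
            target_params = copy.deepcopy(eval_params)  # target <- evaluation network
            num_syncs += 1
    return buffer, target_params, num_syncs


if __name__ == "__main__":
    params = {"updates": 0}

    def toy_update(batch, eval_params, target_params):
        eval_params["updates"] += 1  # placeholder for a D3QN/TD3 gradient step

    buf, target, syncs = run_training(0, lambda s, a: (1.0, s + 1), lambda s, p: 0,
                                      toy_update, params, num_steps=10, sync_interval=3,
                                      capacity=5, batch_size=4)
    print(len(buf), params["updates"], target["updates"], syncs)  # 5 7 6 3
```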
Algorithm 1 D3QN-TD3-Based Hybrid Intelligent Resource Allocation Scheme for Semantic Spectrum Sharing Networks
Simulation results
This section evaluates the performance of the proposed approach and compares it with benchmark schemes. Rician fading is assumed for the channels from the BSs to the IRS and from the IRS to the users, while Rayleigh fading is assumed for the direct channels from the BSs to the users. The path loss is modeled as $PL = (PL_0 - 10\tau \log_{10}(d/D_0))$ dB, where $d$ is the link distance, with $PL_0 = 30$ dB and $D_0 = 1$ m. The path loss exponents $\tau_{bu}$, $\tau_{br}$, and $\tau_{ru}$ of the channels from the BSs to the users, from the BSs to the IRS, and from the IRS to the users are set to 3.6, 2.0, and 2.1, respectively. The conversion method in [26], which maps the SE of conventional communication to the S-SE of semantic communication, is adopted: $\Phi'_{n,m} = R_{n,m} \frac{I}{\mu L}$, where $R_{n,m} = C_{n,m}/W$ is the SE of conventional communication and $\mu$ is the transforming factor. The thresholds $\Phi_{\mathrm{th}}$ and $\xi^{\mathrm{th}}_{d}$ are set to 0.2 suts/s/Hz and 0.9, respectively. The D3QN configuration uses two Q networks and two target Q networks, each with three hidden layers of 256 neurons, and a learning rate of 0.003. The TD3 configuration comprises one actor network, two critic networks, two target critic networks, and one target actor network, each with three hidden layers of 512 neurons, and a learning rate of 0.003. The replay buffer size is $U = 20{,}000$.
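The path loss model and the SE-to-S-SE conversion in the setup above are simple enough to evaluate directly. The sketch implements both expressions as stated, with $PL_0 = 30$ dB and $D_0 = 1$ m; the 100 m distance, the conventional rate, and the values of $I$, $\mu$, and $L$ are hypothetical, and $I$ is treated as the expected semantic information per sentence.

```python
import math


def path_loss_db(distance_m, exponent, pl0_db=30.0, d0_m=1.0):
    """PL = PL0 - 10 * tau * log10(d / D0) in dB, as stated in the simulation setup."""
    return pl0_db - 10.0 * exponent * math.log10(distance_m / d0_m)


def conventional_to_semantic_se(capacity_bps, bandwidth_hz, semantic_info, mu, words_per_sentence):
    """Phi' = R * I / (mu * L), with R = C / W the conventional SE (conversion of [26])."""
    spectral_efficiency = capacity_bps / bandwidth_hz
    return spectral_efficiency * semantic_info / (mu * words_per_sentence)


if __name__ == "__main__":
    exponents = {"BS-user": 3.6, "BS-IRS": 2.0, "IRS-user": 2.1}
    for link, tau in exponents.items():
        print(link, round(path_loss_db(100.0, tau), 2))  # -42.0, -10.0, -12.0 at 100 m
    # Hypothetical values: 2 bit/s/Hz, I = 15 suts, mu = 40, L = 10 words.
    print(conventional_to_semantic_se(2e6, 1e6, 15.0, 40.0, 10.0))  # about 0.075 suts/s/Hz
```

Because $\mu$ sits in the denominator, the converted S-SE of conventional communication falls as the transforming factor grows, which is the trend compared against in Fig. 5.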
To validate the effectiveness of the proposed approach, we introduce the following benchmark schemes. The random IRS scheme assesses the impact of optimizing the IRS reflective coefficients: the IRS reflection coefficients are assigned randomly, which reveals the performance gain achieved through optimization. The random scheme highlights the significance of the proposed intelligent approach by generating the IRS reflection coefficients with a random algorithm, thereby showing the value added by the intelligent optimization process. The 5G communication standard, following the approach outlined in [26], serves as a further benchmark for assessing the proposed approach against contemporary communication standards.
Figure 3 shows the convergence behavior of the proposed D3QN-TD3-based intelligent resource allocation scheme for semantic spectrum sharing communication networks across training episodes. Two scenarios are considered: one with $E = 64$ IRS reflective elements and another with $E = 128$ IRS reflective elements. The maximum transmit power of the SBS is set to 20 dBm. In both scenarios, the reward increases steadily and converges quickly, at around $2 \times 10^{2}$ episodes, which validates the efficiency of the proposed framework in solving the optimization problem. Convergence is more challenging for $E = 128$ than for $E = 64$ because of the higher dimensionality associated with a larger number of IRS elements. Nevertheless, the proposed approach handles the high-dimensional adjustment of the IRS reflective elements and explores the solution space effectively. We attribute this to the strong exploration ability of the proposed scheme, in which the Q-values of the actions are evaluated accurately.
Figure 4 compares the achievable S-SE of the secondary network under the proposed resource allocation approach with that of several benchmark schemes for different numbers of IRS reflective elements. The S-SE improves significantly as the number of IRS reflective elements increases, because more elements improve the beamforming accuracy and the signal gain. Consequently, the IRS-assisted schemes achieve a pronounced S-SE advantage over schemes without IRS. Furthermore, the proposed intelligent approach effectively exploits a large number of IRS elements and consistently outperforms the benchmark schemes. In contrast, the fixed IRS scheme performs suboptimally, since it cannot dynamically adjust the IRS elements to adapt to the semantic spectrum sharing network. Figure 4 thus illustrates the benefit of leveraging an IRS to increase the S-SE.
Figure 5 depicts the achievable S-SE of the secondary network for different transforming factors, compared with the conventional communication scheme. Semantic communication maintains a stable S-SE across transforming factors, in contrast to the declining S-SE of conventional communication. This is because semantic communication extracts and transmits only the important information. Moreover, IRS-assisted semantic communication achieves a markedly higher S-SE than conventional communication standards, because the IRS further improves resource utilization efficiency. The effectiveness of IRS-assisted semantic communication is especially evident in scenarios with limited transforming factors.

Fig. 3 The convergence performance of the proposed scheme for semantic spectrum sharing semantic communication networks
Fig. 4 The S-SE of the secondary network versus the number of IRS reflective elements

Conclusion
We addressed the resource allocation problem in semantic spectrum sharing networks in which an IRS is deployed. The objective is to guarantee the QoS of the primary network while maximizing the gains of the secondary network. To this end, we jointly optimized semantic symbol assignment, subchannel allocation, the transmit beamforming of the SBS, and the IRS reflective elements to maximize the S-SE of the secondary network. To improve computational efficiency and intelligence in resource allocation, we introduced a hybrid intelligent method based on D3QN-TD3 to solve the non-convex optimization problem. Specifically, the D3QN component determines the semantic symbol and subchannel allocation, while the TD3 component optimizes the transmit beamforming of the SBS and the IRS reflective elements. Simulation results validated the effectiveness of the proposed DRL-based resource allocation approach, which achieved better S-SE performance than the benchmark schemes.
The DeepSC networks in the semantic coding framework can produce semantic information of different lengths. Let $\mathcal{O} = \{1, \ldots, O\}$ be the set of DeepSC networks. The semantic encoder maps the input sequence $\mathbf{s}$ to the semantic information $\mathbf{P} = E_{\varrho_o}(\mathbf{s})$, where $E_{\varrho_o}(\cdot)$ is the $o$-th encoder with parameters $\varrho_o$. The encoded information is then passed through the channel encoder $C_{\alpha}(\cdot)$ with parameters $\alpha$, yielding $\mathbf{X} = C_{\alpha}(\mathbf{P})$. At the receiver, the received signal is $\mathbf{Y} = h\mathbf{X} + \mathbf{n}$, where $h$ denotes the channel coefficient and $\mathbf{n}$ the background noise. The channel decoder decodes the received symbols as $\mathbf{X}' = C^{-1}_{\beta}(\mathbf{Y})$ with parameters $\beta$, and the semantic decoder reconstructs the message $\mathbf{m} = S^{-1}_{\sigma_o}(\mathbf{X}')$, where $S^{-1}_{\sigma_o}(\cdot)$ is the semantic decoder with parameters $\sigma_o$. Let $\zeta_{k,o}$ indicate whether user $k$ selects the $o$-th DeepSC: $\zeta_{k,o} = 1$ if it does, and $\zeta_{k,o} = 0$ otherwise.
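The five processing stages above (semantic encoder, channel encoder, channel $\mathbf{Y} = h\mathbf{X} + \mathbf{n}$, channel decoder, semantic decoder) can be traced with a toy Python pipeline. The codebook lookup and nearest-point decoder are hypothetical stand-ins for the learned DeepSC networks, and perfect channel knowledge at the receiver is assumed, so the example shows the data flow only, not DeepSC's learned behavior.

```python
import cmath
import math
import random

# Hypothetical one-symbol-per-word codebook standing in for a learned DeepSC semantic encoder.
CODEBOOK = {"ship": (1.0, 1.0), "sails": (1.0, -1.0), "at": (-1.0, 1.0), "dawn": (-1.0, -1.0)}


def semantic_encode(words):
    """P = E(s): map each word to its semantic feature (toy lookup)."""
    return [CODEBOOK[w] for w in words]


def channel_encode(features):
    """X = C_alpha(P): map features to unit-power complex channel symbols."""
    return [complex(re, im) / math.sqrt(2.0) for re, im in features]


def channel(x, h, noise_std, rng):
    """Y = hX + n with complex Gaussian noise."""
    return [h * s + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) for s in x]


def channel_decode(y, h):
    """X' = C_beta^{-1}(Y): zero-forcing equalization, assuming perfect knowledge of h."""
    return [s / h for s in y]


def semantic_decode(x_hat):
    """m = S^{-1}(X'): pick the word whose channel symbol is closest to each received symbol."""
    points = {w: complex(re, im) / math.sqrt(2.0) for w, (re, im) in CODEBOOK.items()}
    return [min(points, key=lambda w: abs(points[w] - s)) for s in x_hat]


if __name__ == "__main__":
    rng = random.Random(1)
    h = cmath.exp(1j * 0.7) * 0.8  # hypothetical flat-fading coefficient
    sentence = ["ship", "sails", "at", "dawn"]
    received = channel(channel_encode(semantic_encode(sentence)), h, 0.01, rng)
    print(semantic_decode(channel_decode(received, h)))  # recovers the sentence at this high SNR
```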

Let $\mathbf{f}^{p}_{d}$ be the transmit beamforming vector for the $d$-th PU and $\mathbf{f}^{s}_{k}$ the transmit beamforming vector for the $k$-th SU. The transmission rate from the PBS to the $d$-th PU depends on the bandwidth $B$ and on the interference $\Gamma_{p,d}$, where $\mathbf{h}_{s,d}$ denotes the channel from the SBS to the $d$-th PU. Let $\delta_{kd}$ indicate whether the $d$-th PU and the $k$-th SU share the same subchannel: $\delta_{kd} = 1$ if they do, and $\delta_{kd} = 0$ otherwise. Similarly, the transmission rate from the SBS to the $k$-th SU depends on the interference $\Gamma_{s_k}$, where $\mathbf{h}_{p,k}$ is the channel from the PBS to the $k$-th SU.
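The displayed rate and interference expressions are not reproduced above. The sketch below uses the common co-channel form as an assumption: the interference at PU $d$ is $\Gamma_{p,d} = \sum_k \delta_{kd} |\mathbf{h}\mathbf{f}^{s}_{k}|^2$ over the SUs sharing its subchannel, and the rate is $B \log_2(1 + \mathrm{SINR})$. The effective channel vectors are assumed to already include the IRS-reflected paths, and the numbers in the example are hypothetical.

```python
import math


def inner(h_row, f):
    """h f for a row channel vector h and a beamforming vector f."""
    return sum(h * x for h, x in zip(h_row, f))


def pu_rate(bandwidth_hz, h_eff_pd, f_pd, h_eff_sd, su_beams, delta_d, noise_power):
    """Rate of PU d and the interference Gamma_{p,d} it receives (assumed standard SINR model).

    h_eff_pd: effective PBS -> PU d channel (direct plus IRS-reflected), row vector
    f_pd:     PBS beamformer f_d^p
    h_eff_sd: effective SBS -> PU d channel, row vector
    su_beams: SBS beamformers f_k^s, one per SU
    delta_d:  delta_{kd} in {0, 1}: 1 if SU k shares PU d's subchannel
    """
    signal = abs(inner(h_eff_pd, f_pd)) ** 2
    interference = sum(delta * abs(inner(h_eff_sd, f_k)) ** 2
                       for delta, f_k in zip(delta_d, su_beams))
    sinr = signal / (interference + noise_power)
    return bandwidth_hz * math.log2(1.0 + sinr), interference


if __name__ == "__main__":
    args = dict(bandwidth_hz=1e6, h_eff_pd=[2.0, 0.0], f_pd=[1.0, 0.0],
                h_eff_sd=[0.5, 0.5], su_beams=[[1.0, 1.0], [1.0, -1.0]], noise_power=1.0)
    shared, gamma_shared = pu_rate(delta_d=[1, 0], **args)
    alone, gamma_alone = pu_rate(delta_d=[0, 0], **args)
    print(gamma_shared, shared / 1e6)  # 1.0 interference, log2(3) ~ 1.585 bit/s/Hz
    print(gamma_alone, alone / 1e6)    # 0.0 interference, log2(5) ~ 2.322 bit/s/Hz
```

The SU rate with interference $\Gamma_{s_k}$ from the PBS follows the same pattern with the roles of the two base stations swapped; the IRS helps the PU precisely by shrinking the interference term through $\boldsymbol{\Theta}$.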

Fig. 2 Proposed DRL-based resource allocation scheme for the IRS-enhanced semantic spectrum sharing communication

Fig. 5 The S-SE versus different transforming factors