Published in: EURASIP Journal on Wireless Communications and Networking 1/2020

Open Access 01-12-2020 | Research

Cache-enabled physical-layer secure game against smart UAV-assisted attacks in B5G NOMA networks

Authors: Chao Li, Zihe Gao, Junjuan Xia, Dan Deng, Liseng Fan



Abstract

This paper investigates cache-enabled physical-layer secure communication in a non-orthogonal multiple access (NOMA) network with two users, where an intelligent unmanned aerial vehicle (UAV) is equipped with an attack module that can operate in multiple attack modes. We present a power allocation strategy to enhance transmission security. To this end, we propose an algorithm based on reinforcement learning which adaptively controls the power allocation factor of the source station in the NOMA network. The interaction between the source station and the UAV is regarded as a dynamic game. In the process of the game, the source station adjusts the power allocation factor according to the current work mode of the attack module on the UAV. To maximize its reward, the source station keeps exploring the changing radio environment until the Nash equilibrium (NE) is reached. Moreover, a proof of the NE is given to verify that the proposed strategy is optimal. Simulation results confirm the effectiveness of the strategy.
Notes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations
NE
Nash equilibrium
NOMA
non-orthogonal multiple access
UAV
unmanned aerial vehicle

1 Introduction

In recent years, ultra-reliable low-latency communication has become a key requirement for supporting wireless services in B5G wireless communications [1–4]. To support this requirement, the caching technique can pre-store wireless data during non-peak traffic time and hence reduce the traffic load significantly [5–8]. In addition, non-orthogonal multiple access (NOMA) can provide much higher capacity and spectral efficiency than orthogonal multiple access, and hence, it is one of the most promising candidates for supporting ultra-reliable and low-latency services. Moreover, the NOMA protocol enables the source station to allocate the same spectrum and time resource to multiple users through power-domain multiplexing. In particular, the NOMA protocol can serve different kinds of users, and it can flexibly support ultra-reliable and low-latency services for both far and near users.
Although NOMA technology can reliably enhance wireless transmission, its transmission security is threatened by eavesdroppers due to the broadcast nature of wireless communications [9–13]. The authors in [14] studied the protection of physical-layer security and proposed strategies for wireless communication networks which have been confirmed to perform efficiently. In [15], the authors studied an antenna selection algorithm to protect physical-layer security in a NOMA network with an eavesdropper. However, the conventional strategies for protecting physical-layer security in NOMA systems work well only when the attacker has a single work mode. An intelligent attacker with multiple work modes was introduced in [16–20]; it reduces the data rate of communication systems by freely switching among eavesdropping, jamming, deception, and silence. If networks continue to adopt the conventional strategies, such intelligent attacks cannot be suppressed.
To tackle this problem, the authors in [21–24] proposed a transmission policy based on reinforcement learning. As a branch of artificial intelligence, reinforcement learning as proposed in [25] can be regarded as a Markov decision process. An agent trained by reinforcement learning decides which action to execute according to the environment state at the current moment, and maximizes the long-term cumulative reward to obtain the optimal action set. However, the state transition probability is generally unknowable to the agent. Q-learning was proposed in [26] to solve this problem. By combining dynamic programming with the Monte Carlo method, Q-learning enables the agent to learn optimal strategies without knowing the state transition probability. To the best of our knowledge, no previous work has used the Q-learning algorithm to protect secure transmission in a NOMA system threatened by an intelligent attacker.
Due to their mobility and ease of deployment, unmanned aerial vehicles (UAVs) have arisen as a new type of communication node in wireless networks; for example, a UAV can perform as a relay or base station under extreme natural conditions. However, a UAV can also be a mobile intelligent attacker if it is equipped with an attack module. In this paper, we investigate a NOMA network with two users in the presence of a UAV attacker which can execute multiple attack modes. The source station sends composite signals to the two users at the same time; therefore, the total transmit power is divided into two parts. We dynamically allocate the proportions of transmit power to confront the intelligent attacker. In the wireless communication process, it is hard to know the work mode transition probability of the intelligent attacker. As a model-free learning method that does not depend on the state transition probability, Q-learning is adopted to obtain a learning-based adaptive policy. Furthermore, we formulate the confrontation between the source station and the intelligent attacker as a dynamic game, and we derive the Nash equilibrium (NE) of the dynamic game. Simulation results show that the proposed strategy significantly improves the data rate of the NOMA system.

2 Methods/experimental

Consider a cache-enabled source station S that can pre-store a certain amount of information. There exist one cell-edge user U1 and one central user U2 in the coverage of S, where U2 is closer to S than U1. When the request signals from the users are received, S transmits the cached messages to the users based on the NOMA protocol. Furthermore, a UAV performing as an intelligent attacker E exists in this area. We suppose that the UAV is more likely to attack the cell-edge user U1 and that it remains in the same position when attacking. The programmable radio equipment on E can flexibly choose to keep silent, overhear information from S, or send jamming or deception signals to U1. We denote these four work modes of E as m=0, 1, 2, and 3, respectively. In the experiment, the purpose of E is to decrease the system data rate and reduce the correctness of user decoding. For simplicity, all devices in this experiment are equipped with a single antenna.

3 NOMA networks

Now, we depict the NOMA network system model, which is shown in Fig. 1. We suppose that S transmits a composite signal consisting of x1 and x2, which contain the messages requested by U1 and U2, respectively. According to the NOMA protocol, S divides the total transmit power PS into two portions, i.e., αPS and βPS, where α and β are the power allocation factors for x1 and x2, respectively. In order to satisfy the requirements of different transmission distances, the two factors α and β have to meet the following constraint conditions:
$$ \left\{ \begin{array}{lr} {\alpha \gg \beta,} \\ {\alpha + \beta \leq 1.} \end{array} \right. $$
(1)
In order to fight against the intelligent UAV attacker E, S works to improve the system data rate by consciously changing its power allocation factor α. In the first step of the transmission process, S chooses a value of the power allocation factor α to transmit the composite signal of x1 and x2, and then the received signal at U1, denoted by \(y_{_{U_{1}}}\), can be given as:
$$\begin{array}{*{20}l} y_{_{U_{1}}}=h_{SU_{1}}(\sqrt{\alpha P_{S}}x_{1} + \sqrt{\beta P_{S}}x_{2})+n_{_{U_{1}}} \end{array} $$
(2)
where \(h_{SU_{1}}\sim \mathcal {CN}(0,{\nu }^{2})\) is the instantaneous channel coefficient of the SU1 link. \(n_{_{U_{1}}}{\sim }\mathcal {CN}(0,{\sigma }^{2})\) represents the additive white Gaussian noise (AWGN) received at U1 [27–30]. The resultant SINR for x1 at U1 can be written as:
$$\begin{array}{*{20}l} {\text{SINR}}_{U_{1}}^{x_{1}} = \frac{\alpha P_{S}|h_{SU_{1}}|^{2}}{\beta P_{S}|h_{SU_{1}}|^{2}+ {\sigma}^{2}}. \end{array} $$
(3)
When m=0 holds, E shuts down its radio equipment and stays silent. In this case, the achievable rate of x1 at U1, denoted by \(C_{_{U_{1}}}\), is exactly the system data rate Csys,0. Thus, the system data rate is given by [31]:
$$\begin{array}{*{20}l} C_{sys, 0} & = \log_{2}(1+ \frac{\alpha P_{S}|h_{SU_{1}}|^{2}}{\beta P_{S}|h_{SU_{1}}|^{2}+ {\sigma}^{2}}) \\ & = \log_{2}(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1}), \end{array} $$
(4)
where \(\widetilde {P}_{S} = P_{S}/{\sigma }^{2}\). When m=1 holds, E chooses to overhear information from S, and the received signal at E can be given as:
$$\begin{array}{*{20}l} y_{_{E}} = h_{SE}(\sqrt{\alpha P_{S}}x_{1} + \sqrt{\beta P_{S}}x_{2}) + n_{_{E}}, \end{array} $$
(5)
We assume that a perfect successive interference cancellation (SIC) receiver is applied at E; thus, according to [32], the achievable rate of x1 at E, denoted by \(C_{_{E}}\), can be written as:
$$\begin{array}{*{20}l} C_{_{E}} = \log_{2}(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SE}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SE}|^{2}+1}), \end{array} $$
(6)
where \(h_{SE}{\sim }\mathcal {CN}(0,{\mu }^{2})\) is the instantaneous channel coefficient of SE link. \(n_{_{E}}{\sim }\mathcal {CN}(0,{\sigma }^{2})\) represents AWGN received at E. Consequently, according to [17], the system data rate Csys,1 can be computed by:
$$\begin{array}{*{20}l} C_{sys, 1} = [C_{sys, 0}-C_{_{E}}]^{+}, \end{array} $$
(7)
where [X]+ returns X if X is positive, and 0 otherwise. When m=2 holds, E chooses to transmit a jamming signal to U1, and the received signal at U1, denoted by \(y_{_{U_{1}, J}}\), can be written as:
$$\begin{array}{*{20}l} y_{_{U_{1}, J}}=\! h_{SU_{1}}(\sqrt{\alpha P_{S}}x_{1}\,+\, \sqrt{\beta P_{S}}x_{2})+ h_{EU_{1}}\sqrt{P_{J}}x_{_{J}} \,+\, n_{_{U_{1}}} \end{array} $$
(8)
where \(h_{EU_{1}}{\sim }\mathcal {CN} (0, {\lambda }^{2})\) is the instantaneous channel coefficient of EU1 link. PJ is the jamming power of E, and \(x_{_{J}}\) represents the jamming signal transmitted by E. Therefore, in this case, the system data rate Csys,2 can be computed by:
$$\begin{array}{*{20}l} C_{sys, 2}=\log_{2}(1+\frac{{\alpha}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+{\widetilde{P}_{J}}|h_{EU_{1}}|^{2}+1}) \end{array} $$
(9)
where \(\widetilde {P}_{J} = P_{J}/{\sigma }^{2}\). When m=3 holds, S does not send a signal to U1 while E transmits the deception signal \(x_{_{D}}\). The received signal at U1 becomes:
$$\begin{array}{*{20}l} y_{_{U_{1}, D}}=h_{EU_{1}}\sqrt{P_{D}}x_{_{D}}+n_{_{U_{1}}}, \end{array} $$
(10)
where PD is the deception power. The increase of the deception signal received by U1 is bound to cause more loss in the achievable rate at U1. Thus, the system data rate Csys,3 can be formulated as a linear function and given by:
$$\begin{array}{*{20}l} C_{sys, 3}=C_{sys, 0}-{\gamma}\log_{2}(1+{\widetilde{P}_{D}} |h_{EU_{1}}|^{2}), \end{array} $$
(11)
where \(\widetilde {P}_{D}=P_{D}/{\sigma }^{2}\), and γ∈(0,1) is the deception factor which quantifies the influence of each deception signal.
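The four mode-dependent system rates of Eqs. (4), (7), (9), and (11) can be sketched as a single Python function. This is a minimal sketch: the function name `system_rate` and the default parameter values are our own illustrative choices, not part of the paper.

```python
import numpy as np

def system_rate(mode, alpha, beta, Ps, h_su1, h_se, h_eu1,
                Pj=2.0, Pd=2.1, gamma=0.6):
    """System data rate C_sys,m of Eqs. (4), (7), (9), (11).

    Powers are already normalized by the noise variance (P = P/sigma^2),
    and the channel arguments are squared magnitudes |h|^2.
    Default Pj, Pd, gamma mirror the simulation section; they are assumptions.
    """
    # Eq. (4): rate of x1 at U1 with intra-NOMA interference from x2
    c0 = np.log2(1 + alpha * Ps * h_su1 / (beta * Ps * h_su1 + 1))
    if mode == 0:            # E silent
        return c0
    if mode == 1:            # eavesdropping: Eq. (7) with the [x]^+ operation
        ce = np.log2(1 + alpha * Ps * h_se / (beta * Ps * h_se + 1))
        return max(c0 - ce, 0.0)
    if mode == 2:            # jamming: Eq. (9), jamming power adds to the noise
        return np.log2(1 + alpha * Ps * h_su1 /
                       (beta * Ps * h_su1 + Pj * h_eu1 + 1))
    if mode == 3:            # deception: Eq. (11), linear loss model
        return c0 - gamma * np.log2(1 + Pd * h_eu1)
    raise ValueError("mode must be 0, 1, 2 or 3")
```

As expected from the model, the silent mode (m=0) yields the largest rate and every attack mode reduces it.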

4 Secure game in NOMA network

The interaction between S and E in the NOMA network is adversarial, and we formulate it as a secure game. To discuss the process of the secure game, we first quantify the feasible range of α. While ensuring that U1 can decode the received information correctly, we must also ensure that U2 can correctly decode x2. We denote the minimum data rate requirements for U1 and U2 as \(C_{\min }^{U_{1}}\) and \(C_{\min }^{U_{2}}\), respectively. Thus, α and β satisfy the following constraints:
$$\begin{array}{*{20}l} &\log_{2}(1+ \frac{\alpha {\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1})\geq C_{\min}^{U_{1}}, \end{array} $$
(12)
$$\begin{array}{*{20}l} &\log_{2}(1+{\beta}{\widetilde{P}_{S}}|h_{SU_{2}}|^{2})\geq C_{\min}^{U_{2}}, \end{array} $$
(13)
according to (1), the threshold values of α are given by:
$$ \left\{ \begin{array}{lr} {\alpha_{\max} = 1-\frac{2^{C_{\min}^{U_{2}}}-1}{{\widetilde{P}_{S}}|h_{SU_{2}}|^{2}},} \\ {\alpha_{\min} = \frac{(2^{C_{\min}^{U_{1}}}-1)({\beta}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1)}{{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}.} \end{array} \right. $$
(14)
where αmax and αmin are the maximum and minimum power allocation factors for x1, respectively. We now turn to the process of the secure game. S adaptively adjusts its power allocation factor in the range [αmin,αmax], while E selects an attack mode m∈{0,1,2,3}, which represents keeping silent, eavesdropping, jamming, or deception, respectively. In each time slot, E attempts to reduce the system data rate, i.e., Csys,1, Csys,2, or Csys,3. S strives to increase the system data rate by controlling α while suppressing the probability of attack. In view of this, we regard the confrontation between S and E as a zero-sum game. Depending on the system data rate and power consumption, the reward function of S, denoted by RS, is formulated as:
$$\begin{array}{*{20}l} R_{S}(\alpha, m)=\ln2 C_{sys, m}- \alpha{\theta}, \end{array} $$
(15)
where θ is the total power consumption. We introduce the coefficient ln2 to simplify the subsequent derivation. According to the distinguishing feature of a zero-sum game, the reward function of E, denoted by RE, is defined as:
$$\begin{array}{*{20}l} R_{E}(\alpha, m)=-\ln2 C_{sys, m}- \varphi_{m}, \end{array} $$
(16)
where φm, m∈{0,1,2,3}, denotes the consumption of E in mode m. In the secure game, S tries to find an optimal power allocation factor in [αmin,αmax] to maximize RS, while E dynamically adjusts its work mode to maximize RE. The purpose of the game is for S and E to achieve their optimal strategies α* and m*, respectively. We define the strategy set {α*, m*} as the Nash equilibrium (NE) of the secure game, at which S and E attain their maximum reward values. Thus, the NE strategy satisfies:
$$\begin{array}{*{20}l} & R_{S}(\alpha^{*}, m^{*}) \geq R_{S}(\alpha, m^{*}), \end{array} $$
(17)
$$\begin{array}{*{20}l} & R_{E}(\alpha^{*}, m^{*}) \geq R_{E}(\alpha^{*}, m). \end{array} $$
(18)
Through analytical derivation, we obtain one NE solution {α*, 0}. That is to say, if S keeps choosing the power allocation factor α*, E obtains its maximum reward value by keeping silent, and it has no motivation to execute any attack mode. Specifically, the NE solution is given and proved in the following Lemma 1 and its proof.
Lemma 1
The secure game in the NOMA network has one NE solution {α*, 0}, which is acquired by
$$\begin{array}{*{20}l} {\alpha}^{*}=\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}-\theta}{\widetilde{P}_{S}|h_{SU_{1}}|^{2}\theta}-\beta \qquad \alpha_{\min} < {\alpha}^{*} \leq \alpha_{\max}. \end{array} $$
(19)
if the following constraints are met:
$$\begin{array}{*{20}l} &\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{({\alpha_{\max}}\!\,+\,\!\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}\!\,+\,\!1}\!\! < \theta < \!\! \frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{({\alpha_{\min}}\!\,+\,\!\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}\!\,+\,\!1}, \end{array} $$
(20a)
$$\begin{array}{*{20}l} &\varphi_{1} \geq \ln(1+\frac{\alpha^{*}{\widetilde{P}_{S}}|h_{SE}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SE}|^{2}+1}), \end{array} $$
(20b)
$$\begin{array}{*{20}l} &\varphi_{2} \geq \ln(1+\frac{{\alpha^{*}}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1})-\ln(1+\frac{{\alpha^{*}}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+{\widetilde{P}_{J}}|h_{EU_{1}}|^{2}+1}), \end{array} $$
(20c)
$$\begin{array}{*{20}l} &\varphi_{3} \geq \gamma\ln(1+{\widetilde{P}_{D}} |h_{EU_{1}}|^{2}). \end{array} $$
(20d)
Proof
The proof of this Lemma is given in the Appendix.
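As a numeric sanity check on Lemma 1, one can evaluate the closed-form α* of Eq. (19) and test the NE conditions (17) and (18) directly. This is an illustrative sketch: θ and the instantaneous |h|^2 values are our own assumptions (we reuse the Section 6 variances as fixed channel gains), and φ_m and γ mirror the simulation settings.

```python
import math

# Illustrative parameters (assumptions, not derived from the paper's channels).
beta, Ps, h_su1, h_se, h_eu1 = 0.1, 10.0, 1.2, 0.5, 2.0
Pj, Pd, gamma, theta = 2.0, 2.1, 0.6, 1.0
phi = [0.0, 1.8, 2.0, 2.1]          # phi_m as in the simulation section

def c_sys(m, a):
    """System rate C_sys,m (Eqs. (4), (7), (9), (11)), base-2 logs."""
    c0 = math.log2(1 + a * Ps * h_su1 / (beta * Ps * h_su1 + 1))
    if m == 0:
        return c0
    if m == 1:
        ce = math.log2(1 + a * Ps * h_se / (beta * Ps * h_se + 1))
        return max(c0 - ce, 0.0)
    if m == 2:
        return math.log2(1 + a * Ps * h_su1 /
                         (beta * Ps * h_su1 + Pj * h_eu1 + 1))
    return c0 - gamma * math.log2(1 + Pd * h_eu1)

R_S = lambda a, m: math.log(2) * c_sys(m, a) - a * theta   # Eq. (15)
R_E = lambda a, m: -math.log(2) * c_sys(m, a) - phi[m]     # Eq. (16)

# Eq. (19): stationary point of R_S(alpha, 0)
a_star = (Ps * h_su1 - theta) / (Ps * h_su1 * theta) - beta
```

With these numbers, deviating from α* lowers R_S, and switching away from silence lowers R_E, matching (17) and (18).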

5 NOMA power allocation algorithm

In order to suppress the attack probability efficiently in the secure game, S must adopt an appropriate power allocation strategy. However, because of the complexity and variability of radio signals in the NOMA network, S can barely predict the channel state information or the work mode of E. For this reason, we propose a power allocation algorithm based on Q-learning. By combining the Monte Carlo and dynamic programming methods, Q-learning is regarded as one of the most effective model-free reinforcement learning algorithms. Without knowing the state of the environment or its transition probability, the agent constantly explores the environment through trial-and-error experiments. After averaging over many independent repeated experiments, the Q-learning-based agent acquires the optimal strategy.
Based on the above ideas, we propose the power allocation algorithm of NOMA for the secure game. In view of the inherent relation between S and E, the work mode of E determines the state of S; similarly, S can influence the environment of E by adjusting α. In the first step of the algorithm, we initialize the Q-table, denoted by Q(m,α), which is used for updating the reward values of state-action pairs. For each experiment, E first selects a work mode at random, and S then adopts an instantaneous power allocation factor αt accordingly, where αt denotes the power allocation factor at time t. It should be emphasized that we do not expect S to always select the power allocation factor by searching the Q-table. To avoid converging to a local optimum, S uses an ε-greedy policy when choosing a value of α. Specifically, S selects the current optimal α from the Q-table with probability 1−ε, and otherwise chooses a value in the range [αmin,αmax] uniformly at random. In this time slot, S transmits a signal with power αtPS and obtains the system data rate as the reward value RS from the environment. Then, E changes its work mode from mt to mt+1 according to the system data rate. By incorporating the instantaneous reward value RS and the accumulated experience in the Q-table, the update of the Q-table presented in [33] can be formulated as:
$$\begin{array}{*{20}l} Q(m_{t}, \alpha_{t}) \leftarrow Q(m_{t}, \alpha_{t}) + \zeta[R_{S} + \rho \max_{\alpha} Q(m_{t+1}, \alpha) - Q(m_{t}, \alpha_{t})], \end{array} $$
(21)
where ζ∈(0,1] is the learning rate and ρ∈[0,1] represents the weight of accumulated experience. To deal with the unknown state transition probability, we repeat the experiment multiple times and compute the average reward value. After enough updates and repeated experiments, the Q-table converges to the optimum, from which S obtains a learning-based optimal power allocation strategy. Algorithm 1 describes the learning process:
[Algorithm 1: the Q-learning-based NOMA power allocation procedure, rendered as an image in the original article]
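The learning loop described above can be sketched as follows. This is a minimal sketch, not the paper's exact Algorithm 1: the environment model `env_step`, the function name, and all hyper-parameter values are our own assumptions, and the ε-greedy split follows the standard convention of exploring with probability ε.

```python
import random

def q_learning_power_allocation(env_step, alphas, n_modes=4,
                                episodes=200, slots=1000,
                                eps=0.1, zeta=0.5, rho=0.9):
    """Sketch of the Q-learning power allocation update of Eq. (21).

    env_step(mode, alpha) -> (reward, next_mode) is a user-supplied model
    of the radio environment (attacker reaction and reward R_S).
    """
    # Q-table rows indexed by E's work mode, columns by discretized alpha
    Q = [[0.0] * len(alphas) for _ in range(n_modes)]
    for _ in range(episodes):
        mode = random.randrange(n_modes)        # E starts in a random mode
        for _ in range(slots):
            if random.random() < eps:           # explore with probability eps
                a_idx = random.randrange(len(alphas))
            else:                               # exploit the current Q-table
                a_idx = max(range(len(alphas)), key=lambda i: Q[mode][i])
            reward, next_mode = env_step(mode, alphas[a_idx])
            # Eq. (21): temporal-difference update of the state-action value
            Q[mode][a_idx] += zeta * (reward + rho * max(Q[next_mode])
                                      - Q[mode][a_idx])
            mode = next_mode
    return Q
```

On a toy environment that rewards only one α value, the greedy column of the converged Q-table picks out that value, which is the behavior the algorithm relies on.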

6 Results and discussion

In this section, we simulate the communication process to verify the effectiveness of the proposed algorithms. The links in the network experience Rayleigh flat fading [34–37], and the nodes are equipped with a single antenna. We set the parameters as follows: \(\{\nu ^{2}, \mu ^{2}, \lambda ^{2}\} = \{1.2, 0.5, 2\}, \varphi _{m=\{0, 1, 2, 3\}}=\{0, 1.8, 2.0, 2.1\}, \gamma = 0.6, \widetilde {P}_{J} = 2, \widetilde {P}_{D} = 2.1\). The power allocation factor α varies from 0.6 to 0.9 with a step of 0.02, and β is fixed at 0.1. Each experiment runs for 10,000 time slots, and we repeat 5000 experiments to obtain the average.
Figure 2 shows the variation of the average reward values of S and E over 0 to 10,000 time slots. From this figure, we can see that the average reward values of S and E both increase rapidly between 0 and 1000 time slots. In the subsequent process, the two curves rise slowly and reach their peak values at around the 3000th time slot. Then, the two curves remain steady until the end of the experiment. In learning-based algorithms, we expect agents to select specific actions to improve their long-term cumulative rewards, which is consistent with the experimental results.
The purpose of our proposed power allocation strategy is to improve the average data rate of the system, which is well reflected in Fig. 3. From 0 to 1000 time slots, the average system data rate grows rapidly from the initial value of 0.76 to a temporary value of 1.23. After that, it continues to rise slowly until it converges to 1.31 at around the 3000th time slot, and then keeps a steady level until the end. The trend of the system data rate is basically consistent with that of the average reward value, which confirms that an increase in the system data rate brings more reward to the agents.
Figure 4 shows the evolution of the average power allocation factor during the reinforcement learning process. As can be seen from the figure, the power allocation factor has a random initial value of 0.75. After the experiment starts, the work mode of E begins to change, and S dynamically adjusts the power allocation factor according to the environment. In the first 500 time slots, the average power allocation factor gradually decreases to a temporary value of 0.708. Between 500 and 4000 time slots, it gradually increases and then remains stable around 0.737.
Figure 5 shows the average attack probabilities of E versus the time slot, varying from 0 to 10,000. We find that the average attack probabilities fall quickly from 0 to 1000 time slots. After that, the three curves decrease slowly and gradually converge. The probability of eavesdropping drops from the initial value of 0.25 to the convergence value of 0.025, a decline of 90%. The probability of jamming drops from the initial value of 0.26 to the convergence value of 0.02, a decline of 92.3%. Similarly, the probability of deception drops from the initial value of 0.27 to the convergence value of 0.01, a decline of about 96.3%. Moreover, we simulate the average attack probabilities of the power allocation algorithm again with different parameters. We set the channel parameters as {ν2,μ2,λ2}={0.9,0.3,2}; that is, we assume that the cell-edge user U1 is placed further away from S, and correspondingly, E is also further from S. Compared with Fig. 5, Fig. 6 shows that the converged eavesdropping probability becomes lower, while the converged jamming and deception probabilities increase by about 2% when the jamming and deception powers are fixed. Comparing Fig. 5 with Fig. 6 shows that the proposed policy performs well regardless of the locations of the cell-edge user and the UAV.

7 Conclusions

In this paper, we investigated the cache-assisted physical-layer security of a NOMA communication network in which an intelligent attacker UAV operates near the cell-edge user. The UAV within the coverage of the network tries to reduce the system data rate of the NOMA network by flexibly switching its work mode among eavesdropping, jamming, deception, and keeping silent. According to the NOMA protocol, the transmitter in the system has to allocate the total power to the two users in a certain proportion. Hence, we need a real-time strategy to adjust the power allocation factor and suppress the attack motivation of the UAV. To tackle this problem, we proposed a Q-learning-based power allocation strategy to control the power allocation factor. The simulation results show that the proposed strategy adjusts the power allocation factor well in real time. Furthermore, we confirmed that this strategy performs excellently in enhancing the system data rate and suppressing the attack probabilities. In future work, we will apply the wireless caching technique [38–40] to NOMA systems to further enhance transmission reliability and security. In addition, we will consider some new materials [41–43] for enhancing communication performance in practical applications. Furthermore, intelligent algorithms such as deep learning-based algorithms [44–47] will be applied to the considered system to further enhance network performance.

8 Appendix

Proof
By substituting m=0 into (15), we have
$$\begin{array}{*{20}l} R_{S}(\alpha, 0)=\ln(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1})-\alpha \theta. \end{array} $$
(22)
We take the partial derivative of RS(α,0) with respect to α and have
$$\begin{array}{*{20}l} \frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}=\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1} - \theta, \end{array} $$
(23)
Taking the second derivative, we find
$$ \frac{\partial^{2}{R_{S} (\alpha, 0)}}{\partial{\alpha}^{2}}=-\frac{{\widetilde{P}_{S}^{2} |h_{SU_{1}}|^{4}}}{[(\alpha+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1]^{2}} \leq 0, $$
(24)
showing that (22) is a concave function of α, whose maximum is attained where ∂RS(α,0)/∂α=0. Setting (23) to zero and solving for α, we obtain α*; thus, (19) holds. To ensure that the maximizer α* lies in the range [αmin,αmax], we let the following inequalities hold:
$$\begin{array}{*{20}l} &\frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}|_{\alpha={\alpha_{\min}}}\,=\,\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha_{\min}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1}\! - \theta\!>\!\!0, \end{array} $$
(25)
$$\begin{array}{*{20}l} &\frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}\!|_{\alpha=\!{\alpha_{\max}}}\,=\,\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha_{\max}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1}\! - \!\theta\!<\!\!0, \end{array} $$
(26)
i.e., (20a) holds. Therefore, (α*, 0) satisfies (17). To ensure that (α*, 0) also satisfies (18), we substitute (α*, m) into (16) and let the following inequalities hold:
$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 1) \geq 0, \end{array} $$
(27a)
$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 2) \geq 0, \end{array} $$
(27b)
$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 3) \geq 0, \end{array} $$
(27c)
i.e., (20b)–(20d) hold. Therefore, (α*, 0) also satisfies (18).
In summary, the strategy set (α*, 0) simultaneously satisfies Eqs. (17) and (18), which is the strict definition of the NE. With this, Lemma 1 is completely proved. □

Acknowledgements

Not applicable.

Competing interests

The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Literature
1.
J. Zhao, Q. Li, Y. Gong, Computation offloading and resource allocation for mobile edge computing with multiple access points. IET Commun. PP(99), 1–10 (2019).
2.
J. Yang, D. Ruan, J. Huang, X. Kang, Y.-Q. Shi, An embedding cost learning framework using GAN. IEEE Trans. Inf. Forensic. Secur. PP(99), 1–10 (2019).
3.
B. Wang, F. Gao, S. Jin, H. Lin, G. Y. Li, Spatial- and frequency-wideband effects in millimeter-wave massive MIMO systems. IEEE Trans. Sig. Process. 66(13), 3393–3406 (2018).
4.
X. Hu, C. Zhong, X. Chen, W. Xu, Z. Zhang, Cluster grouping and power control for angle-domain mmWave MIMO NOMA systems. IEEE J. Sel. Top. Sig. Process. 13(5), 1167–1180 (2019).
5.
L. Fan, N. Zhao, X. Lei, Q. Chen, N. Yang, G. K. Karagiannidis, Outage probability and optimal cache placement for multiple amplify-and-forward relay networks. IEEE Trans. Veh. Technol. 67(12), 12373–12378 (2018).
6.
X. Lin, Probabilistic caching placement in UAV-assisted heterogeneous wireless networks. Phys. Commun. 33, 54–61 (2019).
7.
F. Shi, Secure probabilistic caching in random multi-user multi-UAV relay networks. Phys. Commun. 32, 31–40 (2019).
8.
C. Li, L. Peng, Z. Chao, S. Fan, J. Cioffi, L. Yang, Spectral-efficient cellular communications with coexistent one- and two-hop transmissions. IEEE Trans. Veh. Technol. 65(8), 6765–6772 (2016).
9.
G. Gomez, F. J. Martin-Vega, F. J. Lopez-Martinez, Y. Liu, M. Elkashlan, Uplink NOMA in large-scale systems: coverage and physical layer security. CoRR abs/1709.04693 (2017).
10.
C. Zheng, H. Xin, X. Guo, T. Ristaniemi, H. Zhu, Secure and energy efficient resource allocation for wireless power enabled full-/half-duplex multiple-antenna relay systems. IEEE Trans. Veh. Technol. 65(12), 11208–11219 (2017).
11.
X. Liang, C. Xie, M. Min, W. Zhuang, User-centric view of unmanned aerial vehicle transmission against smart attacks. IEEE Trans. Veh. Technol. 67(4), 3420–3430 (2017).
12.
C. Li, S. Zhang, P. Liu, F. Sun, J. Cioffi, L. Yang, Overhearing protocol design exploiting inter-cell interference in cooperative green networks. IEEE Trans. Veh. Technol. 65(1), 441–446 (2016).
13.
C. Li, H. J. Yang, S. Fan, J. Cioffi, L. Yang, Multi-user overhearing for cooperative two-way multi-antenna relays. IEEE Trans. Veh. Technol. 65(5), 3796–3802 (2016).
14.
J. Xia, Secure cache-aided multi-relay networks in the presence of multiple eavesdroppers. IEEE Trans. Commun. PP(99), 1–10 (2019).
15.
C. Zheng, L. Lei, H. Zhang, T. Ristaniemi, H. Zhu, Energy-efficient and secure resource allocation for multiple-antenna NOMA with wireless power transfer. IEEE Trans. Green Commun. Netw. 2(4), 1059–1071 (2018).
16.
Y. Li, L. Xiao, H. Dai, H. V. Poor, Game theoretic study of protecting MIMO transmissions against smart attacks, in IEEE Int. Conf. Commun. (2017), pp. 1–6.
17.
C. Li, Y. Xu, Protecting secure communication under UAV smart attack with imperfect channel estimation. IEEE Access 6(1), 76395–76401 (2018).
18.
Y. Xu, Q-learning based physical-layer secure game against multi-agent attacks. IEEE Access 7, 49212–49222 (2019).
19.
go back to reference X. Liang, Y. Li, G. Han, H. Dai, H. V. Poor, A secure mobile crowdsensing game with deep reinforcement learning. IEEE Trans. Inf. Forensic. Secur. 13(1), 35–47 (2018).CrossRef X. Liang, Y. Li, G. Han, H. Dai, H. V. Poor, A secure mobile crowdsensing game with deep reinforcement learning. IEEE Trans. Inf. Forensic. Secur. 13(1), 35–47 (2018).CrossRef
20.
go back to reference C. Li, S. Fan, J. M. Cioffi, L. Yang, Energy efficient mimo relay transmissions via joint power allocations. IEEE Trans. Circ. Syst. II Express Briefs. 61(7), 531–535 (2014). C. Li, S. Fan, J. M. Cioffi, L. Yang, Energy efficient mimo relay transmissions via joint power allocations. IEEE Trans. Circ. Syst. II Express Briefs. 61(7), 531–535 (2014).
21.
go back to reference X. Liang, C. Xie, et al., A mobile offloading game against smart attacks. IEEE Access. 4:, 2281–2291 (2017). X. Liang, C. Xie, et al., A mobile offloading game against smart attacks. IEEE Access. 4:, 2281–2291 (2017).
22.
go back to reference C. Li, W. Zhou, Enhanced secure transmission against intelligent attacks. IEEE Access. 7:, 53596–53602 (2019).CrossRef C. Li, W. Zhou, Enhanced secure transmission against intelligent attacks. IEEE Access. 7:, 53596–53602 (2019).CrossRef
23.
go back to reference X. Liang, T. Chen, C. Xie, H. Dai, V. Poor, Mobile crowdsensing games in vehicular networks. IEEE Trans. Veh. Technol.67(2), 1535–1545 (2018).CrossRef X. Liang, T. Chen, C. Xie, H. Dai, V. Poor, Mobile crowdsensing games in vehicular networks. IEEE Trans. Veh. Technol.67(2), 1535–1545 (2018).CrossRef
24.
go back to reference X. Liang, Y. Li, C. Dai, H. Dai, H. V. Poor, Reinforcement learning-based noma power allocation in the presence of smart jamming. IEEE Trans. Veh. Technol.67(4), 3377–3389 (2018).CrossRef X. Liang, Y. Li, C. Dai, H. Dai, H. V. Poor, Reinforcement learning-based noma power allocation in the presence of smart jamming. IEEE Trans. Veh. Technol.67(4), 3377–3389 (2018).CrossRef
25.
go back to reference A. G. Barto, Reinforcement learning. A Bradford Book. 15(7), 665–685 (1998). A. G. Barto, Reinforcement learning. A Bradford Book. 15(7), 665–685 (1998).
26.
go back to reference C. J. C. H. Watkins, P. Dayan, Technical note: Q-learning. Mach. Learn.8(3-4), 279–292 (1992).MATHCrossRef C. J. C. H. Watkins, P. Dayan, Technical note: Q-learning. Mach. Learn.8(3-4), 279–292 (1992).MATHCrossRef
27.
go back to reference X. Lin, MARL-based distributed cache placement for wireless networks. IEEE Access. 7:, 62606–62615 (2019).CrossRef X. Lin, MARL-based distributed cache placement for wireless networks. IEEE Access. 7:, 62606–62615 (2019).CrossRef
28.
go back to reference J. Zhao, A dual-link soft handover scheme for C/U plane split network in high-speed railway. IEEE Access. 6:, 12473–12482 (2018).CrossRef J. Zhao, A dual-link soft handover scheme for C/U plane split network in high-speed railway. IEEE Access. 6:, 12473–12482 (2018).CrossRef
29.
go back to reference H. Xie, F. Gao, S. Zhang, S. Jin, A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol.66(4), 3170–3184 (2017).CrossRef H. Xie, F. Gao, S. Zhang, S. Jin, A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol.66(4), 3170–3184 (2017).CrossRef
30.
go back to reference X. Lai, Distributed secure switch-and-stay combining over correlated fading channels. IEEE Trans. Inf. Forensic. Secur. 14(8), 2088–2101 (2019).CrossRef X. Lai, Distributed secure switch-and-stay combining over correlated fading channels. IEEE Trans. Inf. Forensic. Secur. 14(8), 2088–2101 (2019).CrossRef
31.
go back to reference Z. Na, Y. Wang, Subcarrier allocation based simultaneous wireless information and power transfer algorithm in 5g cooperative OFDM communication systems. Phys. Commun.29:, 164–170 (2018).CrossRef Z. Na, Y. Wang, Subcarrier allocation based simultaneous wireless information and power transfer algorithm in 5g cooperative OFDM communication systems. Phys. Commun.29:, 164–170 (2018).CrossRef
33.
go back to reference E. N. Barron, H. Ishii, The bellman equation for minimizing the maximum cost. Nonlinear Anal. Theory Methods Appl.13(9), 1067–1090 (1989).MathSciNetMATHCrossRef E. N. Barron, H. Ishii, The bellman equation for minimizing the maximum cost. Nonlinear Anal. Theory Methods Appl.13(9), 1067–1090 (1989).MathSciNetMATHCrossRef
34.
go back to reference Z. Na, J. Lv, M. Zhang, M. Xiong, GFDM based wireless powered communication for cooperative relay system. IEEE Access. 7:, 50971–50979 (2019).CrossRef Z. Na, J. Lv, M. Zhang, M. Xiong, GFDM based wireless powered communication for cooperative relay system. IEEE Access. 7:, 50971–50979 (2019).CrossRef
35.
go back to reference X. Lai, W. Zou, DF relaying networks with randomly distributed interferers. IEEE Access. 5:, 18909–18917 (2017).CrossRef X. Lai, W. Zou, DF relaying networks with randomly distributed interferers. IEEE Access. 5:, 18909–18917 (2017).CrossRef
36.
go back to reference J. Zhao, J. Liu, Y. Nie, S. Ni, Location-assisted beam alignment for train-to-train communication in urban rail transit system. IEEE Access. 7:, 80133–80145 (2019).CrossRef J. Zhao, J. Liu, Y. Nie, S. Ni, Location-assisted beam alignment for train-to-train communication in urban rail transit system. IEEE Access. 7:, 80133–80145 (2019).CrossRef
37.
go back to reference J. Xia, Cache-aided mobile edge computing for b5g wireless communication networks. EURASIP J. Wirel. Commun. Netw.PP(99), 1–5 (2019). J. Xia, Cache-aided mobile edge computing for b5g wireless communication networks. EURASIP J. Wirel. Commun. Netw.PP(99), 1–5 (2019).
38.
go back to reference J. Xia, When distributed switch-and-stay combining meets buffer in IoT relaying networks. Phys. Commun.PP:, 1–9 (2019). J. Xia, When distributed switch-and-stay combining meets buffer in IoT relaying networks. Phys. Commun.PP:, 1–9 (2019).
39.
go back to reference S. Lai, Intelligent secure communication for cognitive networks with multiple primary transmit power. IEEE Access. PP(99), 1–7 (2019). S. Lai, Intelligent secure communication for cognitive networks with multiple primary transmit power. IEEE Access. PP(99), 1–7 (2019).
40.
go back to reference J. Zhao, Q. Li, Y. Gong, K. Zhang, Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans. Veh. Technol.68(8), 7944–7956 (2019).CrossRef J. Zhao, Q. Li, Y. Gong, K. Zhang, Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans. Veh. Technol.68(8), 7944–7956 (2019).CrossRef
41.
go back to reference J. Yang, Inverse optimization of building thermal resistance and capacitance for minimizing air conditioning loads. Renew. Energy. PP:, 1–10 (2020). J. Yang, Inverse optimization of building thermal resistance and capacitance for minimizing air conditioning loads. Renew. Energy. PP:, 1–10 (2020).
42.
go back to reference H. Huang, Optimum insulation thicknesses and energy conservation of building thermal insulation materials in chinese zone of humid subtropical climate. Renew. Energy. 52:, 101840 (2020). H. Huang, Optimum insulation thicknesses and energy conservation of building thermal insulation materials in chinese zone of humid subtropical climate. Renew. Energy. 52:, 101840 (2020).
43.
go back to reference J. Yang, Numerical and experimental study on the thermal performance of aerogel insulating panels for building energy efficiency. Renew. Energy. 138:, 445–457 (2019).CrossRef J. Yang, Numerical and experimental study on the thermal performance of aerogel insulating panels for building energy efficiency. Renew. Energy. 138:, 445–457 (2019).CrossRef
44.
go back to reference G. Liu, Deep learning based channel prediction for edge computing networks towards intelligent connected vehicles. IEEE Access. 7:, 114487–114495 (2019).CrossRef G. Liu, Deep learning based channel prediction for edge computing networks towards intelligent connected vehicles. IEEE Access. 7:, 114487–114495 (2019).CrossRef
45.
go back to reference Z. Zhao, A novel framework of three-hierarchical offloading optimization for mec in industrial IoT networks. IEEE Trans. Ind. Inform.PP(99), 1–12 (2019). Z. Zhao, A novel framework of three-hierarchical offloading optimization for mec in industrial IoT networks. IEEE Trans. Ind. Inform.PP(99), 1–12 (2019).
46.
go back to reference J. Xia, Intelligent secure communication for internet of things with statistical channel state information of attacker. IEEE Access. 7(1), 144481–144488 (2019).CrossRef J. Xia, Intelligent secure communication for internet of things with statistical channel state information of attacker. IEEE Access. 7(1), 144481–144488 (2019).CrossRef
47.
go back to reference K. He, A MIMO detector with deep learning in the presence of correlated interference. IEEE Trans. Veh. Technol.PP(99), 1–5 (2019). K. He, A MIMO detector with deep learning in the presence of correlated interference. IEEE Trans. Veh. Technol.PP(99), 1–5 (2019).
Metadata
Title: Cache-enabled physical-layer secure game against smart UAV-assisted attacks in B5G NOMA networks
Authors: Chao Li, Zihe Gao, Junjuan Xia, Dan Deng, Liseng Fan
Publication date: 01-12-2020
Publisher: Springer International Publishing
DOI: https://doi.org/10.1186/s13638-019-1595-x
