We propose a study using hidden Markov model (HMM) with state prediction for opportunistic spectrum access (OSA) in cognitive radio (CR) networks. The primary channels are assumed to be operating in a TDMA manner with synchronous time slots of equal length and alternating between idle and busy states. The secondary users (SUs) may use them when they are idle by channel sensing. In contrast to the traditional scheme relying only on channel sensing for exploring spectrum opportunities, the proposed prediction scheme takes advantage of state prediction, channel sensing, and acknowledgments (ACKs) from the receiver in an attempt to maximize the utility. In the prediction scheme, there are three distinct actions: direct skip, sensing then conditional access, and direct access. We impose some constraints on the system parameters and derive thresholds by which we can specify the optimal action. We then conduct simulations to compare the performance of the prediction scheme to that of the traditional scheme. Results show that the former is superior to the latter. We believe the proposed prediction scheme is suitable for the OSA system using spectrum sensing.
Hinweise
Competing interests
The authors declare that they have no competing interests.
1 Introduction
The spectrum resources are rare, and a large part of the spectrum licensed for various applications are not fully used in times or in spaces. These facts motivate the use of cognitive radio (CR) networks coexisting with the licensed primary networks [1,2]. In CR networks, the secondary users (SUs) are allowed to utilize the frequency channels assigned for the primary users (PUs) when they are currently not being used. This kind of spectrum sharing is referred to as opportunistic spectrum access (OSA) or spectrum overlay. To detect the spectrum opportunities, spectrum sensing may be used. In [3-6], the authors consider the OSA system in which an ad hoc CR network operating with a primary network for which N frequency channels are assigned. The channels are assumed to be operating in a TDMA manner with synchronous time slots of equal length and alternating between idle and busy states. The SU may use them when they are idle by channel sensing.
To effectively implement the concept of CR networking and alleviate the processing delays, spectrum prediction technique has been employed in the functions of spectrum sensing, spectrum decision, and spectrum mobility [7-10]. With spectrum prediction, SUs may skip the sensing duty on channels that are predicted to be busy or directly access when channels are predicted to be idle. In this way, sensing time and energy consumption may be reduced.
Anzeige
CR networks can operate in either centralized or distributed mode [11]. In the centralized mode, a central controller is implemented to perform spectrum decisions and resource allocation. In contrast to the centralized mode requiring a central controller, the distributed mode does not rely on any central controller. In this mode, SUs may cooperate by exchanging their own local information to enhance the sensing performance or operate independently to pursue their own benefits [12].
Many researchers have used partially observable Markov decision process (POMDP) to model distributed OSA networks [3-6]. The idea to apply the POMDP model arises from the considerations that states of the system cannot be totally observed. Optimal action is determined by using the partial information and the history of observations. Assuming that the channel statistics are the same within T time periods, the optimal value functions for each system state are solved for a T-finite horizon optimal problem in a recursive manner. The problem is quite involved because there are a total of 2N system states [4,5]. In a POMDP model, the actions taken by the agents control the state transitions of the hidden process. Different actions taken in a time change the future behavior of the process, so we have to solve the problem based on finite or infinite horizon.
In this work, we consider the OSA system with distributed mode. We assume that channels are statistically independent and focus on the treatment for a single channel. With imperfect sensing and by considering the fact that actions taken by SUs do not affect the evolution of the channel state, we thus model the system as a hidden Markov model (HMM) [13-20]. An HMM can be considered as consisting two processes: the variation of the hidden states is a Markov process and the observation under a specific hidden state is a random process. This model is suitable for modeling Markov process with imperfect sensing. Operating in this model, SUs try to track the state of the hidden process and use the channel when idle state is detected. As actions do not control the hidden process, the current action does not affect the value function in the future. Thus, in finding optimal policy, no recursive procedures are required. Using the HMM in conjunction with state prediction, we derive thresholds on the probability space for the action policy, which specifies the optimal action achieving maximum expected utility based on the predicted probability.
The rest of this paper is organized as follows. In Section 2, we give a description of the system. Section 2 describes actions and derives the decision rule. Simulation results are given in Section 2 and finally we draw conclusions in Section 2.
Anzeige
2 System description
The same OSA system as studied in [3-6] is considered in this work. In the OSA system we assume that a dedicated control channel is available for SUs in the system to exchange signaling messages, as suggested in [21]. We also assume each primary channel is characterized by a two-state Markov chain. As SUs cannot know the channel state exactly, we further model each channel as an HMM, which may be viewed as a discrete-time bi-variable random process {S(t), O(t)}, where t=1,2, ⋯, is the discrete time, S(t) is the hidden process, and O(t) is the observable process having states as the hiddenprocess.
For the HMM under considerations, we make some assumptions stating as follows [14]. The first one is the Markov property, which means the next state depends only on the current state and independent of previous states. The second one is the stationary property, which states that the transition probabilities are independent of the actual time the transitions take places. The third one is the observation independence assumption; that is, the current observation is statistically independent of previous observations.
The hidden process S(t) of a channel is characterized by two transition probabilities α and β, where α is the probability of transition from idle state (s0) to busy state (s1) and β is the probability of transition from s1 to s0. The observable process also has idle and busy states. However, due to imperfect sensing, observations may be incorrect. When a busy state is observed, denoted by o1, if the hidden process is actually idle, we say a false alarm occurs and denote this probability by pfa; otherwise we have a correct detection, denoting this probability by pd. Thus, (1−pd) is the miss detection probability. Perfect sensing means pd=1 and pfa=0, but it is extremely hard to achieve in practice. A spectrum opportunity detector can be considered as performing a binary hypotheses test. As can be seen from a receiver operating characteristic curve, a smaller value of pfa implies a larger value of (1−pd). This results in more packet collisions. Contrarily, a larger pfa implies a smaller (1−pd) but a large part of the unused channel resources will be wasted. In general, we would like to make pd as large as possible and pfa as small as possible. However, these are usually conflicting objectives. To solve the problem, we may resort to Neyman-Pearson criterion [22]; that is, determining a value of pfa which is acceptable and seeking a decision strategy on the channel energy that constrains pfa to this value while simultaneously maximizing pd.
Similar to [6,16], we assume energy detection is used for spectrum sensing. Consider two binary hypotheses H0 and H1, where H0 and H1 denote the absence and presence of a primary user, respectively. A test statistic
is formulated to distinguish between H0 and H1 by a predefined threshold λ as follows:
Spectrum sensing is equivalent to detecting channel energy. The threshold λ is determined by the constraint on pfa and reflects the effect of noise uncertainty on spectrum sensing. For the observable process, we may first give a constraint on pfa and then determine pd. Thus values of pd and pfa can be known [8,19]. The state-transition diagram of a hidden process is shown in Figure 1, in which the observable process is also shown.
×
3 Actions and decision rule
In the traditional scheme for the OSA system, at the beginning of a time slot, the SU first senses the channel. If the outcome is idle, the SU then transmits over the channel; otherwise it skips the rest of the slot. Thus, no attempts are made to track the state of the channel. In this work, we propose to use state prediction in conjunction with channel sensing. In contrast to the traditional scheme always performing sensing in each slot, the prediction scheme may skip the slot or directly transmit without sensing, so as to avoid sensing as much as possible. As the concept of Bayesian filters plays a vital role in the proposed scheme, before going into further details, we first describe the concept behind the Bayesian filter.
3.1 Bayesian filter
Let π(t)= [ π0(t) π1(t)] be the belief probability vector of a channel at time t, where πi(t) is the probability of the hidden process staying in state si. A Bayesian filter may be used to update current belief given previous belief π(t−1) and current observation om, that is,
where pij denotes the transition probability, from state si to state sj, of the hidden process and p(om|sk) is the probability of the occurrence of observation om, given the process staying in state sk. Let \(\hat {\pi }_{j} (t)\) denote the one-step ahead prediction probability of the hidden process being in state sj, expressed by
These equations show that an update of the Bayesian filter may be viewed as composed of two major steps. The first step is to predict the one-step ahead probability vector by the transition probabilities and the previous belief vector, as in (3). The second step is to update current belief by the predicted probability vector and the observation, as in (4). However, obtaining observations requires channel sensing and doing so consumes energy. If the hidden process is predicted to be idle with high probability, one may avoid channel sensing and directly transmit. Similarly, if the channel is predicted to be busy with high probability, channel sensing may be avoided as well. These considerations are the motivation of this work.
3.2 Actions and probability update
In contrast to the traditional scheme relying only on channel sensing, the proposed prediction scheme takes advantage of state prediction, channel sensing, and acknowledgments (ACKs) from the receiver in an attempt to maximize the utility. Consider a particular channel for an SU. Let b(t) and \(\hat {b}(t)\) denote the belief probability and the prediction probability, respectively, of the channel being in idle state at time slot t. As in (3), the SU may predict \(\hat {b}(t)\) by
After the prediction, if \(\hat {b}(t)\) is quite low, the SU directly skips the time slot for energy savings and then sets \(b(t) = \hat {b}(t)\) as no observations are available. This action is referred to as direct skip, denoted by a1. When \(\hat {b}(t)\) is quite high, the SU directly transmits over the channel. This action is referred to as direct access, denoted by a3. The belief update for this action depends on the transmission result and will be discussed later. When \(\hat {b}(t)\) is moderate, the SU first performs channel sensing. If the outcome is o1, the SU skips the rest of the slot and updates current belief from \(\hat {b}(t)\) by
The update in (6) tends to lower the prediction probability in the next slot and in turn helps induce action a1 to be chosen because when \(\hat {b}(t)\) is moderate, in the denominator, the first term may be quite small compared to the second term. If the sensing outcome is o0, the SU then transmits over the channel. This action is referred to as sensing then conditional access, denoted by a2, which is also the only action in the traditional scheme.
In this work, as similar to [6], we assume that at the end of each time slot, the corresponding receiver will send an ACK, through the dedicated control channel, to inform the SU whether the transmission is successful. Assume also that packets involved in a collision are collapsed; that is, capture effect is ignored. Upon receiving the ACK after a transmission by action a2 or a3, the SU updates the belief as follows. If a positive ACK is received, this means the channel is definitely in idle state, so the SU sets current belief to b(t)=1. When a negative ACK (NACK) is received (timeout is equivalent to NACK) after taking action a2, as already having an observation o0, the SU first updates the belief by
Note that if the channel is assumed to be error-free, the SU sets b(t)=0 when a NACK is received. In fact, both updates in (8) and (9) may be as small as zero because the error rate usually is quite small compared to the other quantities.
3.3 Formulation of decision rule
Before we can formulate the decision rule, we have to specify the utility for each action. In the OSA system, we assume the SU can earn positive rewards only from successful transmissions. Suppose the SU has to pay a cost for a transmission, as doing so consuming energy. If the transmission is successful, the SU receives a reward. When the transmission collides with a PU’s transmission, the SU has to pay a cost for the collision but no further cost is paid when collapsed by the channel noise. As the transmission cost itself is a constant quantity and the channel error rate plays its role as a weighting factor on the utilities, both of them may be incorporated into the reward for a successful transmission, Rsucc, and the cost for a collision, Ccoll. Channel sensing consumes energy, so each time the SU senses a channel is assumed to be charged for a cost Csens. In addition, we assume that Ccoll>Rsucc>Csens.
For clarity, in the following expressions, we may omit the time variable t on \(\hat {b}(t)\) and b(t) if no ambiguity in time instants. After the prediction probability update described in (5), if the SU takes action a1, we assume that the SU gets no rewards and pays nothing. So the utility function is
$$ U_{a_{1}} (\hat {b}) = 0. $$
(10)
The utility function for action a3 may be expressed as
In order to compare the utility of action a2 with others, we have to transform the belief b in (12) and (13) into \(\hat {b}.\) Let \(b = T(b\vert \hat {b},o_{i})\) be the belief update given \(\hat {b}\) and observation oi. Thus, as in (4), we have
By (10), (11), and (16), we make the following observations. The utility functions are all linear over the whole probability space. For any given set of parameters, we can specify a distinct interval in the probability space for each action which maximizes the utility. However, for real applications, we have the following concerns. At the extreme point \(\hat {b} = 1\), we have
Thus, \(U_{a_{3}} (1)\) is definitely larger than both \(U_{a_{2}} (1)\) and \(U_{a_{1}} (1)\). In addition, we desire that \(U_{a_{2}} (1) > U_{a_{1}} (1)\). To meet this condition, we may set pfa<1−(Csens/Rsucc). This constraint is reasonable because a smaller pfa is desirable. At the other extreme point \(\hat {b} = 0\), we find
Clearly, \( U_{a_{1}} (0)\) is the maximum one. In order to have \(U_{a_{3}} (0) < U_{a_{2}} (0)\), we require Ccoll(1−pd)+Csens<Ccoll. A setting of pd>Csens/Ccoll satisfies the inequality. We find this constraint reasonable as a higher pd is desirable. With the above treatments, at \(\hat {b} = 1\), we have \(U_{a_{3}} (\hat {b}) > U_{a_{2}} (\hat {b}) > U_{a_{1}} (\hat {b})\) and at \(\hat {b} = 0\), we have \(U_{a_{3}} (\hat {b}) < U_{a_{2}} (\hat {b}) < U_{a_{1}} (\hat {b})\). Therefore, \(U_{a_{3}} (\hat {b}),_{ }U_{a_{2}} (\hat {b})\), and \(U_{a_{1}} (\hat {b})\) are not parallel to each other and any two lines intersect at one point. For illustrating purposes, we in the rest of this paper focus on this scenario. However, the idea applies to other scenarios as well.
Denote the prediction probability \(\hat {b}\) at which \(U_{a_{3}} (\hat {b})\) and \(U_{a_{2}} (\hat {b})\) intersect by η1 and \(U_{a_{2}} (\hat {b})\) and \(U_{a_{1}} (\hat {b})\) intersect by η0. Obviously, depending on the parameters, we have either η1>η0 or η1≤η0. By equating (10) and (16) and solving for \(\hat {b},\) we obtain
We show an illustrating example of the utility functions under this condition in Figure 2. If η1≤η0, we have to specify another threshold η2, denoting the \(\hat {b}\) at which \(U_{a_{1}} (\hat {b})\) and \(U_{a_{3}} (\hat {b})\) intersect, that is,
Figure 3 shows an illustrating example of the utility functions of which η1<η0. Note that if η1≤η0, action a2 is excluded because \(U_{a_{2}} (\hat {b})\) is dominated by either \(U_{a_{3}} (\hat {b})\) or \(U_{a_{1}} (\hat {b})\).
×
If channels are statistically independent, then extension to the case of multiple channels is straightforward. Suppose the SU is capable of treating a number of channels up to l. If l≥N, all the N channels can be treated simultaneously; otherwise the first l channels having the largest utility values are chosen with potential ties being broken randomly.
3.4 Effects of transition probabilities
For the prediction scheme, when updating from a given b(t)=0, we prefer that the SU first chooses action a1 a number of times and then chooses action a2. On the other hand, when updating from a given b(t)=1, we prefer the SU performs action a3 consecutively before receiving a NACK. In each update, the prediction probability \(\hat {b}(t)\) depends heavily on the transition probabilities. We thus examine the effects of the latter on the proposed scheme to see under what conditions would the preferences be satisfied. Suppose action a1 is consecutively taken n times starting with a given b(t). With some algebra manipulations, we may express the n-step ahead prediction probability \(\hat {b}(t + n)\) as
From this equation we make some observations, distinguished by different cases, as follows.
Case α+β<1: For this case, if b(t)<β/(α+β), the last two terms on the right-hand side of (24) sum to a negative value, which increases toward zero as n increases; that is, \(\hat {b}(t + n)\) approaches the steady-state probability β/(α+β) as n increases. As a consequence, if η0>β/(α+β), action a1 will be taken forever. Thus, we desire η0<β/(α+β). On the other hand, if b(t)>β/(α+β), the two mentioned terms sum to a positive value, which approaches zero as n increases. In addition, \(\hat {b}(t + 1)=\beta \) if b(t)=0 and \(\hat {b}(t + 1)=1- \alpha \) if b(t)=1. Clearly, for any given b(t), values of \(\hat {b}(t + n)\) are limited to [ β,1−α]. Hence, for action a1 (resp. action a3) to be active, we require η0>β (resp. η1<1−α).
Case α+β=1: For this case, \(\hat {b}(t + n)=\beta /(\alpha +\beta)\), for n=1,2,⋯, irrespective of b(t). Thus, the prediction scheme degenerates, as only one action is available. If the action for this case happens to be action a2, the prediction scheme degenerates into the traditional scheme.
Case α+β>1: For this case, if b(t) is below β/(α+β), \(\hat {b}(t + 1)\) achieves the maximum value, which is above β/(α+β), in the sequence of \(\hat {b}(t + n)\). Contrarily, if b(t) is above β/(α+β), \(\hat {b}(t + 1)\) reaches the minimum value, which is below β/(α+β), in the sequence of \(\hat {b}(t + n)\). As the rate of change of the state is fast, the actions in adjacent time slots tend to be different so as to follow the channel state. Performing the same action a1 (or a3) in consecutive time slots is seldom.
From these observations, we recognize that the case α+β<1 with
may fit the preferences described above. Under this condition, the prediction scheme definitely outperforms the traditional scheme. However, values of α and β are given by PUs and the SU has no choices. But it is clear that the prediction scheme does at least as well as the traditional scheme as long as utilities are of concerns.
4 Simulation results
In this section, we compare the performance of the prediction scheme to that of the traditional scheme by simulations. Before conducting simulations, we have to set system parameters appropriately so that the imposed constraints are met. With this consideration in mind, we first set Ccoll=4, Csucc=3, and Csens=0.4. For the given values, the constraints imposed in (17) and (18) are pfa<0.866 and pd>0.1, respectively. These constraints are not stringent.
First, we numerically computed the thresholds. In Figure 4, we show the effects of pd on η0 and η1 for pfa=0.1 and 0.2. From this figure, we observe that for all parameters considered, η1 is larger than η0. This figure also shows that as pd increases, η1 slightly increases and η0 decreases. This change gives the SU more chances to perform action a2 as the channel sensing becomes more reliable. In addition, we can also examine the effects of the transition probabilities on the prediction scheme. For example, when pd=0.9 and pfa=0.1, we find η0=0.25 and η1=0.82. Suppose α=β=0.1. It is easy to check that the condition in (25) is met. In Figure 5, we show the threshold values as a function of pfa. As can be seen from the figure that we have η1>η0 again for all parameters considered. In addition, as pfa increases, η0 increases while η1 decreases. This tendency is opposite to that shown in Figure 4. Thus, the probability of performing action a2 decreases as the channel sensing becomes more unreliable.
×
×
With the thresholds numerically determined, we then conducted simulations to evaluate the utilities of the proposed prediction scheme and the traditional scheme. In each run, a total of 106 time slots were simulated for each scheme and the utilities obtained were averaged. In Figure 6, we depict the utility values achieved in error-free channels as a function of pd, where Up denotes utility of the prediction scheme and Ut the traditional scheme. We see from this figure that when pd≤0.6, Up is zero and Ut is negative. Indeed, when the environment is too hostile, Ut may become negative, but Up≥0 for all situations. When pd≥0.65, both utilities are larger than zero and the proposed scheme has a gain roughly a magnitude of Csens relative to the traditional scheme. In addition, we also simulated both schemes with channel errors. However, for reasonable error rates, such as 10−4 or less, the degradation due to channel errors is not significant, several orders less than the utilities achieved by either scheme. Thus, we show the results for error-free channels only.
×
Figure 7 depicts the utility values as a function of pfa. We observe from this figure that the effects of pfa on the proposed scheme is not significant, but it is on the traditional scheme. This dissimilarity is mainly due to the fact that when \(\hat {b}\) is high, the proposed scheme may perform action a3 without sensing while the traditional scheme always senses. A higher pfa may cause the traditional scheme to waste more idle slots. Finally, we show the effects of β on utility values in Figure 8. As β increases, the channel will stay in idle state more often so that the SU has more chances to utilize the channel. Thus, we can see from this figure that as β increases, both Up and Ut increase. Moreover, the superiority of the prediction scheme can also be observed.
×
×
5 Conclusions
We proposed a study using HMM with state prediction for spectrum sensing in OSA networks. The SU may exercise one of the three distinct actions: direct skip, sense then conditional access, or direct access. We imposed some constraints on the system parameters and derived thresholds by which we can specify the optimal action which maximizes the utility. We compared the performance of the prediction scheme to that of the traditional scheme. Simulation results show that as trying to avoid sensing as much as possible, the prediction scheme may earn a gain roughly a magnitude of the sensing cost relative to the traditional scheme. Simulation results also show that when the false alarm probability is high, the performance gain can be significant. This suggests that using the prediction scheme instead of relying only on sensing results can alleviate the detriment arising from unreliable sensing, as prediction results become more reliable. With channel errors taken into account, we observe from the simulation results that the effect of channel errors on the superiority of the proposed scheme over the traditional scheme is not significant. This means that superiority exists equally in any kind of channels. Thus, we believe the proposed scheme is suitable for OSA systems using spectrum sensing.
Acknowledgements
The research is supported by the Ministry of Science and Technology of the Republic of China under Grant No. MOST 103-2221-E-468-005.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Competing interests
The authors declare that they have no competing interests.