1 Introduction
2 System model
2.1 Secondary transmission pair
-
Soft constraint: the secondary transmitter senses sub-channel n with probability \(\rho_{n}(t)\) at time slot t (note that \(\rho_{n}(t)\) may be time-varying), and the sensing decisions are mutually independent across different sub-channels. Then, we have$$\begin{array}{@{}rcl@{}} {\lim}_{T\rightarrow \infty}\frac{1}{T}\sum\limits_{t=1}^{T} \frac{1}{N}\sum\limits_{n=1}^{N} \rho_{n}(t)=\frac{N'}{N}, \end{array} $$(1)where N′ is the average number of sensed sub-channels in one time slot.
-
Hard constraint: the secondary transmitter senses exactly N′ sub-channels in every time slot; therefore, the sensing decisions on different sub-channels are mutually correlated. We denote by \(\rho ^{t}_{O}\) the probability that the subset of sub-channels \(O=\left \{i_{1},\ldots, i_{N'}\right \}\) is sensed at time slot t.
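As a minimal illustration of the two sensing constraints, the following Python sketch (the function names and the uniform subset choice under the hard constraint are our own assumptions; the paper leaves the distribution \(\rho^{t}_{O}\) general) draws sensing decisions both ways:

```python
import random

def sense_soft(rho):
    """Soft constraint: each sub-channel n is sensed independently
    with probability rho[n]; the number sensed varies per slot."""
    return [n for n, p in enumerate(rho) if random.random() < p]

def sense_hard(N, N_prime):
    """Hard constraint: exactly N' sub-channels are sensed each slot
    (drawn uniformly here for illustration; the decisions are
    correlated across sub-channels)."""
    return random.sample(range(N), N_prime)

# Under the soft constraint, the long-run average number of sensed
# sub-channels per slot approaches sum(rho); here N' = 2 out of N = 20.
N, N_prime = 20, 2
rho = [N_prime / N] * N
T = 20000
avg = sum(len(sense_soft(rho)) for _ in range(T)) / T
```

The soft constraint only pins down the time-averaged sensing effort, while the hard constraint fixes the per-slot cardinality.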
2.2 Input and output alphabets
2.3 Input policy
2.4 Models of sub-channel occupancy
-
Memoryless model: the occupancy of the primary user on a sub-channel is an i.i.d. random process; we denote by \(q_{n}^{I}\) the probability that sub-channel n is idle in each time slot.
-
Two-state model: the occupancy of the primary user on a sub-channel is a two-state (B for busy and I for idle) Markov process, which is illustrated in Fig. 2b. The state of sub-channel n at time slot t is denoted by \(S_{n}(t)\), and we denote by \(q^{BI}_{n}\) (\(q^{IB}_{n}\)) the transition probability that sub-channel n transits from the busy (idle) state to the idle (busy) state. Obviously, the memoryless model is the special case of the two-state model in which \(q^{BI}_{n}+q^{IB}_{n}=1\) and \(q^{I}_{n}=q^{BI}_{n}\). The transition probabilities can be collected into one matrix, which is given by$$\begin{array}{@{}rcl@{}} \mathcal{Q}_{n}=\left(\begin{array}{cc} 1-q_{n}^{IB} & q_{n}^{BI} \\ q_{n}^{IB} & 1-q_{n}^{BI} \\ \end{array} \right). \end{array} $$(2)
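A short simulation of the two-state occupancy model confirms that the long-run idle fraction of a sub-channel approaches the stationary idle probability \(q_n^{BI}/(q_n^{BI}+q_n^{IB})\) of \(\mathcal{Q}_n\). This is a sketch with hypothetical parameter values, not part of the paper's setup:

```python
import random

def simulate_occupancy(q_bi, q_ib, T, start_idle=True):
    """Simulate the two-state (busy/idle) Markov occupancy of one
    sub-channel; q_bi = P(busy -> idle), q_ib = P(idle -> busy).
    Returns the fraction of slots in which the channel is idle."""
    idle, idle_count = start_idle, 0
    for _ in range(T):
        if idle:
            idle = random.random() >= q_ib   # stay idle w.p. 1 - q_ib
        else:
            idle = random.random() < q_bi    # become idle w.p. q_bi
        idle_count += idle
    return idle_count / T

# Stationary idle probability of the chain is q_bi / (q_bi + q_ib).
q_bi, q_ib = 0.3, 0.1
frac = simulate_occupancy(q_bi, q_ib, T=100000)
```

Setting \(q^{BI}_{n}+q^{IB}_{n}=1\) (e.g., `simulate_occupancy(0.6, 0.4, T)`) makes successive states independent, recovering the memoryless model with \(q^{I}_{n}=q^{BI}_{n}\).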
3 Memoryless sub-channels
3.1 Soft constraint
-
When \(\sum _{n=1}^{N}\rho _{n}^{*}=N'\), there exists a λ≤0 such that$$\begin{array}{@{}rcl@{}} \rho_{n}^{*}=\left\{ \begin{array}{ll} \frac{1}{g_{n}},&\qquad \mathrm{if }\ g_{n}>1\\ 1,&\qquad \mathrm{if }\ g_{n}\leq 1 \end{array} \right., \end{array} $$(5)where$$\begin{array}{@{}rcl@{}} g_{n}=q_{n}^{I}\left(2^{-\frac{\lambda}{q_{n}^{I}}-I_{n,\max}}+1\right). \end{array} $$(6)
-
When \(\sum _{n=1}^{N}\rho _{n}^{*}<N'\), \(\rho _{n}^{*}\) is given by$$\begin{array}{@{}rcl@{}} \rho_{n}^{*}=\left\{ \begin{array}{ll} \frac{1}{q_{n}^{I}\left(2^{-I_{n,\max}}+1\right)}, &\quad \mathrm{if }\ I_{n,\max}\leq -\log_{2}\left(\frac{1}{q_{n}^{I}}-1\right)\\ 1, &\quad \mathrm{if }\ I_{n,\max}> -\log_{2}\left(\frac{1}{q_{n}^{I}}-1\right) \end{array} \right.. \end{array} $$(7)
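Equation (7) can be evaluated directly. The following sketch (the function name `rho_star` is ours) computes the optimal sensing probability for one memoryless sub-channel and illustrates the threshold on \(I_{n,\max}\):

```python
import math

def rho_star(q_idle, I_max):
    """Optimal sensing probability from Eq. (7): sense with probability
    1 / (q^I * (2^{-I_max} + 1)) when I_max is at most the threshold
    -log2(1/q^I - 1), and always sense (rho = 1) otherwise.
    Assumes 0 < q_idle < 1."""
    threshold = -math.log2(1.0 / q_idle - 1.0)
    if I_max <= threshold:
        return 1.0 / (q_idle * (2.0 ** (-I_max) + 1.0))
    return 1.0

# Example: q^I = 0.8 gives threshold -log2(0.25) = 2, so rho* < 1
# only for I_max <= 2; sub-channels with larger capacity index are
# always sensed.
```

Note that for \(q_{n}^{I}\leq 1/2\) the threshold is non-positive, so any sub-channel with positive capacity index is sensed with probability one.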
3.2 Hard constraint
4 Finite-state Markov sub-channels
4.1 Belief states
4.2 Average reward
4.3 Countable state space
4.4 Myopic strategy
5 Numerical results
5.1 Simulation setup
| Parameter | Value |
| --- | --- |
| Channel number | 20 |
| Capacity index \(I_{n,\max}\) | From 1 to 20 |
| Sensing channel number N′ | 2 |
| State transition probability | Setups 1 to 3 |
5.2 Memoryless channels
5.2.1 Soft constraint
5.2.2 Hard constraint
5.3 Markov channels
6 Conclusions and open problems
-
Extension from the single-user to the multi-user setting, in which the capacity region needs to be studied.
-
Considering sensing errors, or modeling the sub-channels as hidden Markov models (HMMs).
-
Considering the case of limited receiving capability, i.e., the receiver can receive over only a subset of sub-channels; here, game theory needs to be applied since the decision involves two players.
7 Appendix A: Decomposition of mutual information
8 Appendix B: Proof of Prop. 1
9 Appendix C: Proof of Prop. 2
10 Appendix D: Proof of Lemma 1
11 Appendix E: Markov control with uncountable state space
-
State space S is a Borel space (defined as a Borel subset of a complete separable metric space);
-
Action space A is also a Borel space; each state s in the state space S is associated with a non-empty measurable subset A(s), whose elements are the legal actions for state s; we assume that the state-action pair set \(\mathbf {K}\triangleq \left \{(s,a)\,|\,s\in \mathbf {S}, a\in \mathbf {A}(s)\right \}\) is measurable;
-
T is the transition law, whose elements are denoted by T(B|k), where \(B\in \mathcal {B}(\mathbf{S})\) and k∈K;
-
r is the reward function mapping from K to \(\mathbb {R}\).
-
For each state s∈S, A(s) is a non-empty compact subset of A;
-
The reward function r(s,a) is bounded and continuous in a∈A(s);
-
\(\int g(y)\mathbf {T}(dy|s,a)\) is a continuous function in a∈A(s) for each s∈S and for each function g∈B(S).
-
If the Ergodicity Assumption holds, then, for any policy a, there exists a unique invariant probability measure p a satisfying$$\begin{array}{@{}rcl@{}} p_{\mathbf{a}}(B)=\int_{\mathbf{S}}\mathbf{T}_{\mathbf{a}}(B|s)p_{\mathbf{a}}(ds), \end{array} $$(60)for all \(B\in \mathcal {B}(\mathbf{S})\), and the average reward J(s,a) is a constant J(a) (thus being independent of the initial state s), which is given by$$\begin{array}{@{}rcl@{}} J(\mathbf{a})=\int r(y,\mathbf{a}(y))p_{\mathbf{a}}(dy), \end{array} $$(61)
-
If both the Regularity Assumption and Ergodicity Assumption hold, then there exists a constant J∗ and a function h∗ in B(S) satisfying the following Optimality Equation, ∀s∈S,$$\begin{array}{@{}rcl@{}} {}J^{*}+h^{*}(s)=\max_{a\in A(s)}\left\{r(s,a)+\int_{\mathbf{S}}h^{*}(y)\mathbf{T}(dy|s,a)\right\}. \end{array} $$(62)
-
Consider a Markov policy {a t } that maximizes the right-hand side of the following equation:$$\begin{array}{@{}rcl@{}} h_{t}(s)=\max_{a\in\mathbf{A}(s)}\left\{r(s,a)+\int h_{t-1}(y)\mathbf{T}(dy|s,a)\right\}, \end{array} $$(63)where h0∈B(S) is arbitrary, i.e.,$$\begin{array}{@{}rcl@{}} h_{t}(s)=r(s,\mathbf{a}_{t}(s))+\int h_{t-1}(y)\mathbf{T}(dy|s,\mathbf{a}_{t}(s)), \end{array} $$(64)then, the policy using a t at time slot t is optimal.
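Although Appendix E is stated for Borel state spaces, the recursion (63)-(64) is easy to visualize on a finite MDP. The sketch below runs undiscounted value iteration on a hypothetical two-state, two-action MDP (all parameters invented for illustration); the increments h t (s)−h t−1 (s) converge to the constant optimal average reward J∗, as the Optimality Equation (62) predicts:

```python
def value_iteration_step(h_prev, rewards, trans):
    """One application of Eq. (63): for each state s, maximize
    r(s,a) + sum_y h_{t-1}(y) * T(y|s,a) over the actions a."""
    h, policy = [], []
    for s in range(len(h_prev)):
        vals = [rewards[s][a] + sum(t * hp for t, hp in zip(trans[s][a], h_prev))
                for a in range(len(rewards[s]))]
        best = max(range(len(vals)), key=lambda a: vals[a])
        h.append(vals[best])
        policy.append(best)
    return h, policy

# Hypothetical 2-state, 2-action MDP, ergodic under every policy.
rewards = [[1.0, 0.0], [0.0, 2.0]]        # r(s, a)
trans = [[[0.9, 0.1], [0.2, 0.8]],        # T(y | s=0, a)
         [[0.5, 0.5], [0.1, 0.9]]]        # T(y | s=1, a)

h = [0.0, 0.0]
for _ in range(500):
    h_prev, (h, policy) = h, value_iteration_step(h, rewards, trans)
# h[s] - h_prev[s] approaches the optimal average reward J* in every
# state s, and the maximizing actions form an optimal Markov policy.
gains = [h[s] - h_prev[s] for s in range(2)]
```

For this example the gains converge to the same constant in both states, mirroring the state-independence of J∗ in (62).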
12 Appendix F: Proof of Lemma 3
-
For each state π, the corresponding action a(π) is a point in [ ε,1] N , which is compact;
-
From (30), the reward function r(s,a) is bounded by$$\begin{array}{@{}rcl@{}} &&\left|r(\mathbf{a}(t),\pi(t))\right|\\ &\leq&\sum\limits_{n=1}^{N}H(Y_{n}(t)|\mu_{n}(t))\\ &\leq&\sum\limits_{n=1}^{N}I_{n,\max}+N. \end{array} $$(65)
-
Note that the action space A(s) for state s∈S is the hyper-rectangle [ ε,1] N . Consider two policies a and a′ for state s=π with ∥a−a′∥=δf. If \(\pi _{n}=\left (\mathcal {Q}_{n}^{r}\right)_{11}\) (sub-channel n was sensed r time slots ago and found idle), we have$$\begin{array}{@{}rcl@{}} P\left(\pi_{n}(t+1)=\left(\mathcal{Q}_{n}^{r+1}\right)_{11}\right)=1-\mathbf{a}_{n}(\pi), \end{array} $$(66)$$\begin{array}{@{}rcl@{}} P\left(\pi_{n}(t+1)=\left(\mathcal{Q}_{n}^{1}\right)_{11}\right)=\mathbf{a}_{n}(\pi)\pi_{n}(t), \end{array} $$(67)and$$\begin{array}{@{}rcl@{}} P\left(\pi_{n}(t+1)=\left(\mathcal{Q}_{n}^{1}\right)_{12}\right)=\mathbf{a}_{n}(\pi)(1-\pi_{n}(t)). \end{array} $$(68)The change of each probability is of order O(δf). The same conclusion is easy to verify for the cases \(\pi _{n}(t)=\left (\mathcal {Q}_{n}^{r}\right)_{12}\) and \(\pi _{n}(t)=\left (\mathcal {Q}_{n}^{t}\right)_{11}\pi _{n}(0) +\left (\mathcal {Q}_{n}^{t}\right)_{12}(1-\pi _{n}(0))\). Therefore,$$\begin{array}{@{}rcl@{}} |\mathbf{T}(dy|s,\mathbf{a})-\mathbf{T}(dy|s,\mathbf{a}')|=O(\delta f), \end{array} $$(69)and the continuity of \(\int g(y)\mathbf {T}(dy|s,a)\) follows from the assumption that g is a bounded function.