01.12.2015  Research  Issue 1/2015  Open Access
Mobility management in HetNets: a learning-based perspective
Important notes
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The general ideas presented in this article have been developed by all the authors at a high level. MS developed the algorithms presented, implemented them into the system-level simulator, and ran simulations. The paper has been revised collaboratively by all authors. All authors read and approved the final manuscript.
1 Introduction
To cope with the wireless traffic demand within the next decade, operators are underlaying their macrocellular networks with low-power base stations (BSs) [1]. Such networks are typically referred to as heterogeneous networks (HetNets), and their deployment entails a number of challenges in terms of capacity, coverage, mobility management (MM), and mobility load balancing across multiple network tiers [2]. Mobility management, in particular, is essential to ensure continuous connectivity to mobile user equipment (UE) while maintaining satisfactory quality of service (QoS). Poor mobility management may lead to handover failures (HOFs), radio link failures, as well as unnecessary handovers, typically referred to as ping-pong (PP) events. Such deficiencies result in low resource utilization efficiency and poor user experience. To solve these problems, the mobility parameters of each cell need to be dynamically optimized according to cell traffic loads, the coverage areas of different cells, and the velocities of the UE.
With the advent of HetNets, recent studies have demonstrated that HOFs and PPs can be serious problems due to small cell sizes [2,3]. The MM mechanisms included in the first release of the long-term evolution (LTE) standard (Rel-8) were originally developed for networks that only involve macrocells [4]. The MM procedures defined for macrocell-only scenarios have been widely discussed in the literature, e.g., in [5-19]. It has been shown that MM for macrocell-only scenarios yields highly reliable handover execution, where HOFs and PPs can typically be avoided due to large cell sizes [20]. However, the deployment of a large number of small cells (e.g., femtocells, picocells, etc.) increases the complexity of MM in HetNets, since mobile UE may trigger frequent handovers when they traverse the coverage area of a small cell. This leads to less reliable handover execution in HetNets.
While MM is critical in HetNets to minimize HOFs and PPs, balancing the load among different network tiers is equally crucial. In HetNets, the load among tiers is unbalanced due to significant differences in transmit power levels. UE tend to connect to the macrocell even when the path loss conditions between the small cell and the UE are better, because the handover decision is based on the highest reference signal received power (RSRP) measured at a UE [21]. As a remedy to this, the 3rd Generation Partnership Project (3GPP) standardized the concept of cell range expansion to virtually increase a small cell's coverage area by adding a bias value to its RSRP, which leads to traffic offloading from the macrocell. To enhance the overall system performance, not only cell-specific handover parameter optimization, such as range expansion bias (REB) value adaptation, but also scheduling and resource allocation must be performed in a coordinated manner across different tiers. A survey of these and various other existing mobility management and load balancing approaches for small cells in LTE-Advanced networks is provided in [22].
In this article, a joint MM and UE scheduling approach is proposed using tools from reinforcement learning. The proposed MM approach utilizes parameter adaptation both in the long term and the short term. Hereby, macro- and picocells learn how to optimize their long-term traffic load, whereas in the short term, the UE association process is carried out based on history- and velocity-based scheduling. We propose multi-armed bandit (MAB) and satisfaction-based MM learning techniques as long-term load balancing approaches aiming at improving the overall system throughput while at the same time reducing the HOF and PP probabilities. A context-aware scheduler is proposed as a short-term UE scheduling approach that takes fairness into account.
The rest of the article is organized as follows. Section 2 provides a brief review of recent mobility management work in the literature and summarizes our contribution in this paper. Section 3 describes our system model. In Section 4, the problem formulation for MM is presented. In Section 5, the context-aware scheduler is described. In Section 6, we introduce our learning-based MM approaches. Section 7 presents system-level simulation results, and finally, Section 8 concludes the article.
2 Related work and contribution
In this section, we first summarize some of the recent studies on mobility management in LTE-Advanced HetNets and use these to highlight the challenges and open problems that should be further addressed by the research community. Subsequently, a typical HetNet scenario with macro- and picocells deployed in the same frequency band is considered to describe our contribution in this paper.
2.1 Literature review
The handover process requires a smooth transfer of a UE's active connection when moving from one cell to another, while still maintaining the guaranteed QoS. The objective is to have mobility procedures resulting in a low probability of radio link failures, HOFs, and PP events. Mobility solutions meeting these objectives are often said to be robust. Enhancements of handover robustness in LTE HetNets have been the subject of recent interest. In LTE Rel-11, mobility enhancements in HetNets have been investigated through a dedicated study item [2]. In this study item and the work items cited therein, mobility performance enhancement solutions for co-channel HetNets are analyzed, taking into account mobility robustness improvement. Proposed solutions are related to optimizing the handover procedure by dynamically adapting handover parameters for different cell sizes and UE velocities.
Mobility management techniques for HetNets have recently been investigated in the literature, e.g., [23-28]. In [23], the authors evaluate the effect of different combinations of various MM parameter settings for HetNets. The conclusions are aligned with the HetNet mobility performance evaluations in 3GPP [2], i.e., HetNet mobility performance strongly depends on the cell size and the UE speed. The simulation results in [23] assume that all UE have the same velocity in each simulation setup. Further results on the effects of MM parameters are presented in [24], where the authors propose a fuzzy-logic-based controller. This controller adaptively modifies handover parameters for handover optimization by considering the system load and UE speed in a macrocell-only network.
In [25], the authors evaluate the mobility performance of HetNets considering almost blank subframes in the presence of cell range expansion and propose a mobility-based inter-cell interference coordination scheme. Hereby, picocells configure coordinated resources by muting certain subframes, so that macrocells can schedule their high-velocity UE in these resources, which are free of co-channel interference from the picocells. However, the proposed approach only considers three broad classes of UE velocities: low, medium, and high. Moreover, no adaptation of the REB has been taken into account. In [26], the authors propose a hybrid solution for HetNet MM, where MM in a macrocell is network-controlled, while it is UE-autonomous in a small cell. In the scenario in [26], macrocells and small cells operate on different component carriers, which allows the MM to be maintained on the macrocell layer while enabling small cells to enhance the user-plane capacity. In addition to these approaches, a fairness-based MM solution is discussed in [27]. Here, the cell selection problem in HetNets is formulated as a network-wide proportional fairness optimization problem by jointly considering the long-term channel conditions and the distribution of user load among different tiers. While the proposed method enhances the cell-edge UE performance, no results related to mobility performance are presented.
In [28], the authors propose a MAB-based inter-cell interference coordination approach that aims at maximizing the throughput and handover performance through subband selection for transmission in a small-cell-only network. The proposed approach deals with the trade-off between increasing the subband size to maximize throughput and handover success rate, and decreasing it as far as possible to minimize interference. The only parameter that is optimized is the number of subbands, based on signal-to-interference-plus-noise ratio (SINR) thresholds. While the HOF rate is decreased by the proposed approach, the PP probability is not analyzed. To the best of our knowledge, there is no previous work on learning-based HetNet MM in the literature that jointly considers handover performance, load balancing, and fairness.
2.2 Contribution
In this paper, we propose MM approaches for network capacity maximization and HOF reduction, which also maintain user fairness across the network. In Figure 1a,b, we depict the basic idea of the classical MM approach and the proposed MM approaches, respectively. In the classical MM approach, there is no information exchange among tiers in case of UE handover, and traffic offloading might be achieved by picocell range expansion. In the proposed MM approaches, instead, each cell individually optimizes its own MM strategy based on limited coordination among tiers. Hereby, macro and pico BSs learn how to optimize their REB in the long term. In the short term, on the other hand, they carry out user scheduling based on each UE's velocity and average rate via coordination among the macrocell and picocell tiers. We propose two learning-based MM approaches: MAB- and satisfaction-based MM. The major difference between the two is that MAB aims at maximizing the overall capacity, while satisfaction-based learning aims at satisfying the network in terms of capacity. The contributions of this article can be summarized as follows:

In the proposed MM approaches, we focus on both short-term and long-term solutions. In the long term, a traffic load balancing procedure in a HetNet scenario is proposed, while in the short term, the UE association process is solved.

To implement the long-term load balancing method, we propose two learning-based MM approaches using reinforcement learning techniques: a MAB-based and a satisfaction-based MM approach.

The short-term UE association process is based on a proposed context-aware scheduler considering a UE's throughput history and velocity to enable fair scheduling and enhanced cell association.
3 System model
We focus on the downlink transmission of a two-layer HetNet, where layer 1 is modeled as macrocells and layer 2 as picocells. The HetNet consists of a set of BSs \(\mathcal{K} = \{1,\ldots,K\}\) with a set \(\mathcal{M} = \{1,\ldots,M\}\) of macrocells underlaid by a set \(\mathcal{P} = \{1,\ldots,P\}\) of picocells, where \(\mathcal{K} = \mathcal{M} \cup \mathcal{P}\). Macro BSs are dropped following a hexagonal layout with three sectors. Within each macro sector \(m\), \(p\in \mathcal{P}\) picocells are randomly positioned, and a set \(\mathcal{U} = \{1,\ldots,U\}\) of UE is randomly dropped within a circle around each picocell \(p\) (referred to as a hotspot region). UE associated to macrocells are referred to as macro UE \(\mathcal{U}(m) = \{1(m),\ldots,U(m)\} \subset \mathcal{U}\), and UE served by picocells are referred to as pico UE \(\mathcal{U}(p) = \{1(p),\ldots,U(p)\} \subset \mathcal{U}\), where \(\mathcal{U}(p) \neq \mathcal{U}(m)\). Each UE \(i(k)\) with \(k\in\{m,p\}\) has a randomly selected velocity \(v_{i(k)}\in \mathcal{V}\) km/h and a random direction of movement within an angle of [0;2π]. This is a slightly modified version of the mobility model described in Section 5.3.1, case 2, in [2]. The only difference with respect to the 3GPP mobility model is that we do not consider any bouncing circle in our simulations. Considering a random direction of movement and no bouncing circle enables different types of handover (macro-to-macro and pico(macro)-to-macro(pico)). A co-channel deployment is considered, in which picocells and macrocells operate in a system with a bandwidth \(B\) consisting of \(r=\{1,\ldots,R\}\) resource blocks (RBs). At every time instant \(t_{n}=nT_{s}\) with \(n=[1,\ldots,N]\) and \(T_{s}=1\) ms, each BS \(k\) decides how to expand its coverage area by learning its REB \(\beta_{k}\in\{\beta_{m},\beta_{p}\}\) with \(\beta_{m}=\{0;3;6\}\) dB and \(\beta_{p}=\{0;3;6;9;12;15\}\) dB\(^{a}\). Both macro and pico BSs select their REB to decide which UE \(i(k)\) to schedule on which RB based on the UE's context parameters. These context parameters are defined as the UE's velocity \(v_{i(k)}\), its instantaneous rate \(\phi_{i(k)}(t_{n})\) when associated to BS \(k\), and its average rate \(\overline{\phi}_{i(k)}(t_{n})\) defined as
$$ \overline{\phi}_{i(k)}(t_{n}) = \frac{1}{T} \sum_{n=1}^{N} \phi_{i(k)}(t_{n}), $$
(1)
whereby \(T=NT_{s}\) is a time window. The instantaneous rate \(\phi_{i(k)}(t_{n})\) is given by
$$ \phi_{i(k)}(t_{n}) = B_{i(k)} \log\left(1+\gamma_{i(k)}(t_{n})\right), $$
(2)
with \(\gamma_{i(k)}(t_{n})\) being the SINR of UE \(i(k)\) at time \(t_{n}\), which is defined as
$$ \gamma_{i(k)}(t_{n}) = \frac{p_{k} g_{i(k),k}(t_{n})}{\sum\limits_{\substack{j\in\mathcal{K}\\{j}\neq k}}p_{j} g_{i(k),j}(t_{n})+\sigma^{2 }}, $$
(3)
where \(p_{k}\) is the transmit power of BS \(k\), \(\sigma^{2}\) is the noise variance, and \(g_{i(k),k}(t_{n})\) is the channel gain from cell \(k\) to UE \(i(k)\) associated to BS \(k\). The bandwidth \(B_{i(k)}\) in Equation 2 is the bandwidth allocated to UE \(i(k)\) by BS \(k\) at time \(t_{n}\).
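As a minimal illustration of Equations 2 and 3, the SINR and instantaneous rate can be computed as follows; the function names, power values, channel gains, and the per-RB bandwidth are hypothetical placeholders for this sketch, not values taken from the system model.

```python
import math

def sinr(p_serv, g_serv, interferers, noise_var):
    """SINR per Equation 3: serving received power over the sum of
    interfering powers plus the noise variance (all in linear units)."""
    interference = sum(p_j * g_j for p_j, g_j in interferers)
    return (p_serv * g_serv) / (interference + noise_var)

def instantaneous_rate(bandwidth_hz, gamma):
    """Shannon-type rate per Equation 2 (natural log, as written;
    use log2 instead for a rate in bit/s)."""
    return bandwidth_hz * math.log(1 + gamma)

# Illustrative example: one serving BS and two interfering BSs.
gamma = sinr(p_serv=1.0, g_serv=1e-7,
             interferers=[(1.0, 1e-9), (0.5, 2e-9)], noise_var=1e-10)
rate = instantaneous_rate(180e3, gamma)  # assuming one RB of 180 kHz
```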
3.1 Handover procedure
According to the 3GPP standard, the handover mechanism involves the use of RSRP measurements, the filtering of measured RSRP samples, a handover hysteresis margin, and a timetotrigger (TTT) parameter. Hereby, TTT is a time window which starts after the handover condition is fulfilled; a UE does not transmit its measurement report to its serving cell before the TTT timer expires. This helps to ensure that pingpongs are minimized due to fluctuations in the link qualities from different cells. The main steps of a typical handover process in a HetNet scenario are illustrated in Figure
2. First, a UE performs RSRP measurements and waits until the biased RSRP from a target cell (e.g., a picocell) is larger than the biased RSRP from its serving cell (e.g., a macrocell) plus a hysteresis margin (step 1). Hence, a handover is executed if the target cell’s (biased) RSRP (plus hysteresis margin) is larger than the source cell’s (biased) RSRP, i.e., the handover condition for a UE
i(
k) to BS
k is defined as
$$ P_{l}(i(l)) + \beta_{l} < P_{k}(i(k)) + \beta_{k} + m_{\text{hist}}, $$
(4)
with \(\{l,k\}\subset \mathcal{K}\), \(m_{\text{hist}}\) [dB] the UE- or cell-specific hysteresis margin, \(\beta_{k}\) (\(\beta_{l}\)) [dB] the REB of BS \(k\) (\(l\)), and \(P_{k}(i(k))\) (or \(P_{l}(i(l))\)) [dBm] the \(i(k)\)th (\(i(l)\)th) UE's RSRP from BS \(k\) (\(l\)). Even when the condition in (4) is satisfied, the UE waits for a duration of TTT before sending a measurement report to its serving cell (step 2). If the condition in step 1 is still satisfied after the TTT, the UE sends the measurement report to its serving cell in the uplink (step 3), which finalizes the handover by sending a handover command to the UE in the downlink (step 4). In order to filter out the effects of fast fading and shadowing, a UE obtains each RSRP sample as the linear average over the power contribution of all resource elements that carry the common reference signal within one subframe (i.e., 1 ms) and in a predefined measurement bandwidth (e.g., 6 RBs) [2]. This linear averaging is done in the layer 1 filter. As illustrated in Figure 3, layer 1 filtering is performed every 40 ms and averaged over five samples. The layer 1 filtered measurement is then updated through a first-order infinite impulse response filter in layer 3 every 200 ms [2].
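The biased handover condition in Equation 4, combined with the TTT timer of steps 1-3, can be sketched as follows; the bias, hysteresis, and RSRP values are illustrative assumptions, and the 1 ms stepping is a simplification that ignores the layer 1/layer 3 filtering periods described above.

```python
def handover_condition(rsrp_serv, beta_serv, rsrp_target, beta_target, m_hist):
    """Equation 4: the target cell's biased RSRP plus hysteresis must
    exceed the serving cell's biased RSRP (all values in dBm/dB)."""
    return rsrp_serv + beta_serv < rsrp_target + beta_target + m_hist

def handover_decision(samples, ttt_ms, step_ms=1):
    """The measurement report is sent only if the condition in Equation 4
    holds continuously for the whole TTT window.
    `samples` is a list of (rsrp_serv, rsrp_target) pairs per step."""
    timer = 0
    for rsrp_serv, rsrp_target in samples:
        # Illustrative settings: target REB 6 dB, hysteresis 3 dB.
        if handover_condition(rsrp_serv, 0.0, rsrp_target, 6.0, 3.0):
            timer += step_ms
            if timer >= ttt_ms:
                return True       # report sent, handover executed
        else:
            timer = 0             # condition broken, TTT restarts
    return False
```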
4 Problem formulation for throughput maximization
In this section, we describe our optimization approach for maximizing the total rate of each cell \(k\) as a long-term load balancing process. To provide a better overview, we first present the interaction of the proposed long-term and short-term processes in Figure 4. The long-term load balancing optimization approach is solved by the proposed learning-based MM approaches presented in Section 6. As illustrated in Figure 4, both MAB- and satisfaction-based MM result in REB \(\beta_{k}\) value optimization and in load balancing \(\phi_{k,\text{tot}}(t_{n})\). Based on the estimated instantaneous load, the context-aware scheduler selects, in the short term, a UE for each RB by considering its history and velocity, as described in Section 5. This results in each UE's instantaneous rate \(\phi_{i(k)}(t_{n})\) and the RB allocation vector \(\boldsymbol{\alpha}_{i(k)}(t_{n})=[\alpha_{i(k),1},\ldots,\alpha_{i(k),R}]\) containing binary variables \(\alpha_{i(k),r}\) indicating whether UE \(i(k)\) of BS \(k\) is allocated RB \(r\) or not. The interrelation between the selected context parameters (UE's history and velocity), the scheduling function, the described optimization formulation, and the rationale behind the short-term and long-term MM approaches can be summarized as follows. Within the proposed MM approach, we carry out load balancing in the long term by optimizing the REB values and carry out history-based UE scheduling in the short term by means of the proposed context-aware scheduler. The combination of both procedures reduces the HOF and PP probabilities via the optimal REB value selection and the proposed context-aware scheduler. Here, the load balancing procedure yielding the optimal REB value incurs wideband SINR enhancement and HOF reduction. The context-aware scheduler, on the other hand, schedules UE based on the highest estimated achievable rate of each UE according to its instantaneous channel condition and its history, which leads to long-term fairness among UE. Both approaches, i.e., load balancing and history-based scheduling, yield throughput enhancement. Additionally, the velocity-based ranking property of the context-aware scheduler reduces the PP probability, since low-velocity UE are prioritized over high-velocity UE.
The optimization approach aims at maximizing the total rate \(\phi_{k,\text{tot}}(t_{n})\) of each cell \(k\) in the long term, i.e., for \(t_{n}=[t_{1},\ldots,t_{N}]\), by dynamically changing the RB allocation \(\boldsymbol{\alpha}_{k}(t_{n})\) and the REB \(\beta_{k}\), whereby the total rate is defined as
$$ {\phi}_{k,\text{tot}}(t_{n}) = \sum_{{i}(k)\in \mathcal{U}_{k}} \sum_{r=1}^{R} \alpha_{i(k),r}(t_{n}) \phi_{i(k),r}(t_{n}), $$
(5)
where \(\phi_{i(k),r}(t_{n})\) is the instantaneous rate of UE \(i(k)\) at RB \(r\). At each time instant \(t_{n}\), each BS \(k\) performs the following optimization:
$$\begin{array}{*{20}l} {}&& \max_{\substack{\boldsymbol{\alpha}_{i(k)}(t_{n})\\\beta_{k}}} \sum_{n=1}^{N} \phi_{k,\text{tot}}(t_{n}) \end{array} $$
(6)
$$\begin{array}{*{20}l}[4pt] {}&=& \max_{\substack{\boldsymbol{\alpha}_{i(k)}(t_{n})\\\beta_{k}}} \sum_{n=1}^{N} \sum_{i(k)\in \mathcal{U}_{k}} \sum_{r=1}^{R} \alpha_{i(k),r}(t_{n}) \phi_{i(k),r}(t_{n}) \end{array} $$
(7)
$${} {\small{\begin{aligned} =\max_{\substack{\boldsymbol{\alpha}_{i(k)}(t_{n})\\[1pt] \beta_{k}}} \sum_{n=1}^{N} \sum_{i(k)\in \mathcal{U}_{k}} B_{i(k)} \sum_{r=1}^{R} \alpha_{i(k),r}(t_{n}) \log\left(1+\gamma_{i(k),r}(t_{n})\right) \end{aligned}}} $$
(8)
$$\begin{array}{*{20}l} &&\text{subject to:} \end{array} $$
(9)
$$\begin{array}{*{20}l}[2pt] &&\alpha_{i(k),r}(t_{n}) \in \left\{0,1\right\} \end{array} $$
(10)
$$\begin{array}{*{20}l}[2pt] &&\sum_{i(k)\in \mathcal{U}_{k}} \alpha_{i(k),r} = 1 \quad \forall r, \forall k, \end{array} $$
(11)
$$\begin{array}{*{20}l} &&p_{k} \leq p_{k,\text{max}} \end{array} $$
(12)
$$\begin{array}{*{20}l}[2pt] &&\phi_{i(k)}(t_{n}) \geq \phi_{k,\text{min}} \end{array} $$
(13)
The condition in (12) implies that the total transmit power over all RBs does not exceed the maximum transmission power of BS \(k\). Since our optimization approach in (6) aims at maximizing the total rate, the last condition (13) dictates that the instantaneous rate is larger than a minimum rate for BS \(k\). Due to the distributed nature of the optimization problem in (6), we investigate two reinforcement learning techniques in this paper, as discussed in Sections 6.1 and 6.2.
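As a sketch of how the objective in Equation 5 and the one-UE-per-RB constraint in (11) interact, the following evaluates the total cell rate for a given binary RB allocation; the rate matrix and allocation are hypothetical examples.

```python
def total_cell_rate(alloc, rates):
    """Equation 5: sum over UE and RBs of alpha_{i(k),r} * phi_{i(k),r}.
    `alloc[i][r]` is the binary RB allocation of UE i on RB r,
    `rates[i][r]` the corresponding instantaneous rate."""
    num_ues, num_rbs = len(rates), len(rates[0])
    # Constraint (11): each RB is assigned to exactly one UE.
    for r in range(num_rbs):
        assert sum(alloc[i][r] for i in range(num_ues)) == 1
    return sum(alloc[i][r] * rates[i][r]
               for i in range(num_ues) for r in range(num_rbs))

rates = [[1.0, 3.0], [2.0, 1.5]]   # two UE, two RBs (illustrative)
alloc = [[0, 1], [1, 0]]           # RB 1 -> UE 0, RB 0 -> UE 1
cell_rate = total_cell_rate(alloc, rates)  # 3.0 + 2.0 = 5.0
```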
5 Short-term solution: a context-aware scheduler
The proposed MM approach considers a fairness-based context-aware scheduling mechanism in the short term. At each RB \(r\), a UE \(i(k)^{*}\) is selected to be served by BS \(k\) according to the following scheduling criterion:
$$\begin{array}{@{}rcl@{}} i(k)_{r}^{*} = \begin{array}{c} \text{sort} \\ \min{\left(v_{i(k)}\right)} \end{array}\left(\arg\max_{i(k)\in \mathcal{U}_{k}} \frac{\phi_{i(k),r}(t_{n})}{\overline{\phi}_{i}(t_{n})}\right), \end{array} $$
(14)
where \(\text{sort}_{\min(v_{i(k)})}\) sorts the candidate UE according to their velocity, starting with the slowest UE. After the sorting operation, if more than one UE can be selected for RB \(r\), the UE with the minimum velocity is selected. The rationale behind introducing a sorting/ranking function for candidate UE according to their velocity is that high-velocity UE will not be favored over slow-moving UE. This has two advantages: 1) high-velocity UE might pass through the picocell quickly and should therefore not be favored, to avoid PPs, and 2) the channel conditions of low-velocity UE change slowly, which may result in poor rates, especially for slow-moving cell-edge UE, if they are not allocated many RBs.
The scheduler defined in (14) would allocate many (or even all) resources to a newly handed-over UE, since its average rate in the target cell is zero. To avoid this and to enable a fair resource allocation among all UE in a cell, we propose a history-based scheduling approach as follows. Via the X2 interface, macrocells and picocells coordinate, so that once a macro UE \(i(m)\) is handed over to a picocell \(p\), the UE's target cell \(p\) and source cell \(m\) exchange information. In particular, UE \(i(m)\)'s rate history at time instant \(t_{n}\) is provided to picocell \(p\) in terms of the average rate \(\overline{\phi}_{i(m)}(t_{n})\), such that the average rate of the UE (which is named \(i(p)\) after the handover) at picocell \(p\) becomes
$$\begin{array}{@{}rcl@{}} \overline{\phi}_{i(p)}\left(t_{n}+T_{s}\right) = \frac{T\overline{\phi}_{i(m)}(t_{n}) + \phi_{i(p)}\left(t_{n}+T_{s}\right)}{T+1}. \end{array} $$
(15)
In (15), a moving average rate is carried over from macrocell to picocell, whereas in the classical MM approach, a UE's rate history is not considered and is equal to zero. In other words, in the classical proportional fair scheduler, the average rate \(\overline{\phi}_{i}(t_{n})\) in (14) is \(\overline{\phi}_{i}(t_{n}) = \overline{\phi}_{i(k)}(t_{n}) = 0\) when a UE is handed over to cell \(k\), whereas we redefine it according to (15), i.e., \(\overline{\phi}_{i}(t_{n}) = \overline{\phi}_{i(p)}(t_{n}+T_{s})\). The proposed MM approach, instead, considers the historical rate from when UE \(i(m)\) was associated to macrocell \(m\) in the past. The incorporation of a UE's history enables the scheduler to perform fair resource allocation even in the presence of a sequence of handovers. Since handovers occur more frequently in HetNets due to small cell sizes, such a history-based scheduler leads to fair frequency resource allocation among the UE of a cell. More specifically, UE recently joining a cell will not be preferred over other UE of the cell, since their historical average rate will be taken into account.
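A minimal sketch of the scheduling rule in Equation 14 and the history carry-over in Equation 15 could look as follows; the tie-breaking tolerance and the guard against a zero average rate are implementation assumptions, and the UE attributes in the test values are illustrative.

```python
def schedule_rb(ues, rb_rates):
    """Equation 14: among the UE maximizing the proportional-fair metric
    phi_{i,r}/avg_phi_i, pick the one with the lowest velocity.
    `ues` maps UE id -> (velocity, avg_rate); `rb_rates` maps
    UE id -> instantaneous rate on this RB."""
    metric = {i: rb_rates[i] / max(avg, 1e-9)   # guard: avg rate of 0
              for i, (v, avg) in ues.items()}
    best = max(metric.values())
    candidates = [i for i, m in metric.items() if abs(m - best) < 1e-12]
    return min(candidates, key=lambda i: ues[i][0])  # slowest UE wins

def handover_avg_rate(avg_rate_source, inst_rate_target, T):
    """Equation 15: moving-average carry-over of the UE's rate history
    from the source cell into the target cell."""
    return (T * avg_rate_source + inst_rate_target) / (T + 1)
```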
6 Long-term solution: learning-based mobility management techniques
To solve the optimization problem defined in Section 4, we rely on the self-organizing capabilities of HetNets and propose an autonomous solution for load balancing using tools from reinforcement learning [29]. Hereby, each cell develops its own MM strategy to perform optimal load balancing based on the proposed learning approaches presented in Sections 6.1 and 6.2 and schedules its UE according to the context-aware scheduler presented in Section 5. An overview of the interrelation of the short-term and long-term MM approaches is provided in Algorithm 2 (see Appendix). To realize this, we consider the game \(\mathcal{G}=\{\mathcal{K},\{\mathcal{A}_{k}\}_{k\in \mathcal{K}},\{u_{k}\}_{k\in \mathcal{K}}\}\), in which each cell autonomously learns its optimal MM strategy. Hereby, the set \(\mathcal{K}=\mathcal{M}\cup \mathcal{P}\) represents the set of players (i.e., BSs), and for all \(k\in \mathcal{K}\), the set \(\mathcal{A}_{k} = \{\beta_{k}\}\) represents the set of actions player \(k\) can adopt. For all \(k\in \mathcal{K}\), \(u_{k}\) is the utility function of player \(k\). The action definition implies that the BSs learn at each time instant \(t_{n}\) to optimize their traffic load in the long term using the cell range expansion process. Each BS learns its action selection strategy based on the MAB or satisfaction-based learning approaches, as presented in Sections 6.1 and 6.2, respectively.
6.1 Multi-armed bandit-based learning approach
The first learning-based MM approach is based on the MAB procedure, which aims to maximize the overall system performance. MAB is a machine learning technique based on an analogy with the traditional slot machine (one-armed bandit) [30]. When pulled at time \(t_{n}\), each machine/player provides a reward. The objective is to maximize the collected reward through iterative pulls, i.e., learning iterations. The crucial trade-off the player faces at each trial is between exploitation of the action that has the highest expected utility and exploration of new actions to get more information about the expected utility of the other actions. The player selects its actions based on a decision function reflecting this exploration-exploitation trade-off.
The set of actions, the learning strategy, and the utility function for our MABbased MM approach are defined as follows:

Actions: \(\mathcal{A}_{k}=\{\beta_{k}\}\) with \(\beta_{m}=[0,3,6]\) dB and \(\beta_{p}=[0,3,6,9,12,15,18]\) dB is the CRE bias. We consider higher bias values for picocells due to their low transmit power. The considered bias values are selected considering the results in [31] and also based on our extensive simulation studies.

Strategy: Every BS \(k\) learns its optimum CRE bias value on a long-term basis considering its load as defined in (5). This is interrelated with the handover triggering through the definition of each cell's border. The problem formulation defined in Section 4 optimizes this load in the long term, i.e., over time \(t_{n}\).

Utility function: The utility function is a decision function in MAB learning and is composed of an exploitation term, represented by player \(k\)'s mean reward for selecting an action until time \(t_{n}\), and an exploration term that considers the number of times an action has been selected so far. Player \(k\) selects its action \(a_{j(k)}(t_{n})\in \mathcal{A}_{k}\) at time \(t_{n}\) by maximizing a decision function \(d_{k,a_{j(k)}}(t_{n})\), which is defined as
$$ d_{k,a_{j(k)}}(t_{n}) = u_{k, a_{j(k)}}(t_{n}) + \sqrt{\frac{2\log\left(\sum_{i=1}^{|\mathcal{A}_{k}|} n_{k,a_{i(k)}}(t_{n})\right)}{n_{k,a_{j(k)}}(t_{n})}}, $$
(16)
where \(u_{k, a_{j(k)}}(t_{n})\) is the mean reward of player \(k\) at time \(t_{n}\) for action \(a_{j(k)}\), \(n_{k,a_{j(k)}}(t_{n})\) is the number of times action \(a_{j(k)}\) has been selected by player \(k\) until time \(t_{n}\), and \(|\cdot|\) represents the cardinality operation.
The decision function in the form of (16) has been proposed by Auer et al. in [30]. The main advantage of Auer's solution is that the decision function does not rely on the regret, which is the loss incurred because the globally optimal policy (usually not known in practical wireless networks) is not followed in each learning iteration. The regret increases with time, because the globally optimal policy is usually not known and hence cannot be followed by the player. Thus, MAB-based strategies need to bound the growth of the regret, at least asymptotically. Auer et al. defined an upper confidence bound (UCB) within which the regret lies. The UCB considers the player's average reward and the number of times an action has been selected until \(t_{n}\). Relying on these assumptions, we define our decision function in Equation 16.
During the first \(t_{n}=|\mathcal{A}_{k}|\) iterations, player \(k\) selects each action once, in a random order, to initialize the learning process by receiving a reward for each action. For the following iterations \(t_{n}>|\mathcal{A}_{k}|\), the action selection is performed according to Algorithm 1. In each learning iteration, the action \(a^{*}_{j(k)}\) that maximizes the decision function in (16) is selected by player \(k\). Then, the parameters \(s_{k,a_{j(k)}}(t_{n})\), \(n_{k,a_{j(k)}}(t_{n})\), and \(u_{k,a_{j(k)}}(t_{n})\) are updated, where \(s_{k,a_{j(k)}}(t_{n})\) is the cumulated reward of player \(k\) after playing action \(a_{j(k)}\), and the indicator used in the updates equals 1 if \(i=j\) and zero otherwise.
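Since the decision function in (16) follows the UCB1 pattern of Auer et al., the action selection and bookkeeping of Algorithm 1 can be sketched as follows; the class name is hypothetical, and the reward in the usage loop is a random placeholder for the cell load, not the utility used in the paper.

```python
import math
import random

class UcbBiasSelector:
    """UCB1-style REB selection per Equation 16: play each action once,
    then pick the action maximizing mean reward + exploration bonus."""
    def __init__(self, actions):
        self.actions = list(actions)           # e.g., REB values in dB
        self.counts = {a: 0 for a in actions}  # n_{k,a}(t_n)
        self.sums = {a: 0.0 for a in actions}  # s_{k,a}(t_n)

    def select(self):
        for a in self.actions:                 # initial round-robin
            if self.counts[a] == 0:
                return a
        total = sum(self.counts.values())
        return max(self.actions, key=lambda a:
                   self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(total) / self.counts[a]))

    def update(self, action, reward):
        self.counts[action] += 1
        self.sums[action] += reward

selector = UcbBiasSelector([0, 3, 6, 9, 12, 15, 18])
for _ in range(100):
    beta = selector.select()
    reward = random.random() * (1 + beta / 18)  # placeholder load
    selector.update(beta, reward)
```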
6.2 Satisfaction-based learning approach
The idea of satisfaction equilibrium was introduced in [32], where players having partial or no knowledge about their environment and the other players are solely interested in satisfying some individual performance constraints instead of optimizing their individual performance. Here, we consider a player to be satisfied if its cell reaches a certain minimum level of total rate and if at least 90% of the UE in the cell obtain a certain average rate. The rationale behind these satisfaction conditions is to guarantee a minimum rate for each individual UE, while at the same time improving the total rate of the cell.
To enable a fair comparison, the set of players and the corresponding set of actions in the proposed satisfaction-based MM approach are the same as those in the MAB-based MM approach. The utility function of player \(k\) at time \(t_{n}\) is defined as the load according to (5). In the satisfaction-based learning approach, the actions are selected according to a probability distribution \(\boldsymbol{\pi}_{k}(t_{n})=\left[\pi_{k,1}(t_{n}),\ldots,\pi_{k,|\mathcal{A}_{k}|}(t_{n})\right]\). Hereby, \(\pi_{k,j}(t_{n})\) is the probability with which BS \(k\) chooses its action \(a_{j(k)}(t_{n})\) at time \(t_{n}\). The following learning steps are performed in each learning iteration:
1. In the first learning iteration \(t_{n}=1\), the probability of each action is equal, and an action is selected randomly.

2. In the following learning iterations \(t_{n}>1\), the player changes its action selection strategy only if the received utility does not satisfy the cell, i.e., if the satisfaction condition is not fulfilled.

3. If the satisfaction condition is not fulfilled, player \(k\) selects its action \(a_{j(k)}(t_{n})\) according to the probability distribution \(\boldsymbol{\pi}_{k}(t_{n})\).

4. Each player \(k\) receives a reward \(\phi_{k,\text{tot}}(t_{n})\) based on the selected actions.

5. The probability \(\pi_{k,j}(t_{n})\) of action \(a_{j(k)}(t_{n})\) is updated according to the linear reward-inaction scheme:
$$ \pi_{k,j}(t_{n}+T_{s}) = \pi_{k,j}(t_{n}) + \lambda b_{k}(t_{n}) \left(\mathbb{1}_{j}(t_{n}) - \pi_{k,j}(t_{n})\right), $$
(17)
whereby \(\mathbb{1}_{j}(t_{n})\) equals 1 for the selected action and zero for the non-selected actions. Moreover, \(b_{k}(t_{n})\) is defined as follows:
$$b_{k}(t_{n})=\frac{u_{k,\text{max}} + \phi_{k,\text{tot}}(t_{n}) - u_{k,\text{min}}}{2 u_{k,\text{max}}}, $$
(18)
where
u
_{ k,max} is the maximum rate in case of singleUE and
\({u_{k,\text {min}}= \frac {1}{2} u_{k,\text {max}}}\). Hereby,
\(\lambda = \frac {1}{100 t_{n}+T_{s}}\) is the learning rate.
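The learning steps above can be sketched compactly. This is a minimal illustration, not the authors' implementation: the function names are ours, and the reward normalization assumes that b_k(t_n) maps the reward into (0, 1) as in (18) with u_min = u_max/2:

```python
import random

def select_action(pi):
    """Draw an action index from the distribution pi (step 3)."""
    return random.choices(range(len(pi)), weights=pi)[0]

def normalized_reward(phi, u_max):
    """Sketch of the reward normalization of (18), assuming
    u_min = u_max / 2; maps phi in [0, u_max] into [0.25, 0.75]."""
    u_min = 0.5 * u_max
    return (u_max + phi - u_min) / (2.0 * u_max)

def update_probabilities(pi, selected, b, lam):
    """Linear reward-inaction update (step 5): the selected action's
    probability grows in proportion to the normalized reward b, all
    other probabilities shrink, and the distribution stays normalized."""
    return [p + lam * b * ((1.0 if j == selected else 0.0) - p)
            for j, p in enumerate(pi)]
```

Because the correction term sums to zero across actions, the updated vector remains a valid probability distribution without explicit renormalization; an unsatisfied player simply repeats select/observe/update each iteration, while a satisfied player keeps its current strategy.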
7 Simulation results
The scenario used in the system-level simulations is based on the configuration #4b HetNet scenario in [21]. We consider a macrocell consisting of three sectors and P={1,2,3} pico BSs per macro sector, randomly distributed within the macrocellular environment as illustrated in Figure 5. It has to be pointed out that the proposed MM approaches can be applied to any number of macrocells; we present results for one macrocell (three macro sectors) due to the large computation times. In each macro sector, U=30 mobile UE are randomly dropped within a 60-m radius of each pico BS. The rationale behind dropping all UE around pico BSs is to obtain a large number of handovers within a short time, limiting the computation time of our complex system-level simulations. Each UE i(k) has a velocity v_{i(k)} randomly selected from \({\mathcal {V}= \{3, 30, 60, 120\}}\) km/h and a random direction of movement within [0, 2π]. We consider fast-fading and shadowing effects in our simulations based on 3GPP assumptions [21]. Further simulation parameters are summarized in Table 1.
Table 1
Simulation parameters

Cellular layout: hexagonal grid, three sectors per cell, reuse 1
Carrier frequency: 2 GHz
System bandwidth: 10 MHz
Subframe duration: 1 ms
Number of RBs: 50
Number of macrocells: 1
Number of PBSs per macrocell P: {1, 2, 3}
Max. macro (pico) BS transmit power: \(P_{\text {max}}^{M} = 46\) dBm (\(P_{\text {max}}^{P} = 30\) dBm)
Macro path loss model: 128.1 + 37.6 log10(R) dB (R in km)
Pico path loss model: 140.7 + 36.7 log10(R) dB (R in km)
Traffic model: full buffer
Scheduling algorithm: proportional fair
Transmission mode: transmit diversity
Min. dist. MBS-PBS: 75 m
Min. dist. PBS-PBS: 40 m
Min. dist. MBS-MUE: 35 m
Min. dist. PBS-PUE: 10 m
Hotspot radius: 60 m
Thermal noise density: −174 dBm/Hz
Macro BS antenna gain: 14 dBi
Pico BS antenna gain: 5 dBi
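As an illustration of the simulation setup described above, the following sketch drops a UE uniformly within the 60-m hotspot radius of a pico BS, assigns it a random velocity and direction, and evaluates the two path loss models from Table 1. The function names are ours, not the simulator's, and the uniform-over-disk drop is an assumption:

```python
import math
import random

def drop_ue_around_pico(pico_xy, radius_m=60.0):
    """Drop a UE uniformly within radius_m of a pico BS and assign a
    random speed (km/h) and movement direction (rad)."""
    r = radius_m * math.sqrt(random.random())  # sqrt => uniform over the disk area
    angle = random.uniform(0.0, 2.0 * math.pi)
    x = pico_xy[0] + r * math.cos(angle)
    y = pico_xy[1] + r * math.sin(angle)
    speed = random.choice([3, 30, 60, 120])         # km/h
    direction = random.uniform(0.0, 2.0 * math.pi)  # rad
    return (x, y), speed, direction

def macro_path_loss_db(r_km):
    """Macro path loss model from Table 1 (distance in km)."""
    return 128.1 + 37.6 * math.log10(r_km)

def pico_path_loss_db(r_km):
    """Pico path loss model from Table 1 (distance in km)."""
    return 140.7 + 36.7 * math.log10(r_km)
```

Note that at any given distance the pico link sees a higher path loss than the macro link, which, combined with the 16-dB transmit power gap, is what makes range expansion (REB) necessary to attract UE to picocells.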

To compare our results with other approaches, we consider a baseline MM approach as defined in [2]. In the baseline MM approach, handover is performed based on layer 1 and layer 3 filtering as described in Section 3, using proportional fair scheduling and no information exchange between macro and pico BSs. This baseline is referred to as the classical HO approach. In the following, we compare our proposed MM approaches with this baseline, which is aligned with the handover procedure defined in 3GPP [2].
7.1 UE throughput and sum-rate
Figure 6 depicts the cumulative distribution function (CDF) of the UE throughput for the classical, MAB-based, and satisfaction-based MM approaches. For the classical approach, we present results for different picocell REB values β_{p}. For the MAB-based and satisfaction-based MM approaches, we distinguish between the long-term-only MM approach (with a proportional fair scheduler instead of the proposed context-aware scheduler, dashed lines) and the combined long-term and short-term MM approach (solid lines) to demonstrate the impact of the proposed context-aware scheduler. In the selected scenario, the CRE of picocells does influence the cell-edge (5th percentile) UE throughput, shown in the zoomed inset of Figure 6. This is because all UE in the simulation scenario are dropped within a radius of 60 m around the picocells, so that many macrocell UE are close to a picocell and suffer from inter-cell interference. Compared to the classical approach, the combined long-term and short-term MAB-based and satisfaction-based approaches improve the cell-edge UE throughput by up to 39% and 80%, respectively. With the long-term-only MM approach, the MAB-based approach yields cell-edge results similar to the classical approach. A similar gain is achieved in terms of average (50th percentile) UE throughput: here, the MAB-based and satisfaction-based approaches yield 43% and 75% improvement over the classical approach, while the long-term-only variants improve the average UE throughput by 12% and 37%, respectively. Hence, the satisfaction-based approach outperforms the other MM approaches in terms of average and cell-edge UE throughput. For the cell-center UE throughput, defined as the 95th percentile throughput, the opposite behavior is observed: improvements of 124% and 80% are achieved for the MAB-based and satisfaction-based approaches, respectively. The reason is that the satisfaction-based MM approach only aims at satisfying the network in terms of rate and does not update its learning strategy once satisfaction is achieved, i.e., actions that might lead to further improvements are not explored. The MAB-based approach, on the other hand, aims at maximizing the network performance, which is reflected in the improved cell-center UE throughput.
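The cell-edge, average, and cell-center metrics used above are the 5th, 50th, and 95th percentiles of the per-UE throughput distribution. For illustration only (this is not the authors' tooling), such percentiles can be read off a sample set with linear interpolation between sorted sample points:

```python
def throughput_percentile(samples, pct):
    """Return the pct-th percentile of per-UE throughput samples,
    interpolating linearly between sorted sample points."""
    s = sorted(samples)
    if len(s) == 1:
        return s[0]
    idx = (pct / 100.0) * (len(s) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(s) - 1)
    frac = idx - lo
    return s[lo] + frac * (s[hi] - s[lo])

def throughput_summary(samples):
    """Cell-edge / average / cell-center UE throughput as used in the text."""
    return {"cell_edge": throughput_percentile(samples, 5),
            "average": throughput_percentile(samples, 50),
            "cell_center": throughput_percentile(samples, 95)}
```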
To show the effect of different REB values in the classical MM approach, Figure 7 depicts the cell-edge UE throughput for different numbers of UE per macrocell. With increasing picocell REB value, the cell-edge UE throughput is enhanced. However, the proposed learning-based MM approaches outperform the classical approach by up to a factor of five for 10 UE per macrocell and up to a factor of 4.5 for 50 UE per macrocell. Interestingly, the MAB-based MM approach yields higher cell-edge UE throughput than the satisfaction-based MM approach for smaller numbers of UE. This can be interpreted as follows: in a scenario with 10 UE and a bandwidth of 10 MHz, the probability of having unsatisfied UE is very low. Since most (or even all) UE are satisfied, the satisfaction-based learning approach does not change its strategy to further enhance each UE's performance, whereas the MAB-based learning procedure adapts its strategy regardless of the satisfaction condition in order to maximize performance; as a result, the cell-edge UE performance is enhanced, too. With an increasing number of UE, the probability of unsatisfied UE in a cell increases, and the satisfaction-based learning approach adapts its strategy to satisfy all UE in the cell, in particular the cell-edge UE. This is why the cell-edge UE performance of the satisfaction-based MM approach is higher for larger numbers of UE. In addition to the results for different REB values, Figure 8 shows the sum-rate versus UE density per macrocell for TTT values of {40, 480} ms. For both TTT values, the classical approach yields comparatively low sum-rates, while the proposed approaches lead to significant improvements of up to 81% and 85% for TTT values of 40 ms and 480 ms, respectively, and converge to significantly larger sum-rates than the classical MM approach. Interestingly, the sum-rate performance of the proposed MM approaches depends on the TTT. The reason lies in the convergence behavior of the learning algorithms: for smaller TTT values, handover is executed faster and the BS has to adapt its REB strategy to the new cell load before convergence. To demonstrate the statistical properties of the results in Figure 8, Table 2 lists confidence intervals of the sum-rates for a confidence level of 95%. The presented sum-rate results lie within these confidence intervals.
Table 2
Confidence intervals of the sum-rates for a confidence level of 95%

MM approach (10 / 20 / 30 / 50 UE per macrocell):
Classical MM: [45.74, 56.33] / [58.77, 67.15] / [62.61, 71.94] / [67.51, 76.81]
MAB MM: [92.04, 113.92] / [116.79, 132.7] / [124.4, 142.33] / [124.4, 142.33]
Satisfaction MM: [92.37, 114.11] / [116.07, 132.84] / [124.04, 142.2] / [128.64, 145.77]
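The paper does not state how these intervals are obtained; a common choice for a 95% confidence interval on a sample mean is the normal approximation with z = 1.96, sketched here purely for illustration:

```python
def confidence_interval_95(samples):
    """Normal-approximation 95% confidence interval for the mean of a
    set of sum-rate samples (z = 1.96); assumes n > 1 independent runs."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    half_width = 1.96 * (var / n) ** 0.5                   # z * standard error
    return mean - half_width, mean + half_width
```

For small numbers of simulation runs, a Student-t quantile instead of z = 1.96 would give a wider, more conservative interval.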

Figure 9 depicts the effect of the long-term strategy on the derived REB values. Since the long-term strategy is related to the load estimation, which impacts the average UE throughput, we plot the average UE throughput over the REB values. We emphasize that in the learning-based MM approaches, each BS selects its own REB value autonomously according to its learning strategy, owing to the self-organizing capability of the proposed approaches. During the REB optimization, some REB values are thus favored over others and a mix of REB values emerges across the BSs, so that the average UE throughput is shown over mixed REB values; because the selection of each REB value is controlled by the learning approach rather than by our simulation settings, results cannot be presented separately for each REB value. It can be seen that a fixed REB value leads to lower average UE throughput than the autonomous per-BS REB selection of the proposed learning-based MM approaches.
To compare the performance of the proposed approaches for different numbers of picocells per macrocell, Figure 10 plots the sum-rate versus the number of pico BSs per macrocell. Across the different numbers of pico BSs, the proposed MM approaches yield gains of around 70% to 80% for TTT = 480 ms.
7.2 HOF and PP probabilities
Mobility management approaches must not only enhance the network performance in terms of UE throughput and sum-rate but also reduce HOF and PP probabilities. Here, we present results for the HOF probability and the PP event probability. We modify our simulation settings by using the same velocity for all UE in a given simulation, so that we can present results for each velocity separately. Figure 11 depicts the HOF probability for different TTT values. As can be seen, our proposed learning-based approaches yield, besides the gains in terms of rate, improvements in terms of HOF probability. Compared to the classical MM approach, the proposed methods yield the same HOF probability for UE at 3 km/h. For higher velocities, where more HOFs are expected, the HOF probability obtained by the proposed approaches is significantly lower than that of classical MM. Interestingly, the proposed methods lead to almost constant HOF probabilities for velocities above 30 km/h. For UE at 120 km/h and TTT = 40 ms, the HOF probability of the MAB-based approach is twice that of satisfaction-based learning but only one third that of the classical approach. For increasing TTT values, the trend between the proposed MM approaches and the classical MM approach remains similar. This is because the picocell coverage in the classical approach without range expansion is small, so macrocell UE quickly move deep inside the picocell coverage area before the TTT expires, significantly degrading the signal quality of the macrocell UE before the handover is completed. In this case, HOFs are alleviated by smaller TTT values.
Reducing TTT values may decrease the HOF probability but increase the PP probability; hence, HOFs and PPs must be studied jointly. Figure 12 depicts the PP probability for the same simulation settings as in Figure 11. It can be observed that the number of PPs is reduced with larger TTT values. In addition, for lower velocities, all MM approaches yield similar PP probabilities for all TTT values. For higher velocities, the PP probability is decreased by the proposed MM approaches by up to a factor of two (TTT = 40 ms).
7.3 Convergence behavior of MAB and satisfaction-based MM
One of the major concerns with learning-based approaches is their convergence behavior in dynamic systems. Figure 13 depicts the convergence of the proposed learning-based MM approaches in terms of cell-center UE throughput. It can be seen that the MAB-based MM approach converges more slowly than the satisfaction-based MM approach, but to a larger cell-center UE throughput, since it aims at system performance maximization. Hence, depending on the system requirements, the MAB-based approach can be applied if a large cell-center UE throughput is targeted, or the satisfaction-based approach can be preferred if better cell-edge UE throughput (see Figure 7) and fast convergence are desired.
8 Conclusions
We propose two learning-based MM approaches and a history-based context-aware scheduling method for HetNets. The first approach is based on MAB learning and aims at system performance maximization. The second is based on satisfaction-based learning and aims at satisfying each cell and each UE of a cell. System-level simulations demonstrate the performance enhancement of the proposed approaches compared to the classical MM method: gains of up to 80% are achieved in average UE throughput, while the HOF probability is reduced significantly.
9 Endnote
^{a} We consider lower REB values for macro BSs to avoid overloaded macrocells due to their large transmission power.
10 Appendix
Algorithm 2 presents the interrelation between the short-term and long-term MM approaches.
Acknowledgements
This research was supported in part by the SHARING project under the Finland grant 128010 and by the U.S. National Science Foundation under the grants CNS1406968 and AST1443999.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.