1 Introduction
-
A method to combine data gathered from LTE with data from the co-existing underlying RAT (uRAT) is proposed. In particular, mobile traces are used to analyze the performance of users when they leave LTE to continue their session in the co-existing uRAT. With this method, each user is tracked across technologies, which yields the user’s inter-technology event flow.
-
Another key contribution is an inter-technology metric that estimates the active time that users spend on LTE. It is calculated at user level from the proposed inter-technology event flow and then aggregated at cell level to determine the overall impact.
-
The proposed metric has been used to design the detection and diagnosis phases of a self-healing system. The main benefit is its ability to identify coverage holes and classify them depending on their impact on LTE and the co-existing uRAT. As a result, experts can design their particular remedial action based on the specific impact, making them case specific.
2 Problem formulation
-
Call drop (Fig. 1): It happens when a connection is unexpectedly released before the service requested by the user can be completed. The call drop therefore occurs while the service is in progress, because the user’s packets are not scheduled, either because of the lack of available resources or because the connection quality in terms of SINR is below a threshold. This situation has the worst impact on users because they lose their connection entirely, resulting in customer dissatisfaction. Cells affected by LTE coverage holes without uRAT are characterized by a high call drop rate.
-
Radio link failure (RLF): The connection is momentarily lost due to the bad quality of the air interface during a specific time interval. Unlike a call drop, during an RLF the connection is not released despite the low SINR but is saved by either the serving cell or any LTE neighbor through the reestablishment procedure [20]. As a consequence, the user experiences service and audio gaps from the moment the RLF occurs until the connection is successfully re-established in any LTE cell. In this scenario, the LTE network is capable of autonomously recovering the connection, maintaining the service in LTE.
-
Handover to legacy system (Fig. 1): In this scenario, the LTE connections suffering bad quality are transferred to a neighbor of the legacy system (e.g., 2G, 3G) through the inter-RAT handover (iRAT HO) procedure. In particular, an iRAT HO may be triggered by the B2 event [20], that is, when the LTE serving cell becomes worse than threshold 1 (Th1_B2) and the inter-RAT neighbor becomes better than threshold 2 (Th2_B2). The B2 event for an iRAT HO is formally expressed by the following conditions:

Entering condition 1:
$$ M_{s}+\text{Hyst}<\text{Th}_{1} $$ (1)

Entering condition 2:
$$ M_{n}+O_{\text{fn}}-\text{Hyst}>\text{Th}_{2} $$ (2)

Leaving condition 1:
$$ M_{s}-\text{Hyst}>\text{Th}_{1} $$ (3)

Leaving condition 2:
$$ M_{n}+O_{\text{fn}}+\text{Hyst}<\text{Th}_{2} $$ (4)

where \(M_{s}\) is the measurement result of the serving cell s, which can be either the Reference Signal Received Power (RSRP) or the Reference Signal Received Quality (RSRQ). \(M_{n}\) is the measurement result of the inter-RAT neighbor cell (e.g., the Received Signal Code Power (RSCP) in the case of a 3G neighbor). Hyst is the hysteresis parameter for the B2 event. \(O_{\text{fn}}\) is the frequency-specific offset of the frequency of the inter-RAT neighboring cell. \(\text{Th}_{1}\) and \(\text{Th}_{2}\) are the threshold parameters of this event for the serving cell and the target cell, respectively.

An example of an iRAT HO from LTE to 3G is presented in Fig. 1. When a user fulfills entering conditions 1 and 2 during a specific time interval, configured through the Time To Trigger (TTT) parameter, the iRAT HO procedure is launched by the eNodeB in order to transfer the connection to the target Radio Network Controller (RNC) and its NodeB (NB) in 3G. To that end, the Mobility Management Entity (MME) and the Serving Gateway (S-GW), along with the Serving GPRS Support Node (SGSN), are in charge of executing the iRAT HO. As a result of iRAT HOs, the requested services are maintained by the legacy systems, avoiding unexpected user disconnection. However, this has a negative impact on the user experience, since the service performance is reduced (e.g., lower speed or higher latency). Cells affected by coverage holes with uRAT will have a high number of iRAT HOs.
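The B2 conditions (1) to (4) and the role of the TTT can be illustrated with a minimal Python sketch. The class `B2Config`, the function names, and the sample format are hypothetical and chosen only for illustration. Two simplifications are assumed: either leaving condition on its own cancels the event (our reading of the 3GPP behavior; the text above does not state it), and the TTT timer keeps running while the measurements stay between the entering and leaving conditions.

```python
from dataclasses import dataclass


@dataclass
class B2Config:
    th1: float    # Th1: threshold for the serving LTE cell (e.g., RSRP in dBm)
    th2: float    # Th2: threshold for the inter-RAT neighbor (e.g., RSCP in dBm)
    hyst: float   # Hyst: hysteresis of the B2 event (dB)
    ofn: float    # O_fn: frequency-specific offset of the neighbor (dB)
    ttt_ms: int   # Time To Trigger (ms)


def b2_entering(ms, mn, cfg):
    """Entering conditions (1) and (2): both must hold."""
    return (ms + cfg.hyst < cfg.th1) and (mn + cfg.ofn - cfg.hyst > cfg.th2)


def b2_leaving(ms, mn, cfg):
    """Leaving conditions (3) and (4): assumed that either one cancels the event."""
    return (ms - cfg.hyst > cfg.th1) or (mn + cfg.ofn + cfg.hyst < cfg.th2)


def first_trigger_time(samples, cfg):
    """Return the time (ms) at which the iRAT HO would be triggered, or None.

    samples: chronological iterable of (t_ms, M_s, M_n) measurement tuples.
    """
    entered_at = None
    for t_ms, ms, mn in samples:
        if entered_at is None:
            if b2_entering(ms, mn, cfg):
                entered_at = t_ms          # start the TTT timer
        elif b2_leaving(ms, mn, cfg):
            entered_at = None              # event cancelled, reset the timer
        if entered_at is not None and t_ms - entered_at >= cfg.ttt_ms:
            return t_ms                    # conditions held for the whole TTT
    return None


cfg = B2Config(th1=-110.0, th2=-100.0, hyst=1.0, ofn=0.0, ttt_ms=320)
trace = [(0, -105.0, -95.0), (100, -112.0, -95.0),
         (300, -113.0, -94.0), (420, -114.0, -93.0)]
print(first_trigger_time(trace, cfg))
```

In this trace the entering conditions are first met at 100 ms and still hold 320 ms later, so the sketch reports a trigger at the 420 ms sample.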
3 System model
3.1 Framework
-
Data collection: The cells are monitored by means of different metrics, such as configuration management (CM) parameters, performance management (PM) parameters, performance indicators (PI), and mobile traces. The first of these, CM, represents the current configuration of network elements (e.g., the maximum transmit power). PM counts the number of times a specific event or procedure has taken place (e.g., the number of dropped calls). The PIs are calculated by combining several PMs, obtaining statistical measurements at cell level (e.g., the call drop ratio). Finally, the mobile traces consist of the measurements and information reported by the UE, along with the signaling messages exchanged between the network elements, including the user equipment. In a network, the Operations Support System (OSS) is in charge of collecting all those metrics from the network elements (Fig. 1).
-
Threshold estimation: To identify whether or not an indicator of a cell is degraded, the normal performance of the cell should be characterized to determine the reference conditions. Then, for each metric a threshold is defined, so that the indicator is considered degraded if it is over that threshold. There are different methods to automatically design these thresholds from the historical dataset built from the metrics and indicators provided by the OSS of both the LTE network and its uRAT (e.g., 3G) over a period of time. In particular, those historical datasets are composed of the specific values of each indicator for each cell, but without any label or information about the status of the cell or the degree of deterioration. As a result, the thresholds need to be estimated through unsupervised methods, since the data is unlabeled. In this paper, for simplicity, the percentile-based discretization (PBD) method [21] is used. This unsupervised method is based on the assumption that in a mature network only a low percentage, X %, of the data has anomalous values. Then, for each indicator, the thresholds are fixed at the Xth percentile of the values in the dataset. Note that these thresholds are estimated from the real values gathered from the network (Fig. 2), which allows operators to particularize the method for each network, for each cell, and even for different periods of time (weekday/weekend, busy hour, etc.).
-
Detection and diagnosis system: The LTE cells are analyzed by the detection system to identify those cells with insufficient coverage. Then, the selected cells are deeply analyzed in order to classify the coverage hole and determine the degree of severity (Table 1).

Table 1 Detection and diagnosis rules

CH type | BCR | IRAT HO | HOSR | Ret | ATOL |
---|---|---|---|---|---|
LTE CH without uRAT | > ThrBCR | < ThrIRAT | < ThrHOSR | < ThrRet | – |
Severe LTE CH with uRAT | > ThrBCR | > ThrIRAT | > ThrHOSR | > ThrRet | \(< \text{Thr}_{\text{ATOL}_{L}}\) |
Optimized LTE CH with uRAT | > ThrBCR | > ThrIRAT | > ThrHOSR | > ThrRet | \(> \text{Thr}_{\text{ATOL}_{H}}\) |
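The percentile-based threshold estimation described in the framework can be sketched as follows. This is a minimal illustration, not the exact procedure of [21]: the function names are hypothetical, a nearest-rank percentile is assumed, and the `higher_is_worse` flag is our own assumption for handling indicators whose anomalous values lie in the upper tail (e.g., BCR) versus the lower tail.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of `values` for 0 < q <= 100 (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def pbd_threshold(history, anomalous_pct, higher_is_worse=True):
    """Place the threshold so that about `anomalous_pct` % of the unlabeled
    historical values fall on the degraded side.

    Assumption: for indicators where high values are anomalous, the cut is
    taken at the (100 - X)th percentile; otherwise at the Xth percentile.
    """
    q = 100 - anomalous_pct if higher_is_worse else anomalous_pct
    return percentile(history, q)


# Synthetic history of one indicator for one cell (not real network data).
history = list(range(1, 101))
print(pbd_threshold(history, anomalous_pct=5))
print(pbd_threshold(history, anomalous_pct=5, higher_is_worse=False))
```

Because the function only needs the raw historical values, it can be run separately per cell or per time period (for example, weekday versus weekend) to obtain particularized thresholds.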
3.2 System indicators
3.2.1 Indicators based on cell-level information
-
E-RAB Retainability (Ret) [22]: it represents the ability of the network to provide a service without causing abnormal disconnections, that is, when there is an impact on the end-user. It is calculated as the percentage of normally terminated connections over the total connections. Note that E-RAB Retainability gives a first indication for areas with lack of LTE coverage that are not covered by any uRAT.
-
Number of bad coverage reports (BCR): when a user starts experiencing poor RF conditions in LTE, it sends an event-triggered measurement report (i.e., A2 event) to its serving cell indicating that the coverage (i.e., RSRP) is below the threshold (Th_A2). This PI counts the number of RSRP reports that fulfill the A2 event, so the worse the RF conditions, the higher the BCR.
-
Handover success rate (HOSR): it shows the percentage of handovers that are successfully executed.
-
Inter-RAT HO rate (IRAT HO): it indicates the percentage of the normal disconnections in LTE that have been redirected to any underlying RAT. This indicator is extremely important to identify those LTE coverage holes with uRAT.
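As a rough illustration of how these cell-level indicators could be derived from PM counters, consider the sketch below. The counter names in the `pm` dictionary are hypothetical placeholders; real counter names are vendor specific and are not given in the text.

```python
def percentage(part, total):
    """Return part/total as a percentage, or None when total is zero."""
    return 100.0 * part / total if total else None


def cell_indicators(pm):
    """Compute the cell-level indicators from a dict of hypothetical PM counters."""
    return {
        # Normally terminated connections over all connections.
        "Ret": percentage(pm["normal_releases"], pm["total_releases"]),
        # Successfully executed handovers over handover attempts.
        "HOSR": percentage(pm["ho_successes"], pm["ho_attempts"]),
        # Normal disconnections that were redirected to an underlying RAT.
        "IRAT_HO": percentage(pm["irat_releases"], pm["normal_releases"]),
        # Number of measurement reports that fulfilled the A2 event.
        "BCR": pm["a2_reports"],
    }


# Synthetic counters for one cell (not real network data).
pm = {"normal_releases": 995, "total_releases": 1000,
      "ho_successes": 190, "ho_attempts": 200,
      "irat_releases": 199, "a2_reports": 120}
print(cell_indicators(pm))
```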
3.2.2 Indicator based on user-level information
-
Firstly, generation of the inter-technology event flow: those connections that the LTE network redirects to another RAT are tracked, generating their inter-technology event flow from the information reported in the mobile traces. Figure 4 shows a flow diagram of how the inter-technology event flow can be obtained. Essentially, this consists of constructing a chronological event flow that organizes in time all the events that belong to the same UE connection, identified through its IMEI, considering the information of both the serving and the target network. After that, all events associated with each flow are ordered in time, so that the beginning and the end of each connection can be determined. In particular, the start time (Tstart) is the time at which the first event of a connection is received and, similarly, the end time (Tend) corresponds to the instant of the last event. It should be noted that only those event flows of the serving network whose termination reason indicates that they have performed a handover to an underlying RAT (3G or 2G) are considered. This guarantees that the analysis is focused on connections that have changed to another technology. The next step (Join event flows with the same IMEI in Fig. 4) consists of matching the event flow of each connection in the serving network with its corresponding event flow in the target network. Once the inter-technology event flow is obtained (as represented in Fig. 5), each part of the event flow can be determined. First, the connection of the user is set up and configured through the Radio Resource Control (RRC) protocol [20]. Second, during the LTE connection, the eNodeB releases the user’s connection, redirecting it to the 3G network through the IRAT HO procedure, if the user reports good levels of 3G signal while the LTE measurements are degraded. Finally, the connection of the user is set up in the 3G network, where the user’s data is sent and received until the connection is released.
-
Secondly, calculation of the user-level ATOL: it is defined as the percentage of time that a user is on LTE compared to the total duration of its connection (taking into account the duration both in LTE and in the underlying RAT (uRAT)). Formally, this indicator can be calculated by the following equation:

$$ \text{ATOL}_{\text{user}}=\frac{\text{duration}_{\text{LTE}}}{\text{duration}_{\text{total}}} $$ (5)

where \(\text{duration}_{\text{total}}\) is the total duration of the inter-technology event flow and \(\text{duration}_{\text{LTE}}\) represents the duration of the connection in the LTE network, i.e., the time interval between the beginning of the connection in LTE (Tstart_LTE) and the time of IRAT (see Fig. 5). In particular, the time of IRAT (\(T_{\text{IRAT}}\)) is estimated as the midpoint between the beginning of the uRAT event flow (Tstart_3G) and the end of the LTE event flow (Tend_LTE), taking into account that the sub-event flows may overlap because of the preceding signaling.
-
Lastly, calculation of the high-level ATOL: the individual ATOL indicators are aggregated in order to obtain the ATOL indicator at cell level by means of the following average:

$$ \text{ATOL}_{\text{average}}=\frac{\sum_{i=1}^{\text{NumTrackedIRATs}}\text{duration}_{\text{LTE}_{i}}}{\sum_{i=1}^{\text{NumTrackedIRATs}}\text{duration}_{\text{total}_{i}}} $$ (6)

where NumTrackedIRATs represents the total number of IRATs that have been tracked in the analyzed cell, \(\text{duration}_{\text{LTE}_{i}}\) represents the duration of the connection i in the LTE network and \(\text{duration}_{\text{total}_{i}}\) represents the total duration of the connection i.

The benefit of the proposed metric is that it is focused exclusively on those users that actually perform the IRAT HO to uRAT, so the conclusions obtained from the ATOL average metric provide specific information about those particular users that were affected by the coverage hole and so were redirected to the underlying RAT.
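The three steps above (joining event flows by IMEI, user-level ATOL from Eq. (5), and cell-level ATOL from Eq. (6)) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `Flow` record and all function names are hypothetical, the total duration is assumed to run from Tstart_LTE to the end of the uRAT flow, and, when one IMEI has several uRAT flows, the one starting closest to the end of the LTE flow is chosen within a hypothetical `max_gap` in seconds.

```python
from collections import namedtuple

# One per-network event flow: t_start/t_end are the first/last event times (s).
Flow = namedtuple("Flow", "imei t_start t_end")


def join_flows(lte_flows, urat_flows, max_gap=5.0):
    """Pair each LTE flow that ended in an IRAT HO with a uRAT flow of the same IMEI."""
    by_imei = {}
    for flow in urat_flows:
        by_imei.setdefault(flow.imei, []).append(flow)
    pairs = []
    for lte in lte_flows:
        candidates = by_imei.get(lte.imei, [])
        if not candidates:
            continue  # no matching flow in the target network
        best = min(candidates, key=lambda f: abs(f.t_start - lte.t_end))
        if abs(best.t_start - lte.t_end) <= max_gap:
            pairs.append((lte, best))
    return pairs


def durations(lte, urat):
    """Return (duration_LTE, duration_total) for one inter-technology flow."""
    t_irat = (lte.t_end + urat.t_start) / 2   # midpoint estimate of T_IRAT
    duration_lte = t_irat - lte.t_start
    duration_total = urat.t_end - lte.t_start  # assumed span of the joined flow
    return duration_lte, duration_total


def atol_user(lte, urat):
    """Eq. (5): fraction of the connection time spent on LTE."""
    duration_lte, duration_total = durations(lte, urat)
    return duration_lte / duration_total


def atol_cell(pairs):
    """Eq. (6): ratio of summed LTE durations to summed total durations."""
    values = [durations(lte, urat) for lte, urat in pairs]
    total = sum(d_total for _, d_total in values)
    return sum(d_lte for d_lte, _ in values) / total if total else None


# Synthetic flows (not real traces).
lte_flows = [Flow("A", 0.0, 10.0), Flow("B", 100.0, 160.0)]
urat_flows = [Flow("B", 160.0, 180.0), Flow("A", 9.0, 40.0)]
pairs = join_flows(lte_flows, urat_flows)
print([atol_user(lte, urat) for lte, urat in pairs], atol_cell(pairs))
```

Both functions return a fraction between 0 and 1; multiplying by 100 gives the percentage form used when comparing against percentage thresholds. Note that Eq. (6) weights each connection by its duration, so it generally differs from the plain mean of the user-level values.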
4 Detection and diagnosis systems
4.1 Detection of coverage hole
Threshold | Value |
---|---|
ThrBCR | 80 |
ThrIRAT | 1.93 % |
ThrHOSR | 95.45 % |
ThrRet | 99.5 % |
\(\text{Thr}_{\text{ATOL}_{L}}\) | 20 % |
\(\text{Thr}_{\text{ATOL}_{H}}\) | 80 % |
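The rules of Table 1 can be combined with the threshold values listed above in a short Python sketch. The function name, dictionary keys, and the labels "no coverage hole detected" and "unclassified" are hypothetical. Treating BCR as the sole detection indicator is an assumption based on every coverage hole type in Table 1 requiring BCR above its threshold. Cells that match none of the three rows (for example, an ATOL between the low and high thresholds) are left unclassified because Table 1 does not cover them.

```python
# Threshold values taken from the table above (percentages except BCR).
THRESHOLDS = {"BCR": 80, "IRAT_HO": 1.93, "HOSR": 95.45,
              "Ret": 99.5, "ATOL_L": 20.0, "ATOL_H": 80.0}


def classify_cell(ind, thr=THRESHOLDS):
    """Classify one cell from its indicators, following the rows of Table 1.

    ind: dict with keys BCR, IRAT_HO, HOSR, Ret (and ATOL in % when uRAT is used).
    """
    # Detection (assumed): a coverage hole is suspected only if BCR is high.
    if ind["BCR"] <= thr["BCR"]:
        return "no coverage hole detected"
    diagnosis_keys = ("IRAT_HO", "HOSR", "Ret")
    if all(ind[k] < thr[k] for k in diagnosis_keys):
        return "LTE CH without uRAT"
    if all(ind[k] > thr[k] for k in diagnosis_keys):
        if ind["ATOL"] < thr["ATOL_L"]:
            return "Severe LTE CH with uRAT"
        if ind["ATOL"] > thr["ATOL_H"]:
            return "Optimized LTE CH with uRAT"
    return "unclassified"  # combination not covered by Table 1


# Synthetic indicator values (not real network data).
cell = {"BCR": 150, "IRAT_HO": 12.0, "HOSR": 98.0, "Ret": 99.8, "ATOL": 15.0}
print(classify_cell(cell))
```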