Disclaimer: This document was prepared by the Congressional Research Service (CRS). CRS serves as nonpartisan shared staff to congressional committees and Members of Congress. It operates solely at the behest of and under the direction of Congress. Information in a CRS Report should not be relied upon for purposes other than public understanding of information that has been provided by CRS to Members of Congress in connection with CRS’s institutional role. CRS Reports, as a work of the United States Government, are not subject to copyright protection in the United States. Any CRS Report may be reproduced and distributed in its entirety without permission from CRS. However, as a CRS Report may include copyrighted images or material from a third party, you may need to obtain the permission of the copyright holder if you wish to copy or otherwise use copyrighted material.
Abstract

This paper tests two hypotheses regarding how well two distinct Long-Term Evolution (LTE) network problems can be detected through supervised techniques with near-real-time performance. The tested network problems are physical-cell-identity (PCI) conflicts and root-sequence-index (RSI) collisions. These were labeled through configured cell relations that verified these two conflicts. Furthermore, a real LTE network was used. The results obtained showed that both problems were best detected by using each key performance indicator (KPI) measurement as an individual feature. The highest average precisions obtained for PCI conflict detection were 31% and 26% for the 800 MHz and 1800 MHz frequency bands, respectively. The highest average precisions obtained for RSI collision detection were 61% and 60% for the 800 MHz and 1800 MHz frequency bands, respectively.

1. Introduction

Two of the major concerns of mobile network operators (MNO) are optimizing and maintaining network performance. However, maintaining performance has proven to be a challenge, mainly for large and complex networks. In the long term, changes made in the networks may increase the number of conflicts and inconsistencies that occur in them. These changes include changing the tilt of antennas, changing a cell's power, or even changes that cannot be controlled by the mobile network operators, such as user mobility and radio-channel fading.

In order to assess a network's performance, quantifiable performance metrics, known as key performance indicators (KPI), are typically used. Key performance indicators can report network performance measures such as the handover success rate and the channel-interference averages of each cell, and are periodically calculated, resulting in time series. A time series can be either univariate or multivariate. As this study uses data samples that represent LTE cells with several measured key performance indicators, the data consist of multivariate time series.
This paper focuses on applying supervised techniques for detecting two known LTE network conflicts, namely physical-cell-identity (PCI) conflicts and root-sequence-index (RSI) collisions. The labeling was only possible thanks to a CELFINET product that provides the configured cell relations that reveal these two network conflicts; furthermore, real data obtained from an LTE network were used. The aim of this paper is to test two hypotheses regarding how well these two distinct LTE network problems can be detected through supervised techniques with near-real-time performance. The resulting conflict-detection solution would run in an entity external to the LTE architecture during the early morning, and would alert the network engineers of any existing conflicts in order to allow prompt responses. As this paper aims to create models for near-real-time detection of PCI conflicts and RSI collisions, the popular k-nearest-neighbors with dynamic-time-warping classification approach was not tested [1]: it is computationally intensive and very slow for large data sets, as was the case for this paper. In order to automatically detect network fault causes, some work has been done using key performance indicator measurements with unsupervised techniques, as in [2].
12 The Radio Science Bulletin No 364 (March 2018)

The paper is organized as follows. Section 2 introduces the analyzed network problems, namely PCI conflicts and RSI collisions. Section 3 presents the chosen key performance indicators and machine-learning (ML) models and the two proposed hypotheses, and describes how the obtained models were evaluated. Section 4 presents the results obtained. Finally, conclusions are drawn in Section 5.

2. Network Problems Analyzed

2.1 Physical Cell Identity Conflict

Each LTE cell has two identifiers with different purposes: the Global Cell Identity (ID) and the PCI. The Global Cell ID is used to identify the cell from an operation, administration, and management perspective. The PCI is used to scramble the data in order to aid mobile phones in separating information from different transmitters [3]. Since an LTE network may contain a much larger number of cells than the 504 available PCI values, the same PCI must be reused by several cells. However, the user equipment (UE) cannot distinguish between two cells if they both have the same PCI and frequency, a situation known as a PCI conflict.

PCI conflicts can be divided into two cases: PCI confusions and PCI collisions. A PCI confusion occurs whenever an LTE cell has two different neighbor LTE cells with equal PCIs in the same frequency band [4]. A PCI collision happens whenever an LTE cell has a neighbor LTE cell with an identical PCI in the same frequency band [4]. A good PCI plan can be applied to avoid PCI conflicts. However, it can be difficult to devise such a plan without any PCI conflicts in a dense network. Moreover, network changes – namely increased cell power and variable radio conditions – can lead to PCI conflicts. PCI conflicts can lead to an increase in the dropped-call rate due to failed handovers, as well as an increase in blocked calls and channel interference [4].
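As an illustration of the two definitions above, the following sketch flags PCI collisions and confusions from a configured neighbor-relation list. The data structures, cell names, and function are hypothetical and are not taken from the CELFINET tool used in this study.

```python
# Hypothetical sketch: flag PCI collisions and confusions from neighbor
# relations. cells maps a cell ID to its (PCI, frequency band);
# neighbors maps a cell ID to its configured neighbor cell IDs.
from collections import defaultdict

def find_pci_conflicts(cells, neighbors):
    collisions, confusions = set(), set()
    for cell, nbrs in neighbors.items():
        pci, band = cells[cell]
        seen = defaultdict(list)  # (pci, band) -> neighbors using that pair
        for n in nbrs:
            n_pci, n_band = cells[n]
            # Collision: a neighbor reuses this cell's PCI on the same band.
            if (n_pci, n_band) == (pci, band):
                collisions.add(cell)
            seen[(n_pci, n_band)].append(n)
        # Confusion: two different neighbors share a PCI on the same band.
        if any(len(v) > 1 for v in seen.values()):
            confusions.add(cell)
    return collisions, confusions

cells = {"A": (10, 800), "B": (10, 800), "C": (10, 800), "D": (20, 800)}
neighbors = {"A": ["B"], "D": ["B", "C"]}
col, conf = find_pci_conflicts(cells, neighbors)  # -> {"A"}, {"D"}
```

Here cell A collides with its neighbor B (same PCI, same band), while cell D suffers a confusion because its two neighbors B and C share a PCI on the same band.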
2.2 Root Sequence Index Collision

The user equipment has to perform the LTE random-access procedure to connect to an LTE network, establish or reestablish a service connection, perform intra-system handovers, and synchronize for uplink and downlink data transfers. The LTE random-access procedure can be performed using two different solutions: non-contention-based and contention-based access. An LTE cell uses 64 physical random-access channel (PRACH) preambles. Twenty-four of those preambles are reserved by the evolved NodeB for non-contention-based access. The remaining 40 preambles are randomly selected by the user equipment for contention-based access [3]. The 40 physical random-access-channel preambles that the user equipment can use are calculated by the user equipment from the RSI parameter that the LTE cell transmits in system information block 2 [5]. Whenever two or more neighbor cells operate in the same frequency band and have the same RSI parameter, the connected user equipment calculates the same 40 physical random-access-channel preambles, increasing the occurrence of preamble collisions. This problem is known as an RSI collision, and it can lead to an increase in failed service establishments and re-establishments, as well as an increase in failed handovers.

3. Methodology

This study was performed using real data from an LTE network of a mobile network operator with a PCI reuse factor of three. Data were collected for the same weekday of three consecutive weeks, for every period of 15 minutes (the minimum temporal granularity used by network operators), resulting in a daily total of 96 measurements. Using a CELFINET tool, it was possible to label cells that had PCI conflicts and/or RSI collisions. Source cells that had configured neighbor cells with an equal PCI in the same frequency band were labeled as having a PCI collision.
Source cells that had two or more neighbor cells with equal PCIs in the same frequency band between themselves were labeled as having a PCI confusion. Source cells that had neighbor cells with an equal RSI in the same frequency band were labeled as having an RSI collision. Cells that did not present any of these conflicts were labeled as non-conflicting.

3.1 Proposed Key Performance Indicators

The first step in collecting a list of key performance indicators for LTE equipment was to choose the most relevant key performance indicators for detecting PCI conflicts and RSI collisions. The key performance indicators were chosen by taking into account the theory behind LTE and how the PCI and RSI are used. Accordingly, the following key performance indicators were chosen for PCI conflict detection:

• Average CQI: the average channel quality indicator measured by the user equipment

• UL PUCCH Interference Avg and UL PUSCH Interference Avg: the average measured interference in the physical uplink control and shared channels

• Service Establish: the number of established service connections
• Service Drop Rate: the ratio of dropped service occurrences

• DL Avg Cell Throughput Mbps: the average measured cell downlink throughput in Mbit/s

• DL Avg User Equipment Throughput Mbps: the average measured user-equipment downlink throughput in Mbit/s

• DL Latency ms: the average time an Internet protocol packet takes from being sent by the user equipment until returning to it

• RandomAcc Succ Rate: the success rate of services established through the random-access channel

• IntraFreq Prep HO Succ Rate and IntraFreq Exec HO Succ Rate: the success rates of handover preparation and execution between cells operating in the same frequency band.

To detect RSI collisions, a subset of the aforementioned key performance indicators was selected, namely: UL PUCCH Interference Avg, UL PUSCH Interference Avg, Service Establish, IntraFreq Exec HO Succ Rate, IntraFreq Prep HO Succ Rate, and RandomAcc Succ Rate.

After discarding cells with a high number of null key performance indicator measurements and interpolating those of the remaining cells, it was decided to separate the data into different frequency bands, namely the 800 MHz and 1800 MHz bands. The 2100 MHz and 2600 MHz frequency bands were not considered, as they represented only 9% of the data and had few occurrences of PCI conflicts and RSI collisions. The decision to separate the data into different frequency bands was taken in order to create frequency-dependent models, since different frequency bands have different purposes. The cleaned data for PCI conflict detection in the 800 MHz frequency band consisted of 8666 non-conflicting cells, 1551 PCI confusions, and six PCI collisions. The 1800 MHz frequency-band data had 16675 non-conflicting cells, 1294 PCI confusions, and no PCI collisions. The data concerning each frequency band were split into 80% for the training set and 20% for the test set.
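One plausible way to perform the 80%/20% split per frequency band is shown below with Scikit-Learn; the synthetic placeholder data and the class stratification are assumptions (the paper does not state whether its split was stratified).

```python
# Sketch of an 80/20 train/test split on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 960))   # e.g. 10 KPIs x 96 daily measurements
y = rng.integers(0, 2, size=1000)  # 1 = conflicting, 0 = non-conflicting

# stratify=y keeps the conflicting/non-conflicting ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
```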
Additionally, as PCI collisions are very rare, it was decided to use a 50% split for collisions, yielding three collisions in both the training and test sets. The cleaned data for RSI collision detection in the 800 MHz frequency band consisted of 10128 non-conflicting cells and 6774 RSI collisions. The 1800 MHz frequency-band data consisted of 17634 non-conflicting cells and 10916 RSI collisions. The data relative to each frequency band were split into 80% for the training set and 20% for the test set.

3.2 Considered Classification Algorithms

In order to reduce the bias of this study, five different classification algorithms were considered. The aim of the classifiers was to classify cells as either non-conflicting or conflicting, depending on the detection use case. The classification algorithm implementations were taken from the Python Scikit-Learn library [6], and were the following:

3.2.1 Adaptive Boosting (AB)

Adaptive Boosting is an ensemble method, a class of machine-learning approaches based on the concept of creating a highly accurate classifier by combining several weak and inaccurate classifiers. Adaptive Boosting uses subsets of the original data to produce weakly performing models (high bias, low variance) and then boosts their performance by combining them based on a chosen cost function. Adaptive Boosting was the first practical boosting algorithm, and it remains one of the most widely used and studied classifiers [7]. Its implementation uses decision-tree classifiers as weak learners.

3.2.2 Gradient Boost (GB)

Gradient Boost is another popular boosting algorithm for creating collections of classifiers. It differs from Adaptive Boosting in that it calculates the negative gradient of a cost function (the direction of quickest improvement) and adds to the model the weak learner that is closest to the obtained gradient [8]. The Gradient Boost implementation considered uses Decision Trees (DT) as weak learners.
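A minimal sketch of the two boosting classifiers described above is given below, on synthetic data and with Scikit-Learn defaults; the hyperparameters actually used in this study were tuned by grid search and are not reproduced here.

```python
# Sketch: the two boosting ensembles from Scikit-Learn on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# AdaBoost's default weak learner is a depth-1 decision tree (a stump).
ab = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gradient Boosting fits each new tree against the negative gradient
# of the loss, then adds it to the ensemble.
gb = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
```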
3.2.3 Extremely Randomized Trees (ERT)

Extremely Randomized Trees belongs to the family of tree ensemble methods, and uses a technique different from boosting, known as bagging. Bagging-based algorithms aim to control the generalization error by perturbing and averaging the generated weak learners, such as decision trees. The Extremely Randomized Trees algorithm stands out from other tree-based ensemble classifiers because it strongly randomizes both the feature and cut-point choices when splitting a tree node [9]. Compared to other algorithms, Extremely Randomized Trees aims to strongly reduce variance through the full randomization of cut-points and features, combined with ensemble averaging. By training each weak learner with the full training set instead of data subsets, Extremely Randomized Trees also minimizes bias.
3.2.4 Random Forest (RF)

Random Forest is another bagging-based algorithm in the family of tree ensemble methods. Similarly to Extremely Randomized Trees, several small and weak trees are grown in parallel, and this set of weak learners results in a strong classification algorithm, either by averaging or by majority vote [10]. Random Forest is similar to Extremely Randomized Trees, but differs in two aspects. Random Forest uses data subsets for growing its trees, while Extremely Randomized Trees uses the whole training set. Random Forest chooses a small subset of features to be considered when splitting a node, while Extremely Randomized Trees picks a random feature from all features.

3.2.5 Support Vector Machines (SVM)

Support Vector Machines aim to separate data samples of different classes through hyperplanes that define decision boundaries. Similarly to Decision-Tree-based classifiers, Support Vector Machines are capable of handling linear and nonlinear classification tasks. The main idea behind Support Vector Machines is to map the original data samples from the input space into a high-dimensional feature space such that the classification task becomes simpler [11].

3.3 Proposed Hypotheses

In order to reduce bias even further, two hypotheses were proposed, to find the one that led to the best-performing models for PCI conflict and RSI collision detection.

3.3.1 Statistical Data Extraction Classification

The first hypothesis is that PCI conflicts and RSI collisions are better detected by extracting statistical calculations from the daily time series of each key performance indicator and using them as features for classification. The Python tsfresh tool was used to extract statistical data from the time series [12]. tsfresh applies several statistical calculations to the data, followed by feature elimination through statistical-significance testing.
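This study used tsfresh for the extraction step; as a simplified illustration of the same idea, the sketch below reduces each cell's daily KPI series (96 samples per KPI) to a handful of summary statistics. All data and statistic choices here are placeholder assumptions, far smaller than tsfresh's feature set.

```python
# Simplified stand-in for tsfresh-style feature extraction.
import numpy as np

def extract_stats(series):
    """series: (n_kpis, 96) array with one day of KPI measurements."""
    feats = []
    for kpi in series:
        feats += [kpi.mean(), kpi.std(), kpi.min(), kpi.max(),
                  np.median(kpi)]
    return np.asarray(feats)

rng = np.random.default_rng(0)
cells = rng.normal(size=(50, 10, 96))   # 50 cells, 10 KPIs, 96 samples
# 10 KPIs x 5 statistics -> a 50-dimensional feature vector per cell.
X = np.stack([extract_stats(c) for c in cells])
```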
As this resulted in hundreds of features, Principal Component Analysis (PCA) was applied for dimensionality reduction before feeding the data into the Support Vector Machine classifier. This decision was taken because Support Vector Machines take longer to converge as the dimensionality increases, while high dimensionality does not significantly increase the training and testing times of the tree-based classifiers. It was decided to use the number of principal components (PC) that led to 98% of the cumulative proportion of variance explained, maintaining most of the original variance.

3.3.2 Raw Cell Data Classification

The second hypothesis is that PCI conflicts and RSI collisions are better detected by using each of a cell's daily key performance indicator measurements as an individual feature. This hypothesis was proposed to compare a more computationally intensive but simpler approach with the previous hypothesis. As there were 96 daily measurements per key performance indicator in each cell, using, for instance, 10 key performance indicators would have yielded 96 × 10 = 960 features. Due to the high dimensionality of the data used to test this hypothesis, Principal Component Analysis was once again applied to reduce the dimensionality before using the Support Vector Machine classifier, keeping the number of principal components that led to 98% of the cumulative proportion of variance explained.

3.4 Model Evaluation

In a binary decision problem, a classification algorithm labels predictions as either positive or negative. A prediction for conflict detection fits into one of four categories: True Positive (TP), conflicting cells correctly labeled as conflicting; False Positive (FP), non-conflicting cells incorrectly labeled as conflicting; True Negative (TN), non-conflicting cells correctly labeled as non-conflicting; False Negative (FN), conflicting cells incorrectly labeled as non-conflicting.
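The four categories just defined can be counted directly from a set of labels and predictions; the toy labels below (1 = conflicting, 0 = non-conflicting) are illustrative only.

```python
# Counting TP, FP, TN, and FN for a toy prediction vector.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # -> (2, 1, 2, 1)

recall = tp / (tp + fn)                 # fraction of conflicts found
precision = tp / (tp + fp)              # fraction of alarms that are real
accuracy = (tp + tn) / len(y_true)      # fraction correctly classified
```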
As there was high interest in knowing how well the obtained models could classify PCI conflicts and RSI collisions, the classic accuracy metric by itself was not enough. Classifications where a non-conflicting cell was erroneously classified as a conflict were to be avoided; it was thus decided to additionally evaluate the obtained models through the precision and recall metrics. The metrics used can be defined as follows:

Recall = TP / (TP + FN), (1)

Precision = TP / (TP + FP), (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN), (3)

where Recall measures the fraction of conflicting cells that are correctly labeled, Precision measures the fraction of cells classified as conflicting that are truly conflicting, and Accuracy measures the fraction of correctly classified cells [13]. Precision can be thought of as a measure of a classifier's exactness – a low precision can indicate a large number of False Positives – while Recall can be seen as a measure of a classifier's completeness: a low recall indicates many False Negatives.

Since a classification algorithm can output the probabilities of a sample belonging to a specific class, the probability decision threshold can be tuned to alter the model's classification outputs. For instance, increasing the probability decision threshold for a specific class may lead to an increase in Precision at the cost of a lower Recall. Precision-Recall (PR) curves are built by changing the decision probability threshold for a class. It was thus decided to also evaluate models through their Precision-Recall curves, in order to perform a thorough model evaluation. Precision-Recall curves, often used in information retrieval [14], have been cited as an alternative to Receiver Operating Characteristic curves for tasks with a large skew in the class distribution, as in PCI conflict detection [15]. Additionally, the average Precision is represented by the area under the Precision-Recall curve. It should be noted that there is a tradeoff between the number of samples for model training, the training duration, and model performance. With more data samples and more training time, the resulting model generalizes better and has more time to learn the data structure.

4. Results

4.1 Physical Cell Identity Conflict Detection

4.1.1 Statistical Data Extraction Classification

The first hypothesis presented in Section 3.3 was tested using the data presented in Section 3.1. Regarding PCI confusion detection, tsfresh yielded 798 and 909 significant features for the 800 MHz and 1800 MHz frequency bands, respectively. Concerning PCI collision detection, a total of 2200 features were extracted for the 800 MHz case without the statistical-significance selection step, because the dataset contained only six PCI collisions.
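The Precision-Recall evaluation described in Section 3.4 can be reproduced with Scikit-Learn, given the true labels and a classifier's predicted probabilities; the values below are illustrative toy data, not results from this study.

```python
# Sketch: PR curve and average Precision from predicted probabilities.
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

# One (precision, recall) point per decision threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# Step-wise summary of the area under the PR curve.
ap = average_precision_score(y_true, y_prob)
```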
Principal Component Analysis was applied for dimensionality reduction, for faster Support Vector Machine convergence. For PCI confusion detection, this resulted in 273 and 284 principal components for the 800 MHz and 1800 MHz frequency bands, respectively. The optimal hyperparameters for each model were obtained through a grid search on the training set with 10-fold cross validation, maximizing the Precision metric. After training, the models were tested on the test set with a decision probability threshold of 50%. The results are presented in Table 1. Note that when a classifier produced no True Positives and no False Positives, the Precision is represented as Not a Number (NaN), since its computation results in a division by zero.

The Adaptive Boosting model had the best performance, with a 50% Precision for the 800 MHz frequency band. However, no model classified any sample as conflicting in the 1800 MHz frequency-band data. In order to obtain more insight into the models' performance, the Precision-Recall curves were obtained, and are represented in Figure 1. The highest average Precision was 27%, obtained with the Gradient Boost classifier, which presented the highest Precision throughout most of the plot. The Support Vector Machine was clearly the worst-performing model, especially in the 1800 MHz frequency band. The training and testing running times needed to obtain the Precision-Recall curves were also collected. Gradient Boost, which resulted in the two best models, had a testing time below one second and a training time below 30 seconds for both frequency bands. The learning curves obtained showed that the average Precision would only marginally increase with more data. Gradient Boost thus resulted in the overall best-performing models for both frequency bands when using statistical calculations as features.
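The hyperparameter search described above (grid search with 10-fold cross validation, maximizing Precision) can be sketched as follows; the classifier, grid values, and synthetic data are illustrative assumptions, not the grids used in this study.

```python
# Sketch: grid search with 10-fold CV, scored on Precision.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "learning_rate": [0.1, 0.5]},
    scoring="precision",   # pick the hyperparameters maximizing Precision
    cv=10,                 # 10-fold cross validation on the training set
)
grid.fit(X, y)
best = grid.best_params_   # hyperparameters of the winning model
```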
Regarding PCI collision detection, Principal Component Analysis resulted in 619 principal components to be used by the Support Vector Machine classifier for the 800 MHz frequency band. The optimal hyperparameters were obtained, and the test results were collected after training the models. A table with the results is not shown, as no tested model was able to classify a sample as conflicting. The Precision-Recall curves were obtained and plotted, showing a maximum Precision of 23% with 100% Recall for Random Forest, while Precision was approximately zero for the remaining classifiers (the plot is not shown in this paper, as it would not add much information).

Table 1. Statistical-data-based PCI confusion classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       85.24%     NaN       0.00%     93.27%     NaN       0.00%
RF        85.24%     NaN       0.00%     93.27%     NaN       0.00%
SVM       85.24%     NaN       0.00%     93.27%     NaN       0.00%
AB        85.24%    50.00%     2.83%     93.27%     NaN       0.00%
GB        85.18%    46.00%     2.43%     93.27%     NaN       0.00%
4.1.2 Raw Cell Data Classification

The second hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. With each individual key performance indicator measurement used as a feature, an average filter with a window of size 20 was applied to reduce noise. Principal Component Analysis was applied, which resulted in 634 principal components to be used by the Support Vector Machine classifier for both the 800 MHz and 1800 MHz frequency bands. Once again, the optimal hyperparameters were obtained through grid search, and the test results were collected after model training. The classification results for a 50% decision probability threshold are shown in Table 2.

Overall, Gradient Boost was the classifier that led to the best performance, having the highest Accuracy and Recall for both frequency bands, but not the best Precision for the 1800 MHz frequency band. The models created by the Extremely Randomized Trees and Random Forest classifiers both had a 100% Precision for the 1800 MHz frequency band, which suggested that Random Forest could be the best model, as it had the higher Recall. In order to see whether Gradient Boost led to the best-performing model, the Precision-Recall curves were obtained, and they are presented in Figure 2. Regarding the 800 MHz frequency band, Gradient Boost showed the highest average Precision, with a peak of 60% Precision at 4% Recall. Concerning the 1800 MHz frequency band, Extremely Randomized Trees presented the best average Precision, while Gradient Boost achieved a higher Precision for Recall values lower than 5%. Additionally, Random Forest was not the best-performing model, in contrast to what Table 2 suggested. The training and testing running times for each model were obtained. In the 800 MHz frequency band, Gradient Boost, which led to the best-performing model, had a testing time below one second and a training time below 14 seconds.
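The size-20 average filter mentioned above can be implemented as a simple moving average over each KPI's 96 daily samples. The paper does not detail how the day boundaries were handled; the zero-padded "same" mode below is one assumption.

```python
# Sketch: size-20 moving-average filter over a 96-sample KPI series.
import numpy as np

def smooth(series, window=20):
    kernel = np.ones(window) / window
    # mode="same" keeps all 96 samples; edges are implicitly zero-padded.
    return np.convolve(series, kernel, mode="same")

rng = np.random.default_rng(0)
kpi = np.sin(np.linspace(0.0, 6.28, 96)) + 0.1 * rng.normal(size=96)
kpi_smooth = smooth(kpi)
```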
Regarding the 1800 MHz frequency band, Extremely Randomized Trees, which led to the best-performing model, was one of the quickest to train (40.3 seconds), but one of the slowest to test (1.4 seconds). Nevertheless, its overall performance was near real time. Regarding PCI collision detection, Principal Component Analysis resulted in 634 principal components for both frequency bands. The test results were collected with the optimal hyperparameters. The best-performing model was the one obtained from Adaptive Boosting, as it detected one out of three PCI collisions with 100% Precision. However, due to the very low number of PCI collisions in the dataset, the results were not sufficiently significant to draw any conclusions.

Figure 1. The smoothed Precision-Recall curves for statistical-data-based PCI confusion detection.

Table 2. Raw-cell-data PCI confusion classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       85.37%    22.22%     0.71%     93.57%    100%       0.45%
RF        85.63%     NaN       0.00%     93.60%    100%       0.90%
SVM       85.63%     NaN       0.00%     93.54%     NaN       0.00%
AB        85.63%     NaN       0.00%     93.54%     NaN       0.00%
GB        85.73%    75.00%     1.07%     93.63%    80.00%     1.80%
4.2 Root Sequence Index Collision Detection

4.2.1 Statistical Data Extraction Classification

The first hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. Regarding RSI collision detection, tsfresh yielded 732 and 851 significant extracted features for the 800 MHz and 1800 MHz frequency bands, respectively. In order to reduce the data dimensionality before applying the Support Vector Machine model, Principal Component Analysis was applied, resulting in 273 and 284 principal components for the 800 MHz and 1800 MHz frequency bands, respectively. The optimal hyperparameters were obtained through grid search, and the test results are presented in Table 3. The Extremely Randomized Trees model delivered the highest Precision for both frequency bands, but Gradient Boost had the highest overall Accuracy and Recall.

In order to gain more insight into the performance of the models, Precision-Recall curves were obtained, and are presented in Figure 3. The Gradient Boost model was the best for both frequency bands, having a Precision peak of 85% and an average Precision of 61%. The abnormal curve behavior of the Adaptive Boosting model was due to the assignment of the same probability values to several cells. The training and testing running times for each model were obtained. The Gradient Boost model showed testing times lower than one second; however, it had one of the highest training times. More specifically, it required 28.4 and 246 seconds of training time for the 800 MHz and 1800 MHz frequency bands, respectively. Nonetheless, the Gradient Boost model presented higher performance than the other obtained models with near-real-time performance, and was thus overall the best model. The learning curves obtained showed that the performance would not significantly increase if more data were added to the dataset.
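The learning curves referred to above can be generated with Scikit-Learn's learning_curve utility; the estimator, training-set sizes, and synthetic data below are illustrative assumptions, not the configuration used in this study.

```python
# Sketch: learning curve of a boosted model, scored by average Precision.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=400, random_state=0)

# Cross-validated scores at four increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    GradientBoostingClassifier(n_estimators=30, random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 4), cv=5,
    scoring="average_precision")
```

A flat validation curve at the largest sizes suggests that adding data would not help much, which is how the learning curves were read in this study.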
4.2.2 Raw Cell Data Classification

The second hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. With each individual key performance indicator measurement used as a feature, an average filter with a window of size 20 was applied. Principal Component Analysis was applied, which yielded 332 principal components to be used by the Support Vector Machine classifier for both the 800 MHz and 1800 MHz frequency bands for RSI collision detection. The optimal hyperparameters were obtained through grid search, and the results are presented in Table 4. Once more, the Gradient Boost model showed the highest Accuracy for both frequency bands. The Random Forest and Extremely Randomized Trees models had the highest Precision for the 800 MHz and 1800 MHz frequency bands, respectively.

The Precision-Recall curves were obtained and are presented in Figure 4. The Gradient Boost model had the highest average Precision, while the Random Forest and Extremely Randomized Trees models showed slightly worse average Precision. The training and testing running times for each model were obtained. The Gradient Boost model showed testing times lower than one second, and the third-highest training times for both frequency bands. More precisely, it took 12.8 and 24.4 seconds to train in the 800 MHz and 1800 MHz frequency bands, respectively. However, the Gradient Boost model's performance was in near real time, and it was thus overall the best-performing model. The learning curves obtained showed that the results would improve if more data were added to the training set, especially for the Gradient Boost model.

Figure 2. The smoothed Precision-Recall curves for raw-cell-data-based PCI confusion detection.

Table 3. Statistical-data-based RSI collision classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       60.32%    100%       0.48%     62.27%    72.97%     2.00%
RF        64.93%    61.30%    32.62%     64.13%    66.94%    12.12%
SVM       60.94%    54.80%    11.55%     61.79%     NaN       0.00%
AB        64.02%    56.79%    40.83%     66.37%    59.88%    36.29%
GB        66.87%    61.60%    44.88%     69.39%    63.97%    45.53%

5. Conclusions

This paper tested two hypotheses regarding how well two distinct LTE network problems could be detected through supervised techniques with near-real-time performance. PCI confusions were better detected by using each cell's daily key performance indicator measurements as individual features, a conclusion drawn from the higher average Precision obtained when testing this hypothesis. Specifically, the average Precisions reached 31% and 26% for the 800 MHz and 1800 MHz frequency bands, respectively. No conclusions could be reached regarding PCI collision detection, due to the low number of PCI collisions in the data set. RSI collisions were detected with similar performance by the two proposed hypotheses.
However, the best detection was arguably obtained by using each cell's daily key performance indicator measurements as individual features, because the learning curves showed that the results for the second hypothesis would further improve if more data were added. The best-performing model was the one that used the Gradient Boost classifier, reaching average Precisions of 61% and 60% for the 800 MHz and 1800 MHz frequency bands, respectively.

The results showed that supervised techniques are not well suited for PCI and RSI conflict detection. This is because, while a cell may have one of these two conflicts, the conflict's impact on the key performance indicators might be negligible. This can be due to several factors, such as the distance between cells, their azimuths, and the environment. For future work, an unsupervised approach to network conflict detection, followed by manual labeling to be used by a classifier, could be investigated. This would result in the labeling of cells with significant differences between them, which could lead to better classification results.

Table 4. Raw-cell-data RSI collision classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       59.49%    50.00%     0.83%     59.83%    75.00%     0.22%
RF        61.70%    62.64%    13.52%     65.55%    63.86%    33.07%
SVM       60.07%    52.24%    16.61%     59.25%    46.67%     9.14%
AB        64.73%    60.38%    37.60%     64.99%    59.59%    40.32%
GB        66.41%    60.84%    47.92%     66.22%    62.72%    39.52%

Figure 3. The smoothed Precision-Recall curves for statistical-data-based RSI collision detection.
Abstract
This paper tests two hypotheses regarding how
well two distinct Long-Term Evolution (LTE) network
problems can be detected through supervised techniques
with near-real-time performance. The tested network
problems are physical-cell-identity (PCI) conflicts and
root-sequence-index (RSI) collisions. These were labeled
through configured cell relations that verified these two
conflicts. Furthermore, a real LTE network was used. The
results obtained showed that both problems were best
detected by using each key performance indicator (KPI)
measurement as an individual feature. The highest average
precisions obtained for PCI conflict detection were 31%
and 26% for the 800 MHz and 1800 MHz frequency bands,
respectively. The highest average precisions obtained for
RSI collision detection were 61% and 60% for the 800 MHz
and 1800 MHz frequency bands, respectively.
1. Introduction
Two of the major concerns of mobile network operators
(MNOs) are optimizing and maintaining network
performance. However, maintaining performance has
proven to be a challenge mainly for large and complex
networks. In the long term, changes made in the networks
may increase the number of conflicts and inconsistencies
that occur in them. These changes include changing the
tilting of antennas, changing the cell’s power, or even
changes that cannot be controlled by the mobile network
operators, such as user mobility and radio-channel fading.
In order to assess the network’s performance,
quantifiable performance metrics, known as key performance
indicators (KPI), are typically used. Key performance
indicators can report network performance such as the
handover success rate and the channel interference averages
of each cell, and are periodically calculated, resulting in time
series. A time series can be either univariate or multivariate.
As this study uses data samples that represent LTE cells
with several measured key performance indicators, the
data consist of multivariate time series.
This paper focuses on applying supervised techniques
for detecting two known LTE network conflicts, namely
physical-cell identity (PCI) conflicts and root-sequence
index (RSI) collisions. The labeling used was only possible
due to a CELFINET product that provides cell relations
identifying these two network conflicts. In addition, real
data obtained from an LTE network were used. The
aim of this paper is to test two hypotheses regarding how well
two distinct LTE network problems can be detected through
supervised techniques with near-real-time performance.
The resulting conflict-detection solution would run
in an entity external to the LTE architecture during the
early morning, alerting the network engineers to any
existing conflicts in order to enable prompt responses.
As this paper aims to create models for near-real-
time detection of PCI conflicts and RSI collisions, the
popular k-nearest neighbors with dynamic-time-warping
classification approach was not tested [1]. The reason for
this decision was based on the fact that it is computationally
intensive and very slow for large data sets, as was the case
for this paper.
In order to automatically detect the network fault
causes, some work has been done by using key performance
indicator measurements with unsupervised techniques, as
in [2].
12 The Radio Science Bulletin No 364 (March 2018)
The paper is organized as follows. Section 2 introduces
the analyzed network problems, namely PCI conflicts
and RSI collisions. Section 3 presents the chosen key
performance indicators and machine-learning (ML) models,
the two proposed hypotheses, and describes how the models
obtained were evaluated. Section 4 presents the results
obtained. Finally, conclusions are drawn in Section 5.
2. Network Problems Analyzed
2.1 Physical Cell Identity Conflict
Each LTE cell has two identifiers with different
purposes: the Global Cell Identity (ID) and the PCI. The
Global Cell ID is used to identify the cell from an operation,
administration, and management perspective. The PCI is
used to scramble the data in order to aid mobile phones
in separating information from different transmitters [3].
Since an LTE network may contain a much larger number
of cells than the 504 available values of PCIs, the same
PCI must be reused by several cells. However, the user
equipment (UE) cannot distinguish between two cells if
they both have the same PCI and frequency, a situation
known as a PCI conflict.
PCI conflicts can be divided into two cases: PCI
confusions and PCI collisions. PCI confusions occur
whenever an LTE cell has two different neighbor LTE
cells with equal PCIs, in the same frequency band [4]. PCI
collisions happen whenever an LTE cell has a neighbor
LTE cell with identical PCI in the same frequency band [4].
A good PCI plan can be applied to avoid PCI conflicts.
However, devising such a plan without any PCI conflicts
can be difficult in a dense network. Moreover, network
changes – namely increased cell power and variable radio
conditions – can lead to PCI conflicts. PCI conflicts can
lead to an increase in the dropped-call rate due to failed
handovers, as well as an increase in blocked calls and
channel interference [4].
2.2 Root Sequence Index
Collision
The user equipment has to perform the LTE random-
access procedure to connect to an LTE network, establish
or reestablish a service connection, perform intra-system
handovers, and synchronize for uplink and downlink
data transfers. The LTE random-access procedure can be
performed using two different solutions: non-
contention-based and contention-based access. An LTE
cell uses 64 physical random-access channel (PRACH)
preambles. Twenty-four of those preambles are reserved
by the evolved-NodeB for non-contention-based access.
The remaining 40 preambles are randomly selected by the
user equipment for contention-based access [3].
The 40 physical random-access-channel preambles
that the user equipment can use are calculated by the user
equipment through the RSI parameters that the LTE cell
transmits in the system information block 2 through the
physical random-access channel [5]. Whenever two or more
neighbor cells operate in the same frequency band and have
the same RSI parameter, this results in the connected user
equipment calculating the same 40 physical random-access
channel preambles, increasing the occurrence of preamble
collisions. The aforementioned problem is known as RSI
collision, and can lead to an increase of failed service
establishments and re-establishments, as well as an increase
of failed handovers.
3. Methodology
This study was performed using real data from an
LTE network of a mobile network operator with a PCI
reuse factor of three. Furthermore, data were collected for
the same weekday of three consecutive weeks, for every
period of 15 minutes, the minimum temporal granularity
used by network operators, resulting in a daily total of 96
measurements.
Using a CELFINET tool, it was possible to label
cells that had PCI conflicts and/or RSI collisions. Source
cells that had configured neighbor cells with equal PCI
in the same frequency band were labeled as having a PCI
collision. Source cells that had two or more neighbor
cells with equal PCI in the same frequency band between
themselves were labeled as having a PCI confusion. Source
cells that had neighbor cells with equal RSI in the same
frequency band were labeled as having an RSI collision.
Cells that did not present any of these conflicts were labeled
as non-conflicting.
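The labeling rules described above can be sketched as follows. The dict-based cell schema ("pci", "rsi", "band") and the function name are illustrative assumptions, since the actual CELFINET tool's data model is not described in the paper.

```python
def label_cell(cell, neighbours):
    """Label a source cell from its configured neighbour relations.

    `cell` and each neighbour are dicts with 'pci', 'rsi', and 'band'
    keys (an assumed minimal schema for illustration).
    """
    same_band = [n for n in neighbours if n["band"] == cell["band"]]
    labels = set()
    # PCI collision: a neighbour shares the source cell's own PCI.
    if any(n["pci"] == cell["pci"] for n in same_band):
        labels.add("pci_collision")
    # PCI confusion: two or more neighbours share a PCI between themselves.
    pcis = [n["pci"] for n in same_band]
    if len(pcis) != len(set(pcis)):
        labels.add("pci_confusion")
    # RSI collision: a neighbour shares the source cell's RSI.
    if any(n["rsi"] == cell["rsi"] for n in same_band):
        labels.add("rsi_collision")
    return labels or {"non_conflicting"}
```

Note that a cell may carry several labels at once, which matches the paper's "PCI conflicts and/or RSI collisions" wording.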
3.1 Proposed Key Performance
Indicators
The first step, after collecting a list of key
performance indicators for LTE equipment, was to choose
the most relevant key performance indicators for detecting
PCI conflicts and RSI collisions. The key performance
indicators were chosen by taking into account the theory
behind LTE and how the PCI and RSI are used. Accordingly,
the following key performance indicators were chosen for
PCI conflict detection:
• Average CQI: the average channel quality indicator
measured by the user equipment
• UL PUCCH Interference Avg and UL PUSCH
Interference Avg: the average measured interference
in the physical uplink control and shared channels
• Service Establish: the amount of established service
connections
• Service Drop Rate: the ratio of the dropped service
occurrences
• DL Avg Cell Throughput Mbps: the average measured
cell downlink throughput in Mbit/s
• DL Avg User Equipment Throughput Mbps: the average
measured user equipment downlink throughput in Mbit/s
• DL Latency ms: the average duration an Internet protocol
packet takes from being sent by the user equipment
until reaching back to it
• RandomAcc Succ Rate: the success rate of established
services made through the random access channel
• IntraFreq Prep HO Succ Rate and IntraFreq Exec HO
Succ Rate: the success rate of handover preparation and
execution between cells operating in the same frequency
band.
To detect RSI collisions, a subset of the
aforementioned key performance indicators was selected,
namely: UL PUCCH Interference Avg, UL PUSCH
Interference Avg, Service Establish, IntraFreq Exec HO
Succ Rate, IntraFreq Prep HO Succ Rate, and RandomAcc
Succ Rate.
After discarding cells with many null key performance
indicator measurements and interpolating those of the
remaining cells, it was decided to separate the data into
different frequency bands, namely the 800 MHz and
1800 MHz bands. The 2100 MHz and 2600 MHz frequency
bands were not considered, as they represented only 9% of
the data, and had few occurrences of PCI conflicts and RSI
collisions. This decision to separate the data into different
frequency bands was taken in order to create frequency-
dependent models, since different frequency bands have
different purposes.
The cleaned data for PCI conflict detection in the
800 MHz frequency band consisted of 8666 non-conflicting
cells, 1551 PCI confusions, and six PCI collisions. The
1800 MHz frequency-band data had 16675 non-conflicting
cells, 1294 PCI confusions, and no PCI collisions. The
data concerning each frequency band was split into 80%
for the training set and 20% for the test set. Additionally,
as PCI collisions are very rare, it was decided to do a 50%
split for collisions, yielding three collisions in both the
training and test sets.
The cleaned data for RSI collision detection in
the 800 MHz frequency band consisted of 10128 non-
confl icting cells and 6774 RSI collisions. The 1800 MHz
frequency-band data consisted of 17634 non-conflicting
cells and 10916 RSI collisions. The data relative to each
frequency band was split into 80% for the training set and
20% for the test set.
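A minimal sketch of the 80/20 split described above, assuming scikit-learn's train_test_split with stratification to preserve the class imbalance (the paper does not state the exact splitting mechanism it used):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # stand-in feature vectors
y = np.array([0] * 80 + [1] * 20)    # imbalanced conflict labels

# Stratification keeps the conflicting/non-conflicting proportions
# identical in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
```

The separate 50% split of the six PCI collisions would be handled outside this call, since stratification alone cannot guarantee an exact three/three division of so few samples.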
3.2 Considered Classification
Algorithms
In order to reduce the bias from this study, five
different classification algorithms were selected. The aim of
the classifiers was to classify cells as either non-conflicting
or conflicting, depending on the detection use case. The
considered classification algorithm implementations were
taken from the Python Scikit-Learn library [6], and were
the following:
3.2.1 Adaptive Boosting (AB)
Adaptive Boosting is an ensemble method, which
is a class of machine-learning approaches based on the
concept of creating a highly accurate classifier by combining
several weak and inaccurate classifiers. Adaptive Boosting
uses subsets of the original data to produce weak-performing
models (high bias, low variance) and then boosts their
performance by combining them based on a chosen
cost function. Adaptive Boosting was the first practical
boosting algorithm, and remains one of the most used and
studied classifiers [7]. Its implementation uses decision-tree
classifiers as weak learners.
3.2.2 Gradient Boost (GB)
Gradient Boost is another popular boosting algorithm
for creating collections of classifiers. It differs from Adaptive
Boosting because it calculates a negative gradient of a cost
function (direction of quickest improvement), and picks
a weak learner that is closest to the obtained gradient to
add to the model [8]. The Gradient Boost implementation
considered uses Decision Trees (DT) as weak learners.
3.2.3 Extremely Randomized
Trees (ERT)
This belongs to the family of tree ensemble
methods, and uses a technique different from boosting,
known as bagging. Bagging-based algorithms aim to
control generalization error by perturbing and averaging
the generated weak learners, such as decision trees. The
Extremely Randomized Trees algorithm stands out from
other tree-based ensemble classifiers because it strongly
randomizes both feature and cut-point choice while splitting
a tree node [9]. Compared to other algorithms, Extremely
Randomized Trees aims to strongly reduce variance through
full randomization of the cut-point and feature choices
combined with ensemble averaging. By training each weak
learner with the full training set instead of data subsets,
Extremely Randomized Trees also minimizes bias.
3.2.4 Random Forest (RF)
Random Forest is another bagging-based algorithm in
the family of tree ensemble methods. Similarly to Extremely
Randomized Trees, several small and weak trees can be
grown in parallel, and this set of weak learners results in
a strong classification algorithm either by averaging or by
majority vote [10]. Random Forest is similar to Extremely
Randomized Trees, but differs in two aspects. Random
Forest uses data subsets for growing its trees, while
Extremely Randomized Trees uses the whole training set.
Random Forest considers a small subset of features when
splitting a node, while Extremely Randomized
Trees chooses a random feature from all features.
3.2.5 Support Vector Machines
(SVM)
Support Vector Machines aim to separate data samples
of different classes through hyperplanes that define decision
boundaries. Similarly to Decision-Trees-based classifiers,
Support Vector Machines are capable of handling linear and
nonlinear classification tasks. The main idea behind Support
Vector Machines is to map the original data samples from
the input space into a high-dimensional feature space such
that the classification task becomes simpler [11].
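The five classifiers above can be instantiated from the Scikit-Learn library [6] as follows. The default hyperparameters here are placeholders, since the paper tunes them by grid search:

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.svm import SVC

models = {
    "AB": AdaBoostClassifier(),          # boosting of decision-tree weak learners
    "GB": GradientBoostingClassifier(),  # boosting along the cost-function gradient
    "ERT": ExtraTreesClassifier(),       # bagging with fully randomized splits
    "RF": RandomForestClassifier(),      # bagging on data and feature subsets
    "SVM": SVC(probability=True),        # probability outputs needed for PR curves
}
```

Enabling probability estimates on the SVM is an assumption made here so that all five models can later produce the class probabilities needed for Precision-Recall curves.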
3.3 Proposed Hypotheses
In order to reduce bias even further, two hypotheses
were proposed to find the one that led to the best-performing
models for PCI conflict and RSI collision detection.
3.3.1 Statistical Data Extraction
Classification
PCI conflicts and RSI collisions are better detected
by extracting statistical calculations from the daily time
series of each key performance indicator and using them
as features for classification. The Python tsfresh tool
was used to extract statistical data from the time series
[12]. tsfresh applies several statistical calculations to
the data, followed by feature elimination through statistical
significance testing. As it resulted in hundreds of features,
Principal Component Analysis (PCA) was applied for
dimensionality reduction before applying the data into
the Support Vector Machine classifier. This decision was
taken because the Support Vector Machine takes longer to
converge as the dimensionality increases, whereas increased
dimensionality does not significantly increase the training
and testing times of the tree-based classifiers. It was decided to use a number of
principal components (PC) that led to 98% of the cumulative
proportion of variance explained, maintaining most of the
original variance.
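The dimensionality-reduction step above can be sketched as follows. The random matrix stands in for the tsfresh-extracted features, which are omitted here; retaining components up to the 98% cumulative-explained-variance threshold is done via scikit-learn's fractional n_components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))   # stand-in for the extracted statistical features

# A float n_components tells scikit-learn to keep the smallest number of
# principal components whose cumulative explained variance exceeds it.
pca = PCA(n_components=0.98)
X_reduced = pca.fit_transform(X)
```

The same fitted PCA transform would then be applied to the test set before feeding it to the Support Vector Machine classifier.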
3.3.2 Raw Cell Data Classification
PCI conflicts and RSI collisions are better detected
by using each cell’s daily key performance indicator
measurements as an individual feature. This hypothesis was
proposed to compare a more computationally intensive but
simpler approach with the previous hypothesis. Moreover,
as there were 96 daily measurements per key performance
indicator in each cell, by using, for instance, 10 key
performance indicators, this would have yielded 96 × 10
= 960 features. Due to the high dimensionality of the data
to test this hypothesis, Principal Component Analysis was
applied (once again) to reduce its dimensionality before
using the Support Vector Machine classifier. It was decided
to use a number of principal components that led to 98% of
the cumulative proportion of variance explained.
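The feature layout of this hypothesis can be sketched as follows, using the paper's example sizes of 96 daily measurements and 10 key performance indicators (the zero-filled array is a stand-in for real measurements):

```python
import numpy as np

n_cells, n_kpis, n_samples = 4, 10, 96          # 96 x 10 = 960 features per cell
raw = np.zeros((n_cells, n_kpis, n_samples))    # stand-in KPI measurements

# Each individual measurement becomes one feature: one row per cell.
X = raw.reshape(n_cells, n_kpis * n_samples)
```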
3.4 Model Evaluation
In a binary decision problem, a classification algorithm
labels predictions as either positive or negative. A prediction
for confl ict detection could fit into one of these four
categories: True Positive (TP), conflicting cells correctly
labeled as conflicting; False Positive (FP), non-conflicting
cells incorrectly labeled as conflicting; True Negative (TN),
non-conflicting cells correctly labeled as non-conflicting;
False Negative (FN), conflicting cells incorrectly labeled
as non-conflicting.
As there was a high interest in knowing how well
the models obtained could classify PCI conflicts and RSI
collisions, the classic accuracy metric by itself was not
enough. Classifications where a non-conflicting cell was
erroneously classified as a conflict were to be avoided; it
was thus chosen to additionally evaluate the models obtained
through the precision and recall metrics. The metrics used
could then be defined as follows:
Recall = TP / (TP + FN), (1)

Precision = TP / (TP + FP), (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN), (3)
where Recall measures the fraction of conflicting cells
that are correctly labeled, Precision measures the fraction
of cells classified as conflicting that are truly conflicting,
and Accuracy measures the fraction of correctly classified
cells [13]. Precision can be thought of as a measure of a
classifier’s exactness – a low precision can indicate a large
number of False Positives – while Recall can be seen as a
measure of a classifier’s completeness: a low recall indicates
many False Negatives.
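A small worked example of Equations (1) to (3), cross-checked against the Scikit-Learn implementations (the toy labels are illustrative):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = conflicting, 0 = non-conflicting
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3 true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1 false positive
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1 false negative

precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 0.75
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 0.75
accuracy = accuracy_score(y_true, y_pred)     # 6 correct out of 8 = 0.75
```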
Since a classification algorithm can output the
probabilities of a sample belonging to a specific class, the
probability decision threshold can be tuned to alter the
model’s classification outputs. For instance, increasing
the probability decision threshold to classify a specific
class may lead to an increase in Precision at the cost of
a lower Recall. Precision-Recall (PR) curves are built by
changing the decision probability threshold for a class. It
thus was decided to also evaluate models through their
Precision-Recall curves in order to perform a thorough
model evaluation. Precision-Recall curves, often used in
information retrieval [14], have been cited as an alternative
to Receiver Operator Characteristic curves for tasks with
a large skew in the class distribution, as in PCI conflict
detection [15]. Additionally, the average Precision is also
represented by the Precision-Recall curves through the areas
under the curves. It should be noted that there is a tradeoff
between the number of samples for model training, training
duration, and model performance. With more data samples
and more training time, the resulting model generalizes
better and has more time to learn the data structure.
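A minimal sketch of building a Precision-Recall curve and its average-Precision summary with Scikit-Learn, on toy scores:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # predicted P(conflicting)

# Sweeping the decision probability threshold traces the PR curve.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

# The area under the curve summarizes the model as average Precision.
ap = average_precision_score(y_true, scores)
```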
4. Results
4.1 Physical Cell Identity
Conflict Detection
4.1.1 Statistical Data Extraction
Classification
The first hypothesis presented in Section 3.3 was
tested using the data presented in Section 3.1. Regarding
PCI confusion detection, tsfresh yielded 798 and
909 significant features for the 800 MHz and 1800 MHz
frequency bands, respectively. Concerning PCI collision
detection, a total of 2200 features were extracted for the
800 MHz case; these could not be selected through hypothesis
testing, because the data set contained only
six PCI collisions. Principal Component
Analysis was applied for dimensionality reduction for
a faster Support Vector Machine convergence. For PCI
confusion detection, this resulted in 273 and 284 principal
components for the 800 MHz and 1800 MHz frequency
bands, respectively.
The optimal hyperparameters to create each model
were obtained through a grid search on the training set
with 10-fold cross validation, maximizing the Precision
metric. After training the models, they were tested on the
test set, based on a decision probability threshold of 50%.
The results are presented in Table 1.
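A minimal sketch of the hyperparameter search described above (grid search, 10-fold cross validation, maximizing Precision); the parameter grid and synthetic data are illustrative placeholders, not the paper's actual grid:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # stand-in training features
y = np.tile([0, 1], 50)         # stand-in conflict labels

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [2, 3]},
    scoring="precision",        # model selection maximizes Precision
    cv=10,                      # 10-fold cross validation
)
search.fit(X, y)
best_model = search.best_estimator_
```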
It should be added that when a classifier did not classify
any True Positives or False Positives, the Precision was
represented as a Not a Number (NaN), since it resulted in
a division by zero. The Adaptive Boosting model had the
best performance, with a 50% Precision for the 800 MHz
frequency band. However, no model classified a sample as
conflicting in the 1800 MHz frequency band data.
In order to obtain more insights about the models’
performance, the Precision-Recall curves were obtained,
and are represented in Figure 1. The highest average
Precision was 27%, by using the Gradient Boost classifier.
The Gradient Boost presented the highest Precision mostly
throughout the plot. The Support Vector Machine was clearly
the worst-performing model, especially in the 1800 MHz
frequency band.
The training and testing running times to obtain the
Precision-Recall curves were also collected. Gradient Boost,
which resulted in the two best models, had a testing time
below one second and a training time below 30 seconds for
both frequency bands. The learning curves were obtained,
and they showed that the average Precision would only
marginally increase with more data. Gradient Boost thus
resulted in the overall best-performing models for both
frequency bands by using statistical calculations as features.
Regarding PCI collision detection, Principal
Component Analysis resulted in 619 principal components
to be used by the Support Vector Machine classifier for the
800 MHz frequency band. The optimal hyperparameters
were obtained, and the test results were collected after
training the models. A table with the results is not shown, as
no tested model was able to classify a sample as conflicting.
The Precision-Recall curves were obtained and plotted,
showing a maximum Precision of 23% with 100% Recall
by Random Forest, while this was approximately zero for
the remaining classifiers (the plot is not illustrated in this
paper as it would not add much information).
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      85.24%    NaN        00.00%    93.27%    NaN        00.00%
RF       85.24%    NaN        00.00%    93.27%    NaN        00.00%
SVM      85.24%    NaN        00.00%    93.27%    NaN        00.00%
AB       85.24%    50.00%     02.83%    93.27%    NaN        00.00%
GB       85.18%    46.00%     02.43%    93.27%    NaN        00.00%
Table 1. Statistical-data-based PCI confusion classification results.
4.1.2 Raw Cell Data Classification
The second hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Using each
individual key performance indicator measure as a feature,
an average filter with a window of size 20 was applied to
reduce the noise interference. Principal Component Analysis
was applied, which resulted in 634 principal components to
be used by the Support Vector Machine classifier for both
the 800 MHz and 1800 MHz frequency bands.
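A minimal sketch of the noise-reduction step, assuming a simple centered moving average of window size 20 (the paper does not specify the exact filter implementation):

```python
import numpy as np

def smooth(series, window=20):
    """Centered moving average; mode='same' keeps the original length."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

# A noisy daily KPI series: 96 quarter-hour measurements.
rng = np.random.default_rng(1)
noisy = np.sin(np.linspace(0, 6, 96)) + rng.normal(0, 0.3, 96)
smoothed = smooth(noisy)
```

The 'same' mode introduces some attenuation at the two edges of the series, a side effect any windowed filter of this kind shares.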
Once again, the optimal hyperparameters were
obtained through grid search, and the test results were
collected after model training. The classification results for
a 50% decision probability threshold are shown in Table 2.
Overall, Gradient Boost was the classifier that led to the
best performance, having the highest Accuracy and Recall
for both frequency bands, but not the best Precision for
the 1800 MHz frequency band. Both models created by the
Extremely Randomized Trees and Random Forest classifiers
had a 100% Precision for the 1800 MHz frequency band,
which meant that Random Forest could result in the best
model, as it had higher Recall.
In order to see if Gradient Boost led to the best
performing model, the Precision-Recall curves were
obtained, and they are presented in Figure 2. Regarding
the 800 MHz frequency band, Gradient Boost showed the
highest average Precision, with a peak of 60% Precision
for 4% Recall. Concerning the 1800 MHz frequency band,
Extremely Randomized Trees presented the best average
Precision, while Gradient Boost achieved higher Precision
for a Recall lower than 5%. Additionally, Random Forest
was not the best performing model, as was seen in Table 2.
The training and testing running times for each
model were obtained. In the 800 MHz frequency band,
Gradient Boost, which led to the best-performing model,
had a testing time below one second and a training time
below 14 seconds. Regarding the 1800 MHz frequency
band, Extremely Randomized Trees, which led to the
best-performing model, was one of the quickest to train
(i.e., 40.3 seconds), but it was one of the slowest to test
(i.e., 1.4 seconds). Nevertheless, its overall performance
was near real time.
Regarding PCI collision detection, Principal
Component Analysis resulted in 634 principal components
for both frequency bands. The test results were collected
with the optimal hyperparameters. The best performing
model was the model obtained from Adaptive Boosting,
as it detected one out of three PCI collisions with 100%
Precision. However, due to the very low number of
PCI collisions in the data set, the results were not sufficiently
significant to draw any conclusions.
Figure 1. The smoothed Precision-Recall curves for statistical-data-based PCI confusion detection.
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      85.37%    22.22%     00.71%    93.57%    100%       00.45%
RF       85.63%    NaN        00.00%    93.60%    100%       00.90%
SVM      85.63%    NaN        00.00%    93.54%    NaN        00.00%
AB       85.63%    NaN        00.00%    93.54%    NaN        00.00%
GB       85.73%    75.00%     01.07%    93.63%    80.00%     01.80%
Table 2. Raw-cell-data PCI confusion classification results.
4.2 Root Sequence Index
Collision Detection
4.2.1 Statistical Data Extraction
Classification
The first hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Regarding
RSI collision detection, tsfresh yielded 732 and
851 significant extracted features for the 800 MHz and
1800 MHz frequency bands, respectively. In order to
reduce the data dimensionality for applying to the Support
Vector Machine model, Principal Component Analysis was
applied, resulting in 273 and 284 principal components for
the 800 MHz and 1800 MHz frequency bands, respectively.
The optimal hyperparameters were obtained through
grid search, and the test results are presented in Table 3.
The Extremely Randomized Trees model delivered the
highest Precision for both frequency bands, but Gradient
Boost had the highest overall Accuracy and Recall.
In order to gain more insights regarding the
performance of the models, Precision-Recall curves were
obtained and are presented in Figure 3. The Gradient Boost
model was the best for both frequency bands, having a
Precision peak of 85% and an average Precision of 61%.
The abnormal curve behavior of the Adaptive Boosting
model was due to the assignment of several cells with the
same probability values.
The training and testing running times for each model
were obtained. The Gradient Boost model showed testing
times lower than one second; however, it had one of the
highest training times. More specifically, it required 28.4
and 246 seconds of training time for the 800 MHz and
1800 MHz frequency bands, respectively. Nonetheless,
the Gradient Boost model presented higher performance
relative to other obtained models with near-real-time
performance, thus overall being the best model. The learning
curves obtained showed that the performance would not
significantly increase if more data were added to the dataset.
4.2.2 Raw Cell Data Classification
The second hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Using each
individual key performance indicator’s measure as a feature,
an average filter with a window of size 20 was applied.
Principal Component Analysis was applied, which yielded
332 principal components to be used by the Support Vector
Machine classifier for both the 800 MHz and 1800 MHz
frequency bands for RSI collision detection.
The optimal hyperparameters were obtained through
grid search, and the results are presented in Table 4. Once
Figure 2. The smoothed Precision-Recall curves for raw-cell-data-based PCI confusion detection.
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      60.32%    100%       00.48%    62.27%    72.97%     02.00%
RF       64.93%    61.30%     32.62%    64.13%    66.94%     12.12%
SVM      60.94%    54.80%     11.55%    61.79%    NaN        00.00%
AB       64.02%    56.79%     40.83%    66.37%    59.88%     36.29%
GB       66.87%    61.60%     44.88%    69.39%    63.97%     45.53%
Table 3. Statistical-data-based RSI collision classification results.
more, the Gradient Boost model showed the highest Accuracy
for both frequency bands. The Random Forest and Extremely
Randomized Trees models had the highest Precision for
the 800 MHz and 1800 MHz frequency bands, respectively.
The Precision-Recall curves were obtained and are
presented in Figure 4. The Gradient Boost model had the
highest average Precision, while the Random Forest and
Extremely Randomized Trees models showed slightly
worse average Precision.
The training and testing running times for each model
were obtained. The Gradient Boost model showed testing
times lower than one second, and the third highest training
times for both frequency bands. More precisely, it took 12.8
and 24.4 seconds to train in the 800 MHz and 1800 MHz
frequency bands, respectively. However, the Gradient
Boost model’s performance was in near real time, and it
was thus overall the best-performing model. The learning
curves obtained showed that the results would improve if
more data were added to the training set, especially for the
Gradient Boost model.
5. Conclusions
This paper tested two hypotheses regarding how
well two distinct LTE network problems could be detected
through supervised techniques with near-real-time
performance.
The PCI confusions were better detected by using
the measurement of each cell’s daily key performance
indicators as an individual feature. This was concluded
because the average Precision was higher when testing
this hypothesis. Specifically, the average Precisions
reached 31% and 26% for the 800 MHz and 1800 MHz
frequency bands, respectively. No conclusions could be
reached regarding PCI collision detection due to the low
number of PCI collisions in the data set.
The RSI collisions were detected with similar
performance by the two proposed hypotheses. However, one
could say that the best detection was obtained by using the
measurement of each cell’s daily key performance indicators
as an individual feature, because the learning curves showed
that the results would further improve if more data were added
for the second hypothesis. The best-performing model was
the model that used the Gradient Boost classifier, reaching
average Precisions of 61% and 60% for the 800 MHz and
1800 MHz frequency bands, respectively.
The results showed that supervised techniques are
not well suited for PCI and RSI conflict detection. This is
because while a cell may have one of these two conflicts,
the conflict’s impact on the key performance indicators
might be negligible. This can be due to several factors,
such as the distance between cells, their azimuth, and the
environment. For future work, an unsupervised approach
for network conflict detection followed by manual labeling
to be used by a classifier could be investigated. This would
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      59.49%    50.00%     00.83%    59.83%    75.00%     00.22%
RF       61.70%    62.64%     13.52%    65.55%    63.86%     33.07%
SVM      60.07%    52.24%     16.61%    59.25%    46.67%     09.14%
AB       64.73%    60.38%     37.60%    64.99%    59.59%     40.32%
GB       66.41%    60.84%     47.92%    66.22%    62.72%     39.52%
Table 4. Raw-cell-data RSI collision classification results.
Figure 3. The smoothed Precision-Recall curves for statistical-data-based RSI collision detection.
result in the labeling of cells with significant differences
between them, which could lead to better classification
results.