Disclaimer: This document was prepared by the Congressional Research Service (CRS). CRS serves as nonpartisan shared staff to congressional committees and Members of Congress. It operates solely at the behest of and under the direction of Congress. Information in a CRS Report should not be relied upon for purposes other than public understanding of information that has been provided by CRS to Members of Congress in connection with CRS’s institutional role. CRS Reports, as a work of the United States Government, are not subject to copyright protection in the United States. Any CRS Report may be reproduced and distributed in its entirety without permission from CRS. However, as a CRS Report may include copyrighted images or material from a third party, you may need to obtain the permission of the copyright holder if you wish to copy or otherwise use copyrighted material.
Abstract

This paper tests two hypotheses regarding how well two distinct Long-Term Evolution (LTE) network problems can be detected through supervised techniques with near-real-time performance. The tested network problems are physical-cell-identity (PCI) conflicts and root-sequence-index (RSI) collisions. These were labeled through configured cell relations that verified these two conflicts. Furthermore, a real LTE network was used. The results obtained showed that both problems were best detected by using each key performance indicator (KPI) measurement as an individual feature. The highest average precisions obtained for PCI conflict detection were 31% and 26% for the 800 MHz and 1800 MHz frequency bands, respectively. The highest average precisions obtained for RSI collision detection were 61% and 60% for the 800 MHz and 1800 MHz frequency bands, respectively.

1. Introduction

Two of the major concerns of mobile network operators (MNO) are optimizing and maintaining network performance. However, maintaining performance has proven to be a challenge, mainly for large and complex networks. In the long term, changes made in the networks may increase the number of conflicts and inconsistencies that occur in them. These changes include changing the tilt of antennas, changing a cell's power, or even changes that cannot be controlled by the mobile network operators, such as user mobility and radio-channel fading.

In order to assess a network's performance, quantifiable performance metrics, known as key performance indicators (KPI), are typically used. Key performance indicators can report network performance measures such as the handover success rate and the channel-interference averages of each cell, and are periodically calculated, resulting in time series. A time series can be either univariate or multivariate. As this study uses data samples that represent LTE cells with several measured key performance indicators, the data consist of multivariate time series.
This paper focuses on applying supervised techniques for detecting two known LTE network conflicts, namely physical-cell-identity (PCI) conflicts and root-sequence-index (RSI) collisions. The labeling was only possible thanks to a CELFINET product that provides the configured cell relations that reveal these two network conflicts; furthermore, real data obtained from an LTE network were used. The aim of this paper is to test two hypotheses regarding how well these two distinct LTE network problems can be detected through supervised techniques with near-real-time performance. The resulting conflict-detection solution would run in an entity external to the LTE architecture during the early morning, and would alert the network engineers of any existing conflicts in order to allow prompt responses. As this paper aims to create models for near-real-time detection of PCI conflicts and RSI collisions, the popular k-nearest-neighbors with dynamic-time-warping classification approach was not tested [1]: it is computationally intensive and very slow for large data sets, as was the case for this paper. In order to automatically detect network fault causes, some work has been done using key performance indicator measurements with unsupervised techniques, as in [2].
12 The Radio Science Bulletin No 364 (March 2018)

The paper is organized as follows. Section 2 introduces the analyzed network problems, namely PCI conflicts and RSI collisions. Section 3 presents the chosen key performance indicators and machine-learning (ML) models and the two proposed hypotheses, and describes how the obtained models were evaluated. Section 4 presents the results obtained. Finally, conclusions are drawn in Section 5.

2. Network Problems Analyzed

2.1 Physical Cell Identity Conflict

Each LTE cell has two identifiers with different purposes: the Global Cell Identity (ID) and the PCI. The Global Cell ID is used to identify the cell from an operation, administration, and management perspective. The PCI is used to scramble the data in order to aid mobile phones in separating information from different transmitters [3]. Since an LTE network may contain a much larger number of cells than the 504 available PCI values, the same PCI must be reused by several cells. However, the user equipment (UE) cannot distinguish between two cells if they both have the same PCI and frequency, a situation known as a PCI conflict.

PCI conflicts can be divided into two cases: PCI confusions and PCI collisions. A PCI confusion occurs whenever an LTE cell has two different neighbor LTE cells with equal PCIs in the same frequency band [4]. A PCI collision happens whenever an LTE cell has a neighbor LTE cell with an identical PCI in the same frequency band [4]. A good PCI plan can be applied to avoid PCI conflicts. However, it can be difficult to devise such a plan without any PCI conflicts in a dense network. Moreover, network changes – namely increased cell power and variable radio conditions – can lead to PCI conflicts. PCI conflicts can lead to an increase in the dropped-call rate due to failed handovers, as well as an increase in blocked calls and channel interference [4].
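As an illustration of the two definitions above, the following sketch flags PCI collisions and confusions from a configured neighbor-relation list. The data structures, cell names, and function are hypothetical and are not taken from the CELFINET tool used in this study.

```python
# Hypothetical sketch: flag PCI collisions and confusions from neighbor
# relations. cells maps a cell ID to its (PCI, frequency band);
# neighbors maps a cell ID to its configured neighbor cell IDs.
from collections import defaultdict

def find_pci_conflicts(cells, neighbors):
    collisions, confusions = set(), set()
    for cell, nbrs in neighbors.items():
        pci, band = cells[cell]
        seen = defaultdict(list)  # (pci, band) -> neighbors using that pair
        for n in nbrs:
            n_pci, n_band = cells[n]
            # Collision: a neighbor reuses this cell's PCI on the same band.
            if (n_pci, n_band) == (pci, band):
                collisions.add(cell)
            seen[(n_pci, n_band)].append(n)
        # Confusion: two different neighbors share a PCI on the same band.
        if any(len(v) > 1 for v in seen.values()):
            confusions.add(cell)
    return collisions, confusions

cells = {"A": (10, 800), "B": (10, 800), "C": (10, 800), "D": (20, 800)}
neighbors = {"A": ["B"], "D": ["B", "C"]}
col, conf = find_pci_conflicts(cells, neighbors)  # -> {"A"}, {"D"}
```

Here cell A collides with its neighbor B (same PCI, same band), while cell D suffers a confusion because its two neighbors B and C share a PCI on the same band.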
2.2 Root Sequence Index Collision

The user equipment has to perform the LTE random-access procedure to connect to an LTE network, establish or reestablish a service connection, perform intra-system handovers, and synchronize for uplink and downlink data transfers. The LTE random-access procedure can be performed using two different solutions: non-contention-based and contention-based access. An LTE cell uses 64 physical random-access channel (PRACH) preambles. Twenty-four of those preambles are reserved by the evolved NodeB for non-contention-based access. The remaining 40 preambles are randomly selected by the user equipment for contention-based access [3]. The 40 physical random-access-channel preambles that the user equipment can use are calculated by the user equipment from the RSI parameter that the LTE cell transmits in system information block 2 [5]. Whenever two or more neighbor cells operate in the same frequency band and have the same RSI parameter, the connected user equipment calculates the same 40 physical random-access-channel preambles, increasing the occurrence of preamble collisions. This problem is known as an RSI collision, and it can lead to an increase in failed service establishments and re-establishments, as well as an increase in failed handovers.

3. Methodology

This study was performed using real data from an LTE network of a mobile network operator with a PCI reuse factor of three. Data were collected for the same weekday of three consecutive weeks, for every period of 15 minutes (the minimum temporal granularity used by network operators), resulting in a daily total of 96 measurements. Using a CELFINET tool, it was possible to label cells that had PCI conflicts and/or RSI collisions. Source cells that had configured neighbor cells with an equal PCI in the same frequency band were labeled as having a PCI collision.
Source cells that had two or more neighbor cells with equal PCIs in the same frequency band between themselves were labeled as having a PCI confusion. Source cells that had neighbor cells with an equal RSI in the same frequency band were labeled as having an RSI collision. Cells that did not present any of these conflicts were labeled as non-conflicting.

3.1 Proposed Key Performance Indicators

The first step in collecting a list of key performance indicators for LTE equipment was to choose the most relevant key performance indicators for detecting PCI conflicts and RSI collisions. The key performance indicators were chosen by taking into account the theory behind LTE and how the PCI and RSI are used. Accordingly, the following key performance indicators were chosen for PCI conflict detection:

• Average CQI: the average channel quality indicator measured by the user equipment

• UL PUCCH Interference Avg and UL PUSCH Interference Avg: the average measured interference in the physical uplink control and shared channels

• Service Establish: the number of established service connections
• Service Drop Rate: the ratio of dropped service occurrences

• DL Avg Cell Throughput Mbps: the average measured cell downlink throughput in Mbit/s

• DL Avg User Equipment Throughput Mbps: the average measured user-equipment downlink throughput in Mbit/s

• DL Latency ms: the average time an Internet protocol packet takes from being sent by the user equipment until returning to it

• RandomAcc Succ Rate: the success rate of services established through the random-access channel

• IntraFreq Prep HO Succ Rate and IntraFreq Exec HO Succ Rate: the success rates of handover preparation and execution between cells operating in the same frequency band.

To detect RSI collisions, a subset of the aforementioned key performance indicators was selected, namely: UL PUCCH Interference Avg, UL PUSCH Interference Avg, Service Establish, IntraFreq Exec HO Succ Rate, IntraFreq Prep HO Succ Rate, and RandomAcc Succ Rate.

After discarding cells with a high number of null key performance indicator measurements and interpolating those of the remaining cells, it was decided to separate the data into different frequency bands, namely the 800 MHz and 1800 MHz bands. The 2100 MHz and 2600 MHz frequency bands were not considered, as they represented only 9% of the data and had few occurrences of PCI conflicts and RSI collisions. The decision to separate the data into different frequency bands was taken in order to create frequency-dependent models, since different frequency bands have different purposes. The cleaned data for PCI conflict detection in the 800 MHz frequency band consisted of 8666 non-conflicting cells, 1551 PCI confusions, and six PCI collisions. The 1800 MHz frequency-band data had 16675 non-conflicting cells, 1294 PCI confusions, and no PCI collisions. The data concerning each frequency band were split into 80% for the training set and 20% for the test set.
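One plausible way to perform the 80%/20% split per frequency band is shown below with Scikit-Learn; the synthetic placeholder data and the class stratification are assumptions (the paper does not state whether its split was stratified).

```python
# Sketch of an 80/20 train/test split on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 960))   # e.g. 10 KPIs x 96 daily measurements
y = rng.integers(0, 2, size=1000)  # 1 = conflicting, 0 = non-conflicting

# stratify=y keeps the conflicting/non-conflicting ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
```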
Additionally, as PCI collisions are very rare, it was decided to use a 50% split for collisions, yielding three collisions in both the training and test sets. The cleaned data for RSI collision detection in the 800 MHz frequency band consisted of 10128 non-conflicting cells and 6774 RSI collisions. The 1800 MHz frequency-band data consisted of 17634 non-conflicting cells and 10916 RSI collisions. The data relative to each frequency band were split into 80% for the training set and 20% for the test set.

3.2 Considered Classification Algorithms

In order to reduce the bias of this study, five different classification algorithms were considered. The aim of the classifiers was to classify cells as either non-conflicting or conflicting, depending on the detection use case. The classification algorithm implementations were taken from the Python Scikit-Learn library [6], and were the following:

3.2.1 Adaptive Boosting (AB)

Adaptive Boosting is an ensemble method, a class of machine-learning approaches based on the concept of creating a highly accurate classifier by combining several weak and inaccurate classifiers. Adaptive Boosting uses subsets of the original data to produce weakly performing models (high bias, low variance) and then boosts their performance by combining them based on a chosen cost function. Adaptive Boosting was the first practical boosting algorithm, and it remains one of the most widely used and studied classifiers [7]. Its implementation uses decision-tree classifiers as weak learners.

3.2.2 Gradient Boost (GB)

Gradient Boost is another popular boosting algorithm for creating collections of classifiers. It differs from Adaptive Boosting in that it calculates the negative gradient of a cost function (the direction of quickest improvement) and adds to the model the weak learner that is closest to the obtained gradient [8]. The Gradient Boost implementation considered uses Decision Trees (DT) as weak learners.
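A minimal sketch of the two boosting classifiers described above is given below, on synthetic data and with Scikit-Learn defaults; the hyperparameters actually used in this study were tuned by grid search and are not reproduced here.

```python
# Sketch: the two boosting ensembles from Scikit-Learn on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# AdaBoost's default weak learner is a depth-1 decision tree (a stump).
ab = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gradient Boosting fits each new tree against the negative gradient
# of the loss, then adds it to the ensemble.
gb = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
```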
3.2.3 Extremely Randomized Trees (ERT)

Extremely Randomized Trees belongs to the family of tree ensemble methods, and uses a technique different from boosting, known as bagging. Bagging-based algorithms aim to control the generalization error by perturbing and averaging the generated weak learners, such as decision trees. The Extremely Randomized Trees algorithm stands out from other tree-based ensemble classifiers because it strongly randomizes both the feature and cut-point choices when splitting a tree node [9]. Compared to other algorithms, Extremely Randomized Trees aims to strongly reduce variance through the full randomization of cut-points and features, combined with ensemble averaging. By training each weak learner with the full training set instead of data subsets, Extremely Randomized Trees also minimizes bias.
3.2.4 Random Forest (RF)

Random Forest is another bagging-based algorithm in the family of tree ensemble methods. Similarly to Extremely Randomized Trees, several small and weak trees are grown in parallel, and this set of weak learners results in a strong classification algorithm, either by averaging or by majority vote [10]. Random Forest is similar to Extremely Randomized Trees, but differs in two aspects. Random Forest uses data subsets for growing its trees, while Extremely Randomized Trees uses the whole training set. Random Forest chooses a small subset of features to be considered when splitting a node, while Extremely Randomized Trees picks a random feature from all features.

3.2.5 Support Vector Machines (SVM)

Support Vector Machines aim to separate data samples of different classes through hyperplanes that define decision boundaries. Similarly to Decision-Tree-based classifiers, Support Vector Machines are capable of handling linear and nonlinear classification tasks. The main idea behind Support Vector Machines is to map the original data samples from the input space into a high-dimensional feature space such that the classification task becomes simpler [11].

3.3 Proposed Hypotheses

In order to reduce bias even further, two hypotheses were proposed, to find the one that led to the best-performing models for PCI conflict and RSI collision detection.

3.3.1 Statistical Data Extraction Classification

The first hypothesis is that PCI conflicts and RSI collisions are better detected by extracting statistical calculations from the daily time series of each key performance indicator and using them as features for classification. The Python tsfresh tool was used to extract statistical data from the time series [12]. tsfresh applies several statistical calculations to the data, followed by feature elimination through statistical-significance testing.
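This study used tsfresh for the extraction step; as a simplified illustration of the same idea, the sketch below reduces each cell's daily KPI series (96 samples per KPI) to a handful of summary statistics. All data and statistic choices here are placeholder assumptions, far smaller than tsfresh's feature set.

```python
# Simplified stand-in for tsfresh-style feature extraction.
import numpy as np

def extract_stats(series):
    """series: (n_kpis, 96) array with one day of KPI measurements."""
    feats = []
    for kpi in series:
        feats += [kpi.mean(), kpi.std(), kpi.min(), kpi.max(),
                  np.median(kpi)]
    return np.asarray(feats)

rng = np.random.default_rng(0)
cells = rng.normal(size=(50, 10, 96))   # 50 cells, 10 KPIs, 96 samples
# 10 KPIs x 5 statistics -> a 50-dimensional feature vector per cell.
X = np.stack([extract_stats(c) for c in cells])
```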
As this resulted in hundreds of features, Principal Component Analysis (PCA) was applied for dimensionality reduction before feeding the data into the Support Vector Machine classifier. This decision was taken because Support Vector Machines take longer to converge as the dimensionality increases, while high dimensionality does not significantly increase the training and testing times of the tree-based classifiers. It was decided to use the number of principal components (PC) that led to 98% of the cumulative proportion of variance explained, maintaining most of the original variance.

3.3.2 Raw Cell Data Classification

The second hypothesis is that PCI conflicts and RSI collisions are better detected by using each of a cell's daily key performance indicator measurements as an individual feature. This hypothesis was proposed to compare a more computationally intensive but simpler approach with the previous hypothesis. As there were 96 daily measurements per key performance indicator in each cell, using, for instance, 10 key performance indicators would have yielded 96 × 10 = 960 features. Due to the high dimensionality of the data used to test this hypothesis, Principal Component Analysis was once again applied to reduce the dimensionality before using the Support Vector Machine classifier, keeping the number of principal components that led to 98% of the cumulative proportion of variance explained.

3.4 Model Evaluation

In a binary decision problem, a classification algorithm labels predictions as either positive or negative. A prediction for conflict detection fits into one of four categories: True Positive (TP), conflicting cells correctly labeled as conflicting; False Positive (FP), non-conflicting cells incorrectly labeled as conflicting; True Negative (TN), non-conflicting cells correctly labeled as non-conflicting; False Negative (FN), conflicting cells incorrectly labeled as non-conflicting.
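The four categories just defined can be counted directly from a set of labels and predictions; the toy labels below (1 = conflicting, 0 = non-conflicting) are illustrative only.

```python
# Counting TP, FP, TN, and FN for a toy prediction vector.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # -> (2, 1, 2, 1)

recall = tp / (tp + fn)                 # fraction of conflicts found
precision = tp / (tp + fp)              # fraction of alarms that are real
accuracy = (tp + tn) / len(y_true)      # fraction correctly classified
```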
As there was high interest in knowing how well the obtained models could classify PCI conflicts and RSI collisions, the classic accuracy metric by itself was not enough. Classifications where a non-conflicting cell was erroneously classified as a conflict were to be avoided; it was thus decided to additionally evaluate the obtained models through the precision and recall metrics. The metrics used can be defined as follows:

Recall = TP / (TP + FN), (1)

Precision = TP / (TP + FP), (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN), (3)

where Recall measures the fraction of conflicting cells that are correctly labeled, Precision measures the fraction of cells classified as conflicting that are truly conflicting, and Accuracy measures the fraction of correctly classified cells [13]. Precision can be thought of as a measure of a classifier's exactness – a low precision can indicate a large number of False Positives – while Recall can be seen as a measure of a classifier's completeness: a low recall indicates many False Negatives.

Since a classification algorithm can output the probabilities of a sample belonging to a specific class, the probability decision threshold can be tuned to alter the model's classification outputs. For instance, increasing the probability decision threshold for a specific class may lead to an increase in Precision at the cost of a lower Recall. Precision-Recall (PR) curves are built by changing the decision probability threshold for a class. It was thus decided to also evaluate models through their Precision-Recall curves, in order to perform a thorough model evaluation. Precision-Recall curves, often used in information retrieval [14], have been cited as an alternative to Receiver Operating Characteristic curves for tasks with a large skew in the class distribution, as in PCI conflict detection [15]. Additionally, the average Precision is represented by the area under the Precision-Recall curve. It should be noted that there is a tradeoff between the number of samples for model training, the training duration, and model performance. With more data samples and more training time, the resulting model generalizes better and has more time to learn the data structure.

4. Results

4.1 Physical Cell Identity Conflict Detection

4.1.1 Statistical Data Extraction Classification

The first hypothesis presented in Section 3.3 was tested using the data presented in Section 3.1. Regarding PCI confusion detection, tsfresh yielded 798 and 909 significant features for the 800 MHz and 1800 MHz frequency bands, respectively. Concerning PCI collision detection, a total of 2200 features were extracted for the 800 MHz case without the statistical-significance selection step, because the dataset contained only six PCI collisions.
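The Precision-Recall evaluation described in Section 3.4 can be reproduced with Scikit-Learn, given the true labels and a classifier's predicted probabilities; the values below are illustrative toy data, not results from this study.

```python
# Sketch: PR curve and average Precision from predicted probabilities.
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

# One (precision, recall) point per decision threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# Step-wise summary of the area under the PR curve.
ap = average_precision_score(y_true, y_prob)
```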
Principal Component Analysis was applied for dimensionality reduction, for faster Support Vector Machine convergence. For PCI confusion detection, this resulted in 273 and 284 principal components for the 800 MHz and 1800 MHz frequency bands, respectively. The optimal hyperparameters for each model were obtained through a grid search on the training set with 10-fold cross validation, maximizing the Precision metric. After training, the models were tested on the test set with a decision probability threshold of 50%. The results are presented in Table 1. Note that when a classifier produced no True Positives and no False Positives, the Precision is represented as Not a Number (NaN), since its computation results in a division by zero.

The Adaptive Boosting model had the best performance, with a 50% Precision for the 800 MHz frequency band. However, no model classified any sample as conflicting in the 1800 MHz frequency-band data. In order to obtain more insight into the models' performance, the Precision-Recall curves were obtained, and are represented in Figure 1. The highest average Precision was 27%, obtained with the Gradient Boost classifier, which presented the highest Precision throughout most of the plot. The Support Vector Machine was clearly the worst-performing model, especially in the 1800 MHz frequency band. The training and testing running times needed to obtain the Precision-Recall curves were also collected. Gradient Boost, which resulted in the two best models, had a testing time below one second and a training time below 30 seconds for both frequency bands. The learning curves obtained showed that the average Precision would only marginally increase with more data. Gradient Boost thus resulted in the overall best-performing models for both frequency bands when using statistical calculations as features.
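The hyperparameter search described above (grid search with 10-fold cross validation, maximizing Precision) can be sketched as follows; the classifier, grid values, and synthetic data are illustrative assumptions, not the grids used in this study.

```python
# Sketch: grid search with 10-fold CV, scored on Precision.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "learning_rate": [0.1, 0.5]},
    scoring="precision",   # pick the hyperparameters maximizing Precision
    cv=10,                 # 10-fold cross validation on the training set
)
grid.fit(X, y)
best = grid.best_params_   # hyperparameters of the winning model
```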
Regarding PCI collision detection, Principal Component Analysis resulted in 619 principal components to be used by the Support Vector Machine classifier for the 800 MHz frequency band. The optimal hyperparameters were obtained, and the test results were collected after training the models. A table with the results is not shown, as no tested model was able to classify a sample as conflicting. The Precision-Recall curves were obtained and plotted, showing a maximum Precision of 23% with 100% Recall for Random Forest, while Precision was approximately zero for the remaining classifiers (the plot is not shown in this paper, as it would not add much information).

Table 1. Statistical-data-based PCI confusion classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       85.24%     NaN       0.00%     93.27%     NaN       0.00%
RF        85.24%     NaN       0.00%     93.27%     NaN       0.00%
SVM       85.24%     NaN       0.00%     93.27%     NaN       0.00%
AB        85.24%    50.00%     2.83%     93.27%     NaN       0.00%
GB        85.18%    46.00%     2.43%     93.27%     NaN       0.00%
4.1.2 Raw Cell Data Classification

The second hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. With each individual key performance indicator measurement used as a feature, an average filter with a window of size 20 was applied to reduce noise. Principal Component Analysis was applied, which resulted in 634 principal components to be used by the Support Vector Machine classifier for both the 800 MHz and 1800 MHz frequency bands. Once again, the optimal hyperparameters were obtained through grid search, and the test results were collected after model training. The classification results for a 50% decision probability threshold are shown in Table 2.

Overall, Gradient Boost was the classifier that led to the best performance, having the highest Accuracy and Recall for both frequency bands, but not the best Precision for the 1800 MHz frequency band. The models created by the Extremely Randomized Trees and Random Forest classifiers both had a 100% Precision for the 1800 MHz frequency band, which suggested that Random Forest could be the best model, as it had the higher Recall. In order to see whether Gradient Boost led to the best-performing model, the Precision-Recall curves were obtained, and they are presented in Figure 2. Regarding the 800 MHz frequency band, Gradient Boost showed the highest average Precision, with a peak of 60% Precision at 4% Recall. Concerning the 1800 MHz frequency band, Extremely Randomized Trees presented the best average Precision, while Gradient Boost achieved a higher Precision for Recall values lower than 5%. Additionally, Random Forest was not the best-performing model, in contrast to what Table 2 suggested. The training and testing running times for each model were obtained. In the 800 MHz frequency band, Gradient Boost, which led to the best-performing model, had a testing time below one second and a training time below 14 seconds.
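The size-20 average filter mentioned above can be implemented as a simple moving average over each KPI's 96 daily samples. The paper does not detail how the day boundaries were handled; the zero-padded "same" mode below is one assumption.

```python
# Sketch: size-20 moving-average filter over a 96-sample KPI series.
import numpy as np

def smooth(series, window=20):
    kernel = np.ones(window) / window
    # mode="same" keeps all 96 samples; edges are implicitly zero-padded.
    return np.convolve(series, kernel, mode="same")

rng = np.random.default_rng(0)
kpi = np.sin(np.linspace(0.0, 6.28, 96)) + 0.1 * rng.normal(size=96)
kpi_smooth = smooth(kpi)
```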
Regarding the 1800 MHz frequency band, Extremely Randomized Trees, which led to the best-performing model, was one of the quickest to train (40.3 seconds), but one of the slowest to test (1.4 seconds). Nevertheless, its overall performance was near real time. Regarding PCI collision detection, Principal Component Analysis resulted in 634 principal components for both frequency bands. The test results were collected with the optimal hyperparameters. The best-performing model was the one obtained from Adaptive Boosting, as it detected one out of three PCI collisions with 100% Precision. However, due to the very low number of PCI collisions in the dataset, the results were not sufficiently significant to draw any conclusions.

Figure 1. The smoothed Precision-Recall curves for statistical-data-based PCI confusion detection.

Table 2. Raw-cell-data PCI confusion classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       85.37%    22.22%     0.71%     93.57%    100%       0.45%
RF        85.63%     NaN       0.00%     93.60%    100%       0.90%
SVM       85.63%     NaN       0.00%     93.54%     NaN       0.00%
AB        85.63%     NaN       0.00%     93.54%     NaN       0.00%
GB        85.73%    75.00%     1.07%     93.63%    80.00%     1.80%
4.2 Root Sequence Index Collision Detection

4.2.1 Statistical Data Extraction Classification

The first hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. Regarding RSI collision detection, tsfresh yielded 732 and 851 significant extracted features for the 800 MHz and 1800 MHz frequency bands, respectively. In order to reduce the data dimensionality before applying the Support Vector Machine model, Principal Component Analysis was applied, resulting in 273 and 284 principal components for the 800 MHz and 1800 MHz frequency bands, respectively. The optimal hyperparameters were obtained through grid search, and the test results are presented in Table 3. The Extremely Randomized Trees model delivered the highest Precision for both frequency bands, but Gradient Boost had the highest overall Accuracy and Recall.

In order to gain more insight into the performance of the models, Precision-Recall curves were obtained, and are presented in Figure 3. The Gradient Boost model was the best for both frequency bands, having a Precision peak of 85% and an average Precision of 61%. The abnormal curve behavior of the Adaptive Boosting model was due to the assignment of the same probability values to several cells. The training and testing running times for each model were obtained. The Gradient Boost model showed testing times lower than one second; however, it had one of the highest training times. More specifically, it required 28.4 and 246 seconds of training time for the 800 MHz and 1800 MHz frequency bands, respectively. Nonetheless, the Gradient Boost model presented higher performance than the other obtained models with near-real-time performance, and was thus overall the best model. The learning curves obtained showed that the performance would not significantly increase if more data were added to the dataset.
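The learning curves referred to above can be generated with Scikit-Learn's learning_curve utility; the estimator, training-set sizes, and synthetic data below are illustrative assumptions, not the configuration used in this study.

```python
# Sketch: learning curve of a boosted model, scored by average Precision.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=400, random_state=0)

# Cross-validated scores at four increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    GradientBoostingClassifier(n_estimators=30, random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 4), cv=5,
    scoring="average_precision")
```

A flat validation curve at the largest sizes suggests that adding data would not help much, which is how the learning curves were read in this study.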
4.2.2 Raw Cell Data Classification

The second hypothesis presented in Section 3.3 was tested using the data described in Section 3.1. With each individual key performance indicator measurement used as a feature, an average filter with a window of size 20 was applied. Principal Component Analysis was applied, which yielded 332 principal components to be used by the Support Vector Machine classifier for both the 800 MHz and 1800 MHz frequency bands for RSI collision detection. The optimal hyperparameters were obtained through grid search, and the results are presented in Table 4. Once more, the Gradient Boost model showed the highest Accuracy for both frequency bands. The Random Forest and Extremely Randomized Trees models had the highest Precision for the 800 MHz and 1800 MHz frequency bands, respectively.

The Precision-Recall curves were obtained and are presented in Figure 4. The Gradient Boost model had the highest average Precision, while the Random Forest and Extremely Randomized Trees models showed slightly worse average Precision. The training and testing running times for each model were obtained. The Gradient Boost model showed testing times lower than one second, and the third-highest training times for both frequency bands. More precisely, it took 12.8 and 24.4 seconds to train in the 800 MHz and 1800 MHz frequency bands, respectively. However, the Gradient Boost model's performance was in near real time, and it was thus overall the best-performing model. The learning curves obtained showed that the results would improve if more data were added to the training set, especially for the Gradient Boost model.

Figure 2. The smoothed Precision-Recall curves for raw-cell-data-based PCI confusion detection.

Table 3. Statistical-data-based RSI collision classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       60.32%    100%       0.48%     62.27%    72.97%     2.00%
RF        64.93%    61.30%    32.62%     64.13%    66.94%    12.12%
SVM       60.94%    54.80%    11.55%     61.79%     NaN       0.00%
AB        64.02%    56.79%    40.83%     66.37%    59.88%    36.29%
GB        66.87%    61.60%    44.88%     69.39%    63.97%    45.53%

5. Conclusions

This paper tested two hypotheses regarding how well two distinct LTE network problems could be detected through supervised techniques with near-real-time performance. PCI confusions were better detected by using each cell's daily key performance indicator measurements as individual features, a conclusion drawn from the higher average Precision obtained when testing this hypothesis. Specifically, the average Precisions reached 31% and 26% for the 800 MHz and 1800 MHz frequency bands, respectively. No conclusions could be reached regarding PCI collision detection, due to the low number of PCI collisions in the data set. RSI collisions were detected with similar performance by the two proposed hypotheses.
However, the best detection was arguably obtained by using each cell's daily key performance indicator measurements as individual features, because the learning curves showed that the results for the second hypothesis would further improve if more data were added. The best-performing model was the one that used the Gradient Boost classifier, reaching average Precisions of 61% and 60% for the 800 MHz and 1800 MHz frequency bands, respectively.

The results showed that supervised techniques are not well suited for PCI and RSI conflict detection. This is because, while a cell may have one of these two conflicts, the conflict's impact on the key performance indicators might be negligible. This can be due to several factors, such as the distance between cells, their azimuths, and the environment. For future work, an unsupervised approach to network conflict detection, followed by manual labeling to be used by a classifier, could be investigated. This would result in the labeling of cells with significant differences between them, which could lead to better classification results.

Table 4. Raw-cell-data RSI collision classification results.

                 800 MHz Band                   1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT       59.49%    50.00%     0.83%     59.83%    75.00%     0.22%
RF        61.70%    62.64%    13.52%     65.55%    63.86%    33.07%
SVM       60.07%    52.24%    16.61%     59.25%    46.67%     9.14%
AB        64.73%    60.38%    37.60%     64.99%    59.59%    40.32%
GB        66.41%    60.84%    47.92%     66.22%    62.72%    39.52%

Figure 3. The smoothed Precision-Recall curves for statistical-data-based RSI collision detection.
Abstract
This paper tests two hypotheses regarding how
well two distinct Long-Term Evolution (LTE) network
problems can be detected through supervised techniques
with near-real-time performance. The tested network
problems are physical-cell-identity (PCI) conflicts and
root-sequence-index (RSI) collisions. These were labeled
through configured cell relations that verified these two
conflicts. Furthermore, a real LTE network was used. The
results obtained showed that both problems were best
detected by using each key performance indicator (KPI)
measurement as an individual feature. The highest average
precisions obtained for PCI conflict detection were 31%
and 26% for the 800 MHz and 1800 MHz frequency bands,
respectively. The highest average precisions obtained for
RSI collision detection were 61% and 60% for the 800 MHz
and 1800 MHz frequency bands, respectively.
1. Introduction
Two of the major concerns of mobile network operators
(MNOs) are optimizing and maintaining network
performance. However, maintaining performance has
proven to be a challenge mainly for large and complex
networks. In the long term, changes made in the networks
may increase the number of conflicts and inconsistencies
that occur in them. These changes include changing the
tilting of antennas, changing the cell’s power, or even
changes that cannot be controlled by the mobile network
operators, such as user mobility and radio-channel fading.
In order to assess the network’s performance,
quantifiable performance metrics, known as key performance
indicators (KPI), are typically used. Key performance
indicators can report network performance such as the
handover success rate and the channel interference averages
of each cell, and are periodically calculated, resulting in time
series. A time series can be either univariate or multivariate.
As this study uses data samples that represent LTE cells
with several measured key performance indicators, the
data consist of multivariate time series.
This paper focuses on applying supervised techniques
for detecting two known LTE network conflicts, namely
physical-cell identity (PCI) conflicts and root-sequence
index (RSI) collisions. The labeling used was only possible
due to a CELFINET product that provides cell relations
identifying these two network conflicts. In addition, real
data obtained from an LTE network were used. The
aim of this paper is to test two hypotheses regarding how well
two distinct LTE network problems can be detected through
supervised techniques with near-real-time performance.
The resulting conflict-detection solution would run
in an entity external to the LTE architecture during the
early morning, alerting the network engineers to any
existing conflicts in order to enable prompt responses.
As this paper aims to create models for near-real-
time detection of PCI conflicts and RSI collisions, the
popular k-nearest neighbors with dynamic-time-warping
classification approach was not tested [1]. The reason for
this decision was based on the fact that it is computationally
intensive and very slow for large data sets, as was the case
for this paper.
In order to automatically detect the network fault
causes, some work has been done by using key performance
indicator measurements with unsupervised techniques, as
in [2].
12 The Radio Science Bulletin No 364 (March 2018)
The paper is organized as follows. Section 2 introduces
the analyzed network problems, namely PCI conflicts
and RSI collisions. Section 3 presents the chosen key
performance indicators and machine-learning (ML) models,
the two proposed hypotheses, and describes how the models
obtained were evaluated. Section 4 presents the results
obtained. Finally, conclusions are drawn in Section 5.
2. Network Problems Analyzed
2.1 Physical Cell Identity Conflict
Each LTE cell has two identifiers with different
purposes: the Global Cell Identity (ID) and the PCI. The
Global Cell ID is used to identify the cell from an operation,
administration, and management perspective. The PCI is
used to scramble the data in order to aid mobile phones
in separating information from different transmitters [3].
Since an LTE network may contain a much larger number
of cells than the 504 available values of PCIs, the same
PCI must be reused by several cells. However, the user
equipment (UE) cannot distinguish between two cells if
they both have the same PCI and frequency, a situation
known as a PCI conflict.
PCI conflicts can be divided into two cases: PCI
confusions and PCI collisions. PCI confusions occur
whenever an LTE cell has two different neighbor LTE
cells with equal PCIs, in the same frequency band [4]. PCI
collisions happen whenever an LTE cell has a neighbor
LTE cell with identical PCI in the same frequency band [4].
A good PCI plan can be applied to avoid PCI conflicts.
However, devising such a plan without any PCI conflicts
can be difficult in a dense network. Moreover, network
changes – namely increased cell power and variable radio
conditions – can lead to PCI conflicts. PCI conflicts can
lead to an increase in the dropped-call rate due to failed
handovers, as well as an increase in blocked calls and
channel interference [4].
2.2 Root Sequence Index
Collision
The user equipment has to perform the LTE random-
access procedure to connect to an LTE network, establish
or reestablish a service connection, perform intra-system
handovers, and synchronize for uplink and downlink
data transfers. The LTE random-access procedure can be
performed using two different solutions: non-
contention-based and contention-based access. An LTE
cell uses 64 physical random-access channel (PRACH)
preambles. Twenty-four of those preambles are reserved
by the evolved-NodeB for non-contention-based access.
The remaining 40 preambles are randomly selected by the
user equipment for contention-based access [3].
The 40 physical random-access-channel preambles
that the user equipment can use are calculated by the user
equipment through the RSI parameters that the LTE cell
transmits in the system information block 2 through the
physical random-access channel [5]. Whenever two or more
neighbor cells operate in the same frequency band and have
the same RSI parameter, this results in the connected user
equipment calculating the same 40 physical random-access
channel preambles, increasing the occurrence of preamble
collisions. The aforementioned problem is known as RSI
collision, and can lead to an increase of failed service
establishments and re-establishments, as well as an increase
of failed handovers.
3. Methodology
This study was performed using real data from an
LTE network of a mobile network operator with a PCI
reuse factor of three. Furthermore, data were collected for
the same weekday of three consecutive weeks, for every
period of 15 minutes, the minimum temporal granularity
used by network operators, resulting in a daily total of 96
measurements.
Using a CELFINET tool, it was possible to label
cells that had PCI conflicts and/or RSI collisions. Source
cells that had configured neighbor cells with equal PCI
in the same frequency band were labeled as having a PCI
collision. Source cells that had two or more neighbor
cells with equal PCI in the same frequency band between
themselves were labeled as having a PCI confusion. Source
cells that had neighbor cells with equal RSI in the same
frequency band were labeled as having an RSI collision.
Cells that did not present any of these conflicts were labeled
as non-conflicting.
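The labeling rules described above can be sketched as follows. The dict-based cell schema ("pci", "rsi", "band") and the function name are illustrative assumptions, since the actual CELFINET tool's data model is not described in the paper.

```python
def label_cell(cell, neighbours):
    """Label a source cell from its configured neighbour relations.

    `cell` and each neighbour are dicts with 'pci', 'rsi', and 'band'
    keys (an assumed minimal schema for illustration).
    """
    same_band = [n for n in neighbours if n["band"] == cell["band"]]
    labels = set()
    # PCI collision: a neighbour shares the source cell's own PCI.
    if any(n["pci"] == cell["pci"] for n in same_band):
        labels.add("pci_collision")
    # PCI confusion: two or more neighbours share a PCI between themselves.
    pcis = [n["pci"] for n in same_band]
    if len(pcis) != len(set(pcis)):
        labels.add("pci_confusion")
    # RSI collision: a neighbour shares the source cell's RSI.
    if any(n["rsi"] == cell["rsi"] for n in same_band):
        labels.add("rsi_collision")
    return labels or {"non_conflicting"}
```

Note that a cell may carry several labels at once, which matches the paper's "PCI conflicts and/or RSI collisions" wording.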
3.1 Proposed Key Performance
Indicators
The first step, after collecting a list of key
performance indicators for LTE equipment, was to choose
the most relevant key performance indicators for detecting
PCI conflicts and RSI collisions. The key performance
indicators were chosen by taking into account the theory
behind LTE and how the PCI and RSI are used. Accordingly,
the following key performance indicators were chosen for
PCI conflict detection:
• Average CQI: the average channel quality indicator
measured by the user equipment
• UL PUCCH Interference Avg and UL PUSCH
Interference Avg: the average measured interference
in the physical uplink control and shared channels
• Service Establish: the amount of established service
connections
• Service Drop Rate: the ratio of the dropped service
occurrences
• DL Avg Cell Throughput Mbps: the average measured
cell downlink throughput in Mbit/s
• DL Avg User Equipment Throughput Mbps: the average
measured user equipment downlink throughput in Mbit/s
• DL Latency ms: the average duration an Internet protocol
packet takes from being sent by the user equipment
until reaching back to it
• RandomAcc Succ Rate: the success rate of established
services made through the random access channel
• IntraFreq Prep HO Succ Rate and IntraFreq Exec HO
Succ Rate: the success rate of handover preparation and
execution between cells operating in the same frequency
band.
To detect RSI collisions, a subset of the
aforementioned key performance indicators was selected,
namely: UL PUCCH Interference Avg, UL PUSCH
Interference Avg, Service Establish, IntraFreq Exec HO
Succ Rate, IntraFreq Prep HO Succ Rate, and RandomAcc
Succ Rate.
After discarding cells with many null key performance
indicator measurements and interpolating those of the
remaining cells, it was decided to separate the data into
different frequency bands, namely the 800 MHz and
1800 MHz bands. The 2100 MHz and 2600 MHz frequency
bands were not considered, as they represented only 9% of
the data, and had few occurrences of PCI conflicts and RSI
collisions. This decision to separate the data into different
frequency bands was taken in order to create frequency-
dependent models, since different frequency bands have
different purposes.
The cleaned data for PCI conflict detection in the
800 MHz frequency band consisted of 8666 non-conflicting
cells, 1551 PCI confusions, and six PCI collisions. The
1800 MHz frequency-band data had 16675 non-conflicting
cells, 1294 PCI confusions, and no PCI collisions. The
data concerning each frequency band was split into 80%
for the training set and 20% for the test set. Additionally,
as PCI collisions are very rare, it was decided to do a 50%
split for collisions, yielding three collisions in both the
training and test sets.
The cleaned data for RSI collision detection in
the 800 MHz frequency band consisted of 10128 non-
confl icting cells and 6774 RSI collisions. The 1800 MHz
frequency-band data consisted of 17634 non-conflicting
cells and 10916 RSI collisions. The data relative to each
frequency band was split into 80% for the training set and
20% for the test set.
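A minimal sketch of the 80/20 split described above, assuming scikit-learn's train_test_split with stratification to preserve the class imbalance (the paper does not state the exact splitting mechanism it used):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # stand-in feature vectors
y = np.array([0] * 80 + [1] * 20)    # imbalanced conflict labels

# Stratification keeps the conflicting/non-conflicting proportions
# identical in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)
```

The separate 50% split of the six PCI collisions would be handled outside this call, since stratification alone cannot guarantee an exact three/three division of so few samples.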
3.2 Considered Classification
Algorithms
In order to reduce the bias from this study, five
different classification algorithms were selected. The aim of
the classifiers was to classify cells as either non-conflicting
or conflicting, depending on the detection use case. The
considered classification algorithm implementations were
taken from the Python Scikit-Learn library [6], and were
the following:
3.2.1 Adaptive Boosting (AB)
Adaptive Boosting is an ensemble method, which
is a class of machine-learning approaches based on the
concept of creating a highly accurate classifier by combining
several weak and inaccurate classifiers. Adaptive Boosting
uses subsets of the original data to produce weak-performing
models (high bias, low variance) and then boosts their
performance by combining them based on a chosen
cost function. Adaptive Boosting was the first practical
boosting algorithm, and remains one of the most used and
studied classifiers [7]. Its implementation uses decision-tree
classifiers as weak learners.
3.2.2 Gradient Boost (GB)
Gradient Boost is another popular boosting algorithm
for creating collections of classifiers. It differs from Adaptive
Boosting because it calculates a negative gradient of a cost
function (direction of quickest improvement), and picks
a weak learner that is closest to the obtained gradient to
add to the model [8]. The Gradient Boost implementation
considered uses Decision Trees (DT) as weak learners.
3.2.3 Extremely Randomized
Trees (ERT)
This belongs to the family of tree ensemble
methods, and uses a technique different from boosting,
known as bagging. Bagging-based algorithms aim to
control generalization error by perturbing and averaging
the generated weak learners, such as decision trees. The
Extremely Randomized Trees algorithm stands out from
other tree-based ensemble classifiers because it strongly
randomizes both feature and cut-point choice while splitting
a tree node [9]. Compared to other algorithms, Extremely
Randomized Trees aims to strongly reduce variance through
full randomization of the cut-point and feature choices
combined with ensemble averaging. By training each weak
learner with the full training set instead of data subsets,
Extremely Randomized Trees also minimizes bias.
3.2.4 Random Forest (RF)
Random Forest is another bagging-based algorithm in
the family of tree ensemble methods. Similarly to Extremely
Randomized Trees, several small and weak trees can be
grown in parallel, and this set of weak learners results in
a strong classification algorithm either by averaging or by
majority vote [10]. Random Forest is similar to Extremely
Randomized Trees, but differs in two aspects. Random
Forest uses data subsets for growing its trees, while
Extremely Randomized Trees uses the whole training set.
Random Forest considers a small subset of features when
splitting a node, while Extremely Randomized
Trees chooses a random feature from all features.
3.2.5 Support Vector Machines
(SVM)
Support Vector Machines aim to separate data samples
of different classes through hyperplanes that define decision
boundaries. Similarly to Decision-Trees-based classifiers,
Support Vector Machines are capable of handling linear and
nonlinear classification tasks. The main idea behind Support
Vector Machines is to map the original data samples from
the input space into a high-dimensional feature space such
that the classification task becomes simpler [11].
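The five classifiers above can be instantiated from the Scikit-Learn library [6] as follows. The default hyperparameters here are placeholders, since the paper tunes them by grid search:

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.svm import SVC

models = {
    "AB": AdaBoostClassifier(),          # boosting of decision-tree weak learners
    "GB": GradientBoostingClassifier(),  # boosting along the cost-function gradient
    "ERT": ExtraTreesClassifier(),       # bagging with fully randomized splits
    "RF": RandomForestClassifier(),      # bagging on data and feature subsets
    "SVM": SVC(probability=True),        # probability outputs needed for PR curves
}
```

Enabling probability estimates on the SVM is an assumption made here so that all five models can later produce the class probabilities needed for Precision-Recall curves.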
3.3 Proposed Hypotheses
In order to reduce bias even further, two hypotheses
were proposed to find the one that led to the best-performing
models for PCI conflict and RSI collision detection.
3.3.1 Statistical Data Extraction
Classification
PCI conflicts and RSI collisions are better detected
by extracting statistical calculations from the daily time
series of each key performance indicator and using them
as features for classification. The Python tsfresh tool
was used to extract statistical data from the time series
[12]. tsfresh applies several statistical calculations to
the data, followed by feature elimination through statistical
significance testing. As it resulted in hundreds of features,
Principal Component Analysis (PCA) was applied for
dimensionality reduction before applying the data into
the Support Vector Machine classifier. This decision was
taken because the Support Vector Machine takes longer to
converge as the dimensionality increases, whereas increased
dimensionality does not significantly increase the training
and testing times of the tree-based classifiers. It was decided to use a number of
principal components (PC) that led to 98% of the cumulative
proportion of variance explained, maintaining most of the
original variance.
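The dimensionality-reduction step above can be sketched as follows. The random matrix stands in for the tsfresh-extracted features, which are omitted here; retaining components up to the 98% cumulative-explained-variance threshold is done via scikit-learn's fractional n_components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))   # stand-in for the extracted statistical features

# A float n_components tells scikit-learn to keep the smallest number of
# principal components whose cumulative explained variance exceeds it.
pca = PCA(n_components=0.98)
X_reduced = pca.fit_transform(X)
```

The same fitted PCA transform would then be applied to the test set before feeding it to the Support Vector Machine classifier.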
3.3.2 Raw Cell Data Classification
PCI conflicts and RSI collisions are better detected
by using each cell’s daily key performance indicator
measurements as an individual feature. This hypothesis was
proposed to compare a more computationally intensive but
simpler approach with the previous hypothesis. Moreover,
as there were 96 daily measurements per key performance
indicator in each cell, by using, for instance, 10 key
performance indicators, this would have yielded 96 × 10
= 960 features. Due to the high dimensionality of the data
to test this hypothesis, Principal Component Analysis was
applied (once again) to reduce its dimensionality before
using the Support Vector Machine classifier. It was decided
to use a number of principal components that led to 98% of
the cumulative proportion of variance explained.
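The feature layout of this hypothesis can be sketched as follows, using the paper's example sizes of 96 daily measurements and 10 key performance indicators (the zero-filled array is a stand-in for real measurements):

```python
import numpy as np

n_cells, n_kpis, n_samples = 4, 10, 96          # 96 x 10 = 960 features per cell
raw = np.zeros((n_cells, n_kpis, n_samples))    # stand-in KPI measurements

# Each individual measurement becomes one feature: one row per cell.
X = raw.reshape(n_cells, n_kpis * n_samples)
```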
3.4 Model Evaluation
In a binary decision problem, a classification algorithm
labels predictions as either positive or negative. A prediction
for confl ict detection could fit into one of these four
categories: True Positive (TP), conflicting cells correctly
labeled as conflicting; False Positive (FP), non-conflicting
cells incorrectly labeled as conflicting; True Negative (TN),
non-conflicting cells correctly labeled as non-conflicting;
False Negative (FN), conflicting cells incorrectly labeled
as non-conflicting.
As there was a high interest in knowing how well
the models obtained could classify PCI conflicts and RSI
collisions, the classic accuracy metric by itself was not
enough. Classifications where a non-conflicting cell was
erroneously classified as a conflict were to be avoided; it
was thus chosen to additionally evaluate the models obtained
through the precision and recall metrics. The metrics used
could then be defined as follows:
Recall = TP / (TP + FN), (1)

Precision = TP / (TP + FP), (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN), (3)
where Recall measures the fraction of conflicting cells
that are correctly labeled, Precision measures the fraction
of cells classified as conflicting that are truly conflicting,
and Accuracy measures the fraction of correctly classified
cells [13]. Precision can be thought of as a measure of a
classifier’s exactness – a low precision can indicate a large
number of False Positives – while Recall can be seen as a
measure of a classifier’s completeness: a low recall indicates
many False Negatives.
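A small worked example of Equations (1) to (3), cross-checked against the Scikit-Learn implementations (the toy labels are illustrative):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = conflicting, 0 = non-conflicting
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3 true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1 false positive
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1 false negative

precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 0.75
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 0.75
accuracy = accuracy_score(y_true, y_pred)     # 6 correct out of 8 = 0.75
```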
Since a classification algorithm can output the
probabilities of a sample belonging to a specific class, the
probability decision threshold can be tuned to alter the
model’s classification outputs. For instance, increasing
the probability decision threshold to classify a specific
class may lead to an increase in Precision at the cost of
a lower Recall. Precision-Recall (PR) curves are built by
changing the decision probability threshold for a class. It
thus was decided to also evaluate models through their
Precision-Recall curves in order to perform a thorough
model evaluation. Precision-Recall curves, often used in
information retrieval [14], have been cited as an alternative
to Receiver Operator Characteristic curves for tasks with
a large skew in the class distribution, as in PCI conflict
detection [15]. Additionally, the average Precision is also
represented by the Precision-Recall curves through the areas
under the curves. It should be noted that there is a tradeoff
between the number of samples for model training, training
duration, and model performance. With more data samples
and more training time, the resulting model generalizes
better and has more time to learn the data structure.
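A minimal sketch of building a Precision-Recall curve and its average-Precision summary with Scikit-Learn, on toy scores:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # predicted P(conflicting)

# Sweeping the decision probability threshold traces the PR curve.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

# The area under the curve summarizes the model as average Precision.
ap = average_precision_score(y_true, scores)
```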
4. Results
4.1 Physical Cell Identity
Conflict Detection
4.1.1 Statistical Data Extraction
Classification
The first hypothesis presented in Section 3.3 was
tested using the data presented in Section 3.1. Regarding
PCI confusion detection, tsfresh yielded 798 and
909 significant features for the 800 MHz and 1800 MHz
frequency bands, respectively. Concerning PCI collision
detection, a total of 2200 features were extracted for the
800 MHz case; these could not be selected through hypothesis
testing, because the data set contained only
six PCI collisions. Principal Component
Analysis was applied for dimensionality reduction for
a faster Support Vector Machine convergence. For PCI
confusion detection, this resulted in 273 and 284 principal
components for the 800 MHz and 1800 MHz frequency
bands, respectively.
The optimal hyperparameters to create each model
were obtained through a grid search on the training set
with 10-fold cross validation, maximizing the Precision
metric. After training the models, they were tested on the
test set, based on a decision probability threshold of 50%.
The results are presented in Table 1.
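A minimal sketch of the hyperparameter search described above (grid search, 10-fold cross validation, maximizing Precision); the parameter grid and synthetic data are illustrative placeholders, not the paper's actual grid:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # stand-in training features
y = np.tile([0, 1], 50)         # stand-in conflict labels

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [2, 3]},
    scoring="precision",        # model selection maximizes Precision
    cv=10,                      # 10-fold cross validation
)
search.fit(X, y)
best_model = search.best_estimator_
```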
It should be added that when a classifier did not classify
any True Positives or False Positives, the Precision was
represented as a Not a Number (NaN), since it resulted in
a division by zero. The Adaptive Boosting model had the
best performance, with a 50% Precision for the 800 MHz
frequency band. However, no model classified a sample as
conflicting in the 1800 MHz frequency band data.
In order to obtain more insights about the models’
performance, the Precision-Recall curves were obtained,
and are represented in Figure 1. The highest average
Precision was 27%, by using the Gradient Boost classifier.
The Gradient Boost presented the highest Precision mostly
throughout the plot. The Support Vector Machine was clearly
the worst-performing model, especially in the 1800 MHz
frequency band.
The training and testing running times to obtain the
Precision-Recall curves were also collected. Gradient Boost,
which resulted in the two best models, had a testing time
below one second and a training time below 30 seconds for
both frequency bands. The learning curves were obtained,
and they showed that the average Precision would only
marginally increase with more data. Gradient Boost thus
resulted in the overall best-performing models for both
frequency bands by using statistical calculations as features.
Regarding PCI collision detection, Principal
Component Analysis resulted in 619 principal components
to be used by the Support Vector Machine classifier for the
800 MHz frequency band. The optimal hyperparameters
were obtained, and the test results were collected after
training the models. A table with the results is not shown, as
no tested model was able to classify a sample as conflicting.
The Precision-Recall curves were obtained and plotted,
showing a maximum Precision of 23% with 100% Recall
by Random Forest, while this was approximately zero for
the remaining classifiers (the plot is not illustrated in this
paper as it would not add much information).
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      85.24%    NaN        00.00%    93.27%    NaN        00.00%
RF       85.24%    NaN        00.00%    93.27%    NaN        00.00%
SVM      85.24%    NaN        00.00%    93.27%    NaN        00.00%
AB       85.24%    50.00%     02.83%    93.27%    NaN        00.00%
GB       85.18%    46.00%     02.43%    93.27%    NaN        00.00%
Table 1. Statistical-data-based PCI confusion classification results.
4.1.2 Raw Cell Data Classification
The second hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Using each
individual key performance indicator measure as a feature,
an average filter with a window of size 20 was applied to
reduce the noise interference. Principal Component Analysis
was applied, which resulted in 634 principal components to
be used by the Support Vector Machine classifier for both
the 800 MHz and 1800 MHz frequency bands.
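A minimal sketch of the noise-reduction step, assuming a simple centered moving average of window size 20 (the paper does not specify the exact filter implementation):

```python
import numpy as np

def smooth(series, window=20):
    """Centered moving average; mode='same' keeps the original length."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

# A noisy daily KPI series: 96 quarter-hour measurements.
rng = np.random.default_rng(1)
noisy = np.sin(np.linspace(0, 6, 96)) + rng.normal(0, 0.3, 96)
smoothed = smooth(noisy)
```

The 'same' mode introduces some attenuation at the two edges of the series, a side effect any windowed filter of this kind shares.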
Once again, the optimal hyperparameters were
obtained through grid search, and the test results were
collected after model training. The classification results for
a 50% decision probability threshold are shown in Table 2.
Overall, Gradient Boost was the classifier that led to the
best performance, having the highest Accuracy and Recall
for both frequency bands, but not the best Precision for
the 1800 MHz frequency band. Both models created by the
Extremely Randomized Trees and Random Forest classifiers
had a 100% Precision for the 1800 MHz frequency band,
which meant that Random Forest could result in the best
model, as it had higher Recall.
In order to see if Gradient Boost led to the best
performing model, the Precision-Recall curves were
obtained, and they are presented in Figure 2. Regarding
the 800 MHz frequency band, Gradient Boost showed the
highest average Precision, with a peak of 60% Precision
for 4% Recall. Concerning the 1800 MHz frequency band,
Extremely Randomized Trees presented the best average
Precision, while Gradient Boost achieved higher Precision
for a Recall lower than 5%. Additionally, Random Forest
was not the best performing model, as was seen in Table 2.
The training and testing running times for each
model were obtained. In the 800 MHz frequency band,
Gradient Boost, which led to the best-performing model,
had a testing time below one second and a training time
below 14 seconds. Regarding the 1800 MHz frequency
band, Extremely Randomized Trees, which led to the
best-performing model, was one of the quickest to train
(i.e., 40.3 seconds), but it was one of the slowest to test
(i.e., 1.4 seconds). Nevertheless, its overall performance
was near real time.
Regarding PCI collision detection, Principal
Component Analysis resulted in 634 principal components
for both frequency bands. The test results were collected
with the optimal hyperparameters. The best performing
model was the model obtained from Adaptive Boosting,
as it detected one out of three PCI collisions with 100%
Precision. However, due to the very low number of
PCI collisions in the data set, the results were not sufficiently
significant to draw any conclusions.
Figure 1. The smoothed Precision-Recall curves for statistical-data-based PCI confusion detection.
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      85.37%    22.22%     00.71%    93.57%    100%       00.45%
RF       85.63%    NaN        00.00%    93.60%    100%       00.90%
SVM      85.63%    NaN        00.00%    93.54%    NaN        00.00%
AB       85.63%    NaN        00.00%    93.54%    NaN        00.00%
GB       85.73%    75.00%     01.07%    93.63%    80.00%     01.80%
Table 2. Raw-cell-data PCI confusion classification results.
4.2 Root Sequence Index
Collision Detection
4.2.1 Statistical Data Extraction
Classification
The first hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Regarding
RSI collision detection, tsfresh yielded 732 and
851 significant extracted features for the 800 MHz and
1800 MHz frequency bands, respectively. In order to
reduce the data dimensionality for applying to the Support
Vector Machine model, Principal Component Analysis was
applied, resulting in 273 and 284 principal components for
the 800 MHz and 1800 MHz frequency bands, respectively.
The optimal hyperparameters were obtained through
grid search, and the test results are presented in Table 3.
The Extremely Randomized Trees model delivered the
highest Precision for both frequency bands, but Gradient
Boost had the highest overall Accuracy and Recall.
In order to gain more insights regarding the
performance of the models, Precision-Recall curves were
obtained and are presented in Figure 3. The Gradient Boost
model was the best for both frequency bands, having a
Precision peak of 85% and an average Precision of 61%.
The abnormal curve behavior of the Adaptive Boosting
model was due to the assignment of several cells with the
same probability values.
The training and testing running times for each model
were obtained. The Gradient Boost model showed testing
times lower than one second; however, it had one of the
highest training times. More specifically, it required 28.4
and 246 seconds of training time for the 800 MHz and
1800 MHz frequency bands, respectively. Nonetheless,
the Gradient Boost model presented higher performance
relative to other obtained models with near-real-time
performance, thus overall being the best model. The learning
curves obtained showed that the performance would not
significantly increase if more data were added to the dataset.
4.2.2 Raw Cell Data Classification
The second hypothesis presented in Section 3.3 was
tested using the data described in Section 3.1. Using each
individual key performance indicator’s measure as a feature,
an average filter with a window of size 20 was applied.
Principal Component Analysis was applied, which yielded
332 principal components to be used by the Support Vector
Machine classifier for both the 800 MHz and 1800 MHz
frequency bands for RSI collision detection.
The optimal hyperparameters were obtained through
grid search, and the results are presented in Table 4. Once
Figure 2. The smoothed Precision-Recall curves for raw-cell-data-based PCI confusion detection.
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      60.32%    100%       00.48%    62.27%    72.97%     02.00%
RF       64.93%    61.30%     32.62%    64.13%    66.94%     12.12%
SVM      60.94%    54.80%     11.55%    61.79%    NaN        00.00%
AB       64.02%    56.79%     40.83%    66.37%    59.88%     36.29%
GB       66.87%    61.60%     44.88%    69.39%    63.97%     45.53%
Table 3. Statistical-data-based RSI collision classification results.
more, the Gradient Boost model showed the highest Accuracy
for both frequency bands. The Random Forest and Extremely
Randomized Trees models had the highest Precision for
the 800 MHz and 1800 MHz frequency bands, respectively.
The Precision-Recall curves were obtained and are
presented in Figure 4. The Gradient Boost model had the
highest average Precision, while the Random Forest and
Extremely Randomized Trees models showed slightly
worse average Precision.
The training and testing running times for each model
were obtained. The Gradient Boost model showed testing
times lower than one second, and the third highest training
times for both frequency bands. More precisely, it took 12.8
and 24.4 seconds to train in the 800 MHz and 1800 MHz
frequency bands, respectively. However, the Gradient
Boost model’s performance was in near real time, and it
was thus overall the best-performing model. The learning
curves obtained showed that the results would improve if
more data were added to the training set, especially for the
Gradient Boost model.
5. Conclusions
This paper tested two hypotheses regarding how
well two distinct LTE network problems could be detected
through supervised techniques with near-real-time
performance.
The PCI confusions were better detected by using
the measurement of each cell’s daily key performance
indicators as an individual feature. This was concluded
because the average Precision was higher when testing
this hypothesis. Specifically, the average Precisions
reached 31% and 26% for the 800 MHz and 1800 MHz
frequency bands, respectively. No conclusions could be
reached regarding PCI collision detection due to the low
number of PCI collisions in the data set.
The RSI collisions were detected with similar
performance by the two proposed hypotheses. However, one
could say that the best detection was obtained by using the
measurement of each cell’s daily key performance indicators
as an individual feature, because the learning curves showed
that the results would further improve if more data were added
for the second hypothesis. The best-performing model was
the model that used the Gradient Boost classifier, reaching
average Precisions of 61% and 60% for the 800 MHz and
1800 MHz frequency bands, respectively.
The results showed that supervised techniques are
not well suited for PCI and RSI conflict detection. This is
because while a cell may have one of these two conflicts,
the conflict’s impact on the key performance indicators
might be negligible. This can be due to several factors,
such as the distance between cells, their azimuth, and the
environment. For future work, an unsupervised approach
for network conflict detection followed by manual labeling
to be used by a classifier could be investigated. This would
                 800 MHz Band                    1800 MHz Band
Model    Accuracy  Precision  Recall    Accuracy  Precision  Recall
ERT      59.49%    50.00%     00.83%    59.83%    75.00%     00.22%
RF       61.70%    62.64%     13.52%    65.55%    63.86%     33.07%
SVM      60.07%    52.24%     16.61%    59.25%    46.67%     09.14%
AB       64.73%    60.38%     37.60%    64.99%    59.59%     40.32%
GB       66.41%    60.84%     47.92%    66.22%    62.72%     39.52%
Table 4. Raw-cell-data RSI collision classification results.
Figure 3. The smoothed Precision-Recall curves for statistical-data-based RSI collision detection.
result in the labeling of cells with significant differences
between them, which could lead to better classification
results.