Data is regarded as a major asset for exploring facts about the entities associated with it. For instance, medical centres, engineering, marketing, sports, warehouses, and cyber-physical systems contain rich data about different objects or individuals, representing various kinds of information [
1]. Data mining focuses on extracting interesting patterns or “knowledge” from data through rigorous analysis, and various real-life applications demand the mining of such patterns [
1]. Currently, the scientific community is interested in data mining techniques that pertain to pattern mining, such as frequent pattern mining (FPM) [
2], association rule mining (ARM) [
3], frequent episode mining (FEM) [
4], and sequential pattern mining (SPM) [
5]. The prime concern of these techniques is to mine patterns from real-world applications by harnessing co-occurrence, frequency, and interestingness measures.
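To make the notion of frequency-based pattern mining concrete, the following is a minimal sketch of support counting for frequent itemsets on a hypothetical toy transaction database; the item names and the threshold are illustrative and not taken from the cited works:

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy transaction database (market-basket style)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def frequent_itemsets(db, min_support):
    """Enumerate every itemset whose support (fraction of
    transactions containing it) is at least min_support."""
    counts = Counter()
    for t in db:
        for k in range(1, len(t) + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    n = len(db)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

result = frequent_itemsets(transactions, min_support=0.5)
# {"bread"} appears in 3 of 4 transactions; {"bread", "milk"} in 2 of 4
```

This brute-force enumeration is exponential in transaction size; practical FPM algorithms such as Apriori prune the search space instead, but the support measure they optimize is the one computed here.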
Cyber-physical systems (CPS) are distinguished by their ability to integrate computation with physical processes. Certain unique data mining problems emerge in the design and operation of CPS-based data sources. Unlike the Internet of Things (IoT), data from CPS often involves interactions between the “cyber” and the “physical”, which tend to be more tightly integrated in CPS, as seen in the databases used by them. Many pattern mining techniques produce promising results in most scenarios; however, they cannot generally handle all kinds of data. Databases come in several types, such as binary databases, quantitative databases, probabilistic databases, stream databases, and fuzzy databases, among others. Mining information from these types of databases is a non-trivial process that requires contemplating various essential constraints. Most existing state-of-the-art pattern mining algorithms focus on binary databases because their structures are less complicated than those of other database types. Furthermore, existing techniques return a bulk of redundant information in response to the model employed for the pattern mining task. This redundancy makes the resulting knowledge or models larger and more challenging for humans to use or understand.
1.1 Motivation
CPS combines the dynamics of physical processes with those of software and communication. These interwoven concepts provide abstractions and modeling, design, and analysis techniques for the integrated parts [
6]. Fifth-generation mobile networks (5G) are currently being deployed in CPS environments and are likely to replace 4G in advanced countries [
7]. Furthermore, sixth-generation mobile networks (6G) are on the horizon, raising a surge of data security and privacy issues. The higher bit rates available in 5G/6G will enable the use of software-defined networking (SDN) techniques while providing faster data transfer rates. Although advantageous, these higher transfer speeds can create larger issues from a cybersecurity point of view [
8]. CPS models are reliable only when 5G/6G networks perform well, especially for the services deployed on cloud servers.
With CPS in 5G/6G, higher capacity and very low latency will allow applications designed for IoT networks to connect with data centres, leading toward the long-standing goal of a fully mobile and connected society. In such settings, the data in databases can be dynamically inserted, deleted, and/or modified. Conventional methods handle such changes by re-scanning the updated database and re-mining the required information. However, this approach is largely theoretical, as it increases the time complexity of updating databases for real-time decision making. There is a need for an ideal approach that can intelligently handle the three dynamic situations (insertion, deletion, and modification). A reliable and stable CPS should therefore be able to perform all dynamic operations locally, even when the 5G/6G network is unavailable.
Secondly, mobile nodes carrying personal information can be tracked down and become vulnerable to known attacks such as eavesdropping, denial of service, replay, and repudiation [
9,
10]. Higher speeds require a high quality of service (QoS) in terms of execution time while larger volumes of data are transferred. The deployment of data-intensive devices in 5G heterogeneous networks requires addressing data privacy and security more seriously. Several applications involve sensitive data. For example, customer purchases and their locations constitute personal information; such data is private and confidential and should be secured and preserved. In this scenario, an edge computing network is required to carry out sensitive tasks without transmitting the data to central servers, i.e., a cloud server. The research community has paid little attention to synthesizing information from multiple heterogeneous data sources. Considering the various aspects of the learning environment’s implementation can significantly help in avoiding the above-stated issues.
Machine learning-based classification models are also used to solve classification tasks in different domains. Random forest is an ensemble classifier that combines a number of decision-tree-based models [
11]. A random subset of features is assigned to each tree-based classifier, and a voting mechanism is adopted when predicting the class of an unknown instance.
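The bootstrap-sampling and majority-voting scheme described above can be sketched as follows; the dataset, the single-feature "stump" trees, and all function names are simplified illustrations, not the full random forest algorithm:

```python
import random
from collections import Counter

# Hypothetical toy data: (feature vector, class label) pairs
data = [([0, 1], "A"), ([1, 1], "B"), ([0, 0], "A"), ([1, 0], "B")]

def train_stump(rows, feature):
    """One-level 'tree': majority label for each value of one feature."""
    votes = {}
    for x, y in rows:
        votes.setdefault(x[feature], []).append(y)
    return {v: Counter(ys).most_common(1)[0][0] for v, ys in votes.items()}

def random_forest(rows, n_trees, seed=0):
    """Each tree sees a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        f = rng.randrange(n_features)              # random feature subset (size 1)
        forest.append((f, train_stump(sample, f)))
    return forest

def predict(forest, x):
    """Majority vote across all trees that cover this feature value."""
    votes = [tree[x[f]] for f, tree in forest if x[f] in tree]
    return Counter(votes).most_common(1)[0][0] if votes else None

forest = random_forest(data, n_trees=7)
```

Real implementations grow full decision trees over random feature subsets at every split; the sketch keeps only the two ingredients the text names, bagging and voting.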
K-Nearest Neighbour (KNN) [
12] is known as the most straightforward classification technique; it is an instance-based learner, also known as a lazy learning classifier [
13]. Iterative Dichotomiser 3 (ID3) uses information gain to select the attribute for classifying the current subset of instances. At each level of the tree, the information gain is calculated recursively [
14]. Another ID3-based approach is C4.5 [
15], which uses the information gain and gain ratio to select attributes. The main advantage of C4.5 over ID3 is that it handles both continuous and missing attribute values.
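The information gain criterion used by ID3 (and extended by C4.5) can be computed as a short sketch; the weather-style toy rows below are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a multiset of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute index attr."""
    base = entropy(labels)
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy set: attribute 0 separates the classes perfectly,
# attribute 1 carries no class information
rows = [("sunny", "hot"), ("rain", "hot"), ("sunny", "mild"), ("rain", "mild")]
labels = ["no", "yes", "no", "yes"]
gain0 = information_gain(rows, labels, 0)  # 1.0 bit: perfect split
gain1 = information_gain(rows, labels, 1)  # 0.0 bits: uninformative split
```

ID3 picks the attribute with the highest gain at each node (here, attribute 0); C4.5 additionally divides by the split's intrinsic information to obtain the gain ratio, which penalizes many-valued attributes.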
Transactional data consists of a set of distinct, discrete items. These items represent different patterns across many domains and applications. For example, in supermarket basket analysis, items represent purchased products; in health data, items represent symptoms diagnosed during a patient’s admission. Different applications thus have different orientations and concepts of items, and extracting useful information from item analysis is a challenging task. A common solution in data mining is to use the frequency of items as features, with threshold values serving as criteria for pattern extraction. A bitmap representation is used to represent items: if an item exists in a transaction, it is assigned the binary value one; otherwise, it is set to zero. Frequent item(sets) are represented in a huge vector space, since the number of item(sets) is often large, especially when data is collected from CPS environments. This results in the curse of dimensionality and data sparsity problems.
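The bitmap encoding just described can be sketched in a few lines; the item universe and transactions are hypothetical:

```python
# Hypothetical item universe and transaction database
items = ["bread", "butter", "milk", "jam"]
transactions = [
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def to_bitmap(transaction, universe):
    """One bit per item in the universe: 1 if present, else 0."""
    return [1 if item in transaction else 0 for item in universe]

bitmaps = [to_bitmap(t, items) for t in transactions]
# Column sums recover each item's frequency (support count)
supports = [sum(col) for col in zip(*bitmaps)]
```

Note how the vector length is fixed by the item universe, not by the transaction: with millions of distinct items, each row is a long, mostly-zero vector, which is exactly the sparsity and dimensionality problem the text raises.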
Researchers address the curse of dimensionality and data sparsity problems by using interestingness measures [
16]. However, these approaches have three limitations. Firstly, they consider the representation of transactional data only for particular tasks, i.e., transaction classification; as a result, the patterns or learning mechanisms cannot be transferred to other tasks. Secondly, the methods require a fixed number of instances structured in the database, whereas real-world applications are often dynamic, and some domains, such as stream scenarios, do not allow re-scanning of databases. Thirdly, the models do not tackle the problem of extracting patterns without using the actual data. Databases often collaborate in centralized structures, which creates overhead because data sharing of personal records requires time-consuming approvals owing to privacy and ethical concerns. Even when these challenges are addressed, datasets are valuable assets, so organizations prefer not to share them in full. Furthermore, the datasets of mobile or IoT networks are often very large, making central hosting storage expensive to acquire. Consequently, a federated learning approach [
17] can tackle the above issues: only model weights are shared across the network, without the raw data. In this paper, we assume that at least one cycle of federated learning has been completed and that models have shared their weights with the server. If the network is down, the local model provides an embedding and uses it for prediction until the network is online again.
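The server-side step of weight sharing can be sketched as a federated averaging round; the client names, weight vectors, and function name are illustrative placeholders, not the paper's actual model:

```python
# Hypothetical per-client weight vectors; only these leave the
# clients -- the raw transactional data never does.
client_weights = {
    "client_a": [0.2, 0.4, 0.6],
    "client_b": [0.4, 0.2, 0.8],
    "client_c": [0.6, 0.6, 0.4],
}

def federated_average(weights_by_client):
    """Server-side aggregation: element-wise mean of client weights."""
    vectors = list(weights_by_client.values())
    n = len(vectors)
    return [sum(ws) / n for ws in zip(*vectors)]

global_weights = federated_average(client_weights)
# The averaged global model is then broadcast back to the clients
```

Production FedAvg weights each client's contribution by its local sample count and repeats this round many times; the uniform mean above shows only the core aggregation that keeps raw records local.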