
2021 | Book

Big Data Technologies and Applications

10th EAI International Conference, BDTA 2020, and 13th EAI International Conference on Wireless Internet, WiCON 2020, Virtual Event, December 11, 2020, Proceedings


About this book

This book constitutes the refereed post-conference proceedings of the 10th International Conference on Big Data Technologies and Applications, BDTA 2020, and the 13th International Conference on Wireless Internet, WiCON 2020, held in December 2020. Due to the COVID-19 pandemic, the conference was held virtually.

The 9 full papers of BDTA 2020 were selected from 22 submissions and present all big data technologies, such as storage, search and management.
WiCON 2020 received 18 paper submissions and after the reviewing process 5 papers were accepted. The main topics include wireless and communicating networks, wireless communication security, green wireless network architectures and IoT based applications.

Table of Contents


BDTA 2020

Constructing Knowledge Graph for Prognostics and Health Management of On-board Train Control System Based on Big Data and XGBoost
Train control system plays a significant role in safe and efficient operation of the railway transport system. In order to enhance the system capability and cost efficiency from a full life cycle perspective, the establishment of a Condition-based Maintenance (CBM) scheme will be beneficial to both the currently in use and next generation train control systems. Due to the complexity of the fault mechanism of on-board train control system, a data-driven method is of great necessity to enable the Prognostics and Health Management (PHM) for the equipments in field operation. In this paper, we propose a big data platform to realize the storage, management and processing of historical field data from on-board train control equipments. Specifically, we focus on constructing the Knowledge Graph (KG) of typical faults. The Extreme Gradient Boosting (XGBoost) method is adopted to build big-data-enabled training models, which reveal the distribution of the feature importance and quantitatively evaluate the fault correlation of all related features. The presented scheme is demonstrated by a big data platform with incremental field data sets from railway operation process. Case study results show that this scheme can derive knowledge graph of specific system fault and reveal the relevance of features effectively.
Jiang Liu, Bai-gen Cai, Zhong-bin Guo, Xiao-lin Zhao
Early Detecting the At-risk Students in Online Courses Based on Their Behavior Sequences
Online learning has developed rapidly, but the participation of learners is very low. So it is of great significance to construct a prediction model of learning results, to identify students at risk in time and accurately. We select nine online learning behaviors from one course in Moodle, take one week as the basic unit and 5 weeks as the time node of learning behavior, and the aggregate data and sequence data of the first 5 weeks, the first 10 weeks, the first 15 weeks, the first 20 weeks, the first 25 weeks, the first 30 weeks, the first 35 weeks and the first 39 weeks are formed. Eight classic machine learning methods, i.e. Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Iterative Dichotomiser 3 (ID3), Classification and Regression Trees (CART), and Neural Network (NN), are used to predict the learning results in different time nodes based on aggregate data and sequence data. The experimental results show that sequence data is more effective than aggregate data to predict learning results. The prediction AUC of RF model on sequence data is 0.77 at the lowest and 0.83 at the highest, the prediction AUC of CART model on sequence data is 0.70 at the lowest and 0.83 at the highest, which are the best models of the eight classic prediction models. Then Random Forest (RF) model, Classification and Regression Trees (CART) model, recurrent neural network (RNN) model and long short term memory (LSTM) model are used to predict learning results on sequence data; the experimental results show that long short term memory (LSTM) is a model with the highest value of AUC and stable growth based on sequence data, and it is the best model of all models for predicting learning results.
Shuai Yuan, Huan Huang, Tingting He, Rui Hou
Do College Students Adapt to Personal Learning Environment (PLE)? A Single-Group Study
Home-based online learning is a typical application of personal learning environment. Understanding the adaptability and characteristics of college students in the personal learning environment (PLE) can effectively tap the potential of online courses and provide valuable references for learners' online and lifelong learning. In this single-group study, 80 college students received a 90-min self-regulated learning training. In pre- and post-class evaluations, media multi-tasking self-efficacy, perceived attention problems, self-regulation strategies and learning satisfaction are used as key variables in online learning to assess their personal learning environment adaptability and characteristics. Using descriptive statistics and one-dimensional intra-group variance to analyze the data, it was found that: learners have a moderate degree of attention deficit in their personal learning environment, which is manifested in three aspects: perceived attention discontinuity, lingering thought, and social media notification; under simple training or natural conditions, students have poor adaptability in the personal learning environment, and their behavior perception and behavior adjustment levels have improved, but they have not yet reached expectations; participation in online learning has significantly increased the application of learners' self-regulation strategies, especially the application of behavior strategies.
Changsheng Chen, Xiangzeng Meng, Junxiao Liu, Zhi Liu
A Big Data Intelligence Marketplace and Secure Analytics Experimentation Platform for the Aviation Industry
Over the last years, the impacts of the evolution of information integration, increased automation and new forms of information management are also evident in the aviation industry that is disrupted also by the latest advances in sensor technologies, IoT devices and cyber-physical systems and their adoption in aircrafts and other aviation-related products or services. The unprecedented volume, diversity and richness of aviation data that can be acquired, generated, stored, and managed provides unique capabilities for the aviation-related industries and pertains value that remains to be unlocked with the adoption of the innovative Big Data Analytics technologies. The big data technologies are focused on the data acquisition, the data storage and the data analytics phases of the big data lifecycle by employing a series of innovative techniques and tools that are constantly evolving with additional sophisticated features, while also new techniques and tools are frequently introduced as a result of the undergoing research activities. Nevertheless, despite the large efforts and investments on research and innovation, the Big Data technologies introduce also a number of challenges to its adopters. Besides the effective storage and access to the underlying big data, efficient data integration and data interoperability should be considered, while at the same time multiple data sources should be effectively combined by performing data exchange and data sharing between the different stakeholders that own the respective data. However, this reveals additional challenges related to the crucial preservation of the information security of the collected data, the trusted and secure data exchange and data sharing, as well as the robust access control on top of these data. 
The current paper introduces the ICARUS big data-enabled platform, which aims to provide a multi-sided platform that offers a novel aviation data and intelligence marketplace accompanied by a trusted and secure “sandboxed” analytics workspace. It holistically handles the complete big data lifecycle from the data collection, data curation and data exploration to the data integration and data analysis of data originating from heterogeneous data sources with different velocity, variety and volume in a trusted and secure manner.
Dimitrios Miltiadou, Stamatis Pitsios, Dimitrios Spyropoulos, Dimitrios Alexandrou, Fenareti Lampathaki, Domenico Messina, Konstantinos Perakis
A Multi-valued Logic Assessment of Organizational Performance via Workforce Social Networking
Social Media have changed the conditions and rules of Social Networking (SNet) where it comes from people intermingling with each other, i.e., SNet is to be understood as a process that works on the principle of many-to-many; any individual can create and share content. It is intended to explore the complex dynamics between SNet, Logic Programming (LP), and the Laws of Thermodynamics (LoT) in terms of entropy by drawing attention to how Multi-Value Logic (MVL) intertwines with SNet, LP and LoT, i.e., its norms, strategies, mechanisms, and methods for problem solving that underpin its dynamics when looking at programmability, connectivity, and organizational performance. Indeed, one’s focus is on the tactics and strategies of MVL to evaluate the issues under which social practices unfold and to assess their impact on organizational performance.
José Neves, Florentino Fdez-Riverola, Vitor Alves, Filipa Ferraz, Lia Sousa, António Costa, Jorge Ribeiro, Henrique Vicente
Research on the Sharing and Application of TCM Digital Resources
With the vigorous development of online teaching and online learning, it has further increased the demand for digital resources, and further enhanced the feasibility of digital education, the necessity of digital resource construction and the importance of digital resource sharing in the information age. In this study, the status quo of TCM digital resources was studied from the aspects of literature research and resource construction, and a questionnaire survey was conducted among teachers and students in the major of TCM acupuncture in a TCM university. On this basis, suggestions on the application of digital resources in TCM acupuncture courses were proposed.
Min Hu, Hao Li
Statistical Research on Macroeconomic Big Data: Using a Bayesian Stochastic Volatility Model
The alternative variation of variance in Stochastic Volatility (SV) models provides a big data modelling solution that is more suitable for the fluctuation process in macroeconomics for describing unobservable fluctuation features. The estimation method based on Monte Carlo simulation shows unique advantages in dealing with high-dimensional integration problems. The statistical research on macroeconomic big data based on Bayesian stochastic volatility model builds on the Markov Chain Monte Carlo estimation. The critical values of the statistics can be defined exactly, which is one of the drawbacks of traditional statistics. Most importantly, the model provides an effective analysis tool for the expected variable generation behaviour caused by macroeconomic big data statistics.
Minglei Shan
Introducing and Benchmarking a One-Shot Learning Gesture Recognition Dataset
Deep learning techniques have been widely and successfully applied, over the last five years, to recognize the gestures and activities performed by users wearing electronic devices. However, the collected datasets are built in an old fashioned way, mostly comprised of subjects that perform many times few different gestures/activities. This paper addresses the lack of a wearable gesture recognition dataset for exploring one-shot learning techniques. The current dataset consists of 46 gestures performed by 35 subjects, wearing a smartwatch equipped with 3 motion sensors and is publicly available. Moreover, 3 one-shot learning classification approaches are benchmarked on the dataset, exploiting two different deep learning classifiers. The results of the benchmark depict the difficulty of the one-shot learning task, exposing new challenges for wearable gesture/activity recognition.
Panagiotis Kasnesis, Christos Chatzigeorgiou, Charalampos Z. Patrikakis, Maria Rangoussi
NetFlow Datasets for Machine Learning-Based Network Intrusion Detection Systems
Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) have become a promising tool to protect networks against cyberattacks. A wide range of datasets are publicly available and have been used for the development and evaluation of a large number of ML-based NIDS in the research community. However, since these NIDS datasets have very different feature sets, it is currently very difficult to reliably compare ML models across different datasets, and hence if they generalise to different network environments and attack scenarios. The limited ability to evaluate ML-based NIDSs has led to a gap between the extensive academic research conducted and the actual practical deployments in the real-world networks. This paper addresses this limitation, by providing five NIDS datasets with a common, practically relevant feature set, based on NetFlow. These datasets are generated from the following four existing benchmark NIDS datasets: UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018. We have used the raw packet capture files of these datasets, and converted them to the NetFlow format, with a common feature set. The benefits of using NetFlow as a common format include its practical relevance, its wide deployment in production networks, and its scaling properties. The generated NetFlow datasets presented in this paper have been labelled for both binary- and multi-class traffic and attack classification experiments, and we have made them available to the research community [1]. As a use-case and application scenario, the paper presents an evaluation of an Extra Trees ensemble classifier across these datasets.
Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, Marius Portmann

WiCON 2020

Performance Evaluation of Energy Detection, Matched Filtering and KNN Under Different Noise Models
Due to the broadcast nature of radio transmission, both authorized and unauthorized users can access the network, which leads to the increasingly prominent security problems of wireless network. At the same time, it is more difficult to detect and identify users in wireless network environment due to the influence of noise. In this paper, the performance of energy detection (ED), matched filtering (MF) and K-nearest neighbor algorithm (KNN) is analyzed under different noise and uncertain noise separately. The Gaussian noise, α-stable distribution noise and Laplace distribution noise models are simulated respectively under the different uncertainty of noise when the false alarm probability is 0.01. The results show that the performance of the detectors is significantly affected by different noise models. In any case, the detection probability of KNN algorithm is the highest; the performance of MF is much better than ED under different noise models; KNN is not sensitive to noise uncertainty; MF has better performance on noise uncertainty, which makes ED performance decline rapidly.
Xiaoyan Wang, Jingjing Yang, Tengye Yu, Rui Li, Ming Huang
Hybrid Deep-Readout Echo State Network and Support Vector Machine with Feature Selection for Human Activity Recognition
Developing sophisticated automated systems for assisting numerous humans such as patients and elder people is a promising future direction. Such smart systems are based on recognizing Activities of Daily Living (ADLs) for providing a suitable decision. Activity recognition systems are currently employed in developing many smart technologies (e.g., smart mobile phone) and their uses have been increased dramatically with availability of Internet of Things (IoT) technology. Numerous machine learning techniques are presented in literature for improving performance of activity recognition. Whereas, some techniques have not been sufficiently exploited with this research direction. In this paper, we shed the light on this issue by presenting a technique based on employing Echo State Network (ESN) for human activity recognition. The presented technique is based on combining ESN with Support Vector Machine (SVM) for improving performance of activity recognition. We also applied feature selection method to the collected data to decrease time complexity and increase the performance. Many experiments are conducted in this work to evaluate performance of the presented technique with human activity recognition. Experiment results have shown that the presented technique provides remarkable performance.
Shadi Abudalfa, Kevin Bouchard
Research on User Privacy Security of China’s Top Ten Online Game Platforms
The privacy agreement presented to online game users is the basic guarantee of the running of an online game. An online game platform can have access to users’ private information by setting various mandatory clauses. This paper takes the ten most popular online game platforms in China in recent years as examples, using documentary analysis and quantitative analysis to analyze their privacy clauses. The research results show that there are loopholes in protection of users’ private information by online platforms that have gained access and rights to use them. Based on this, it is conducive to the protection of users’ private information through improving information security protection system of online game platforms, adding the option for access denial of privacy information in the process of user registration, and mandatorily prolonging the time assigned for users to read the privacy agreements.
Lan-Yu Cui, Mi-Qian Su, Yu-Chen Wang, Zu-Mei Mo, Xiao-Yue Liang, Jian He, Xiu-Wen Ye
Spectrum Sensing and Prediction for 5G Radio
In future wireless networks, it is crucial to find a way to precisely evaluate the degree of spectrum occupation and the exact parameters of free spectrum band at a given moment. This approach enables a secondary user (SU) to dynamically access the spectrum without interfering primary user’s (PU) transmission. The known methods of signal detection or spectrum sensing (SS) enable making decision on spectrum occupancy by SU. The machine learning (ML), especially deep learning (DL) algorithms have already proved their ability to improve classic SS methods. However, SS can be insufficient to use the free spectrum efficiently. As an answer to this issue, the prediction of future spectrum state has been introduced. In this paper, three DL algorithms, namely NN, RNN and CNN have been proposed to accurately predict the 5G spectrum occupation in the time and frequency domain with the accuracy of a single resource block (RB). The results have been obtained for two different datasets: the 5G downlink signal with representation of daily traffic fluctuations and the sensor-network uplink signal characteristic for IoT. The obtained results prove DL algorithms usefulness for spectrum occupancy prediction and show significant improvement in detection and prediction for both low signal-to-noise ratio (SNR) and for high SNR compared with reference detection/prediction method discussed in the paper.
Małgorzata Wasilewska, Hanna Bogucka, Adrian Kliks
Towards Preventing Neighborhood Attacks: Proposal of a New Anonymization’s Approach for Social Networks Data
Anonymization is a crucial process to ensure that published social network data does not reveal sensitive user information. Several anonymization approaches for databases have been adopted to anonymize social network data and prevent the various possible attacks on these networks. In this paper, we will identify an important type of attack on privacy in social networks: “neighborhood attacks”. But it is observed that the existing anonymization methods can cause significant errors in certain tasks of analysis of structural properties such as the distance between certain pairs of nodes, the average distance measure “APL”, the diameter, the radius, etc. This paper aims at proposing a new approach of anonymization for preventing attacks from neighbors while preserving as much as possible the social distance on which other structural properties are based, notably APL. The approach is based on the principle of adding links to have isomorphic neighborhoods, protect published data from neighborhood attacks and preserve utility on the anonymized social graph. Our various experimental results on real and synthetic data show that the algorithm that combines the addition of false nodes with the addition of links, allows to obtain better results compared to the one based only on the addition of links. They also indicate that our algorithm preserves average distances from the existing algorithm because we add edges between the closest nodes.
Requi Djomo, Thomas Djotio Ndie
Big Data Technologies and Applications
Dr. Zeng Deze
Huan Huang
Rui Hou
Seungmin Rho
Dr. Naveen Chilamkurti