
2023 | Book

Big Data Analytics in Astronomy, Science, and Engineering

10th International Conference on Big Data Analytics, BDA 2022, Aizu, Japan, December 5–7, 2022, Proceedings


About this book

This book constitutes the proceedings of the 10th International Conference on Big Data Analytics, BDA 2022, which was held in hybrid mode in Aizu, Japan, on December 5–7, 2022.

The 14 full papers included in this volume were carefully reviewed and selected from 70 submissions. They were organized in topical sections as follows: big data analytics, networking, social media, search, information extraction, image processing and analysis, spatial, text, mobile and graph data analysis, machine learning, and healthcare.

Table of Contents

Frontmatter

Data Science: Systems

Frontmatter
Ontology Augmented Data Lake System for Policy Support
Abstract
Analyzing Big Data without an accompanying metadata framework can be a daunting task. Although statistical algorithms can perform large-scale analyses of diverse data with little support from metadata, applying such methods to widely dispersed, highly diverse, and dynamic data does not necessarily produce trustworthy findings. One such task is identifying the impact of indicators for the various Sustainable Development Goals (SDGs). One way to analyze this impact is to develop a Bayesian network that helps policymakers make informed decisions under uncertainty. Policymakers worldwide are keenly interested in relying on such models when deciding new policies for a state or a country (https://sdgs.un.org/2030agenda). The accuracy of these models can be improved by using enriched data, often obtained by incorporating pertinent data from multiple sources. However, because of the challenges associated with the volume, variety, veracity, and structure of the data, traditional data lake systems fall short in identifying information that is syntactically diverse yet semantically connected. In this paper, we propose a Data Lake (DL) framework that ingests and processes data like any traditional DL and, in addition, uses ontologies as an intermediary to retrieve data for applications such as Policy Support Systems, where the selection of data greatly affects how the output is interpreted. We discuss a proof of concept of the proposed system and preliminary results (IIITB Data Lake project website: http://cads.iiitb.ac.in/wordpress/) based on data collected from the agriculture department of the Government of Karnataka (GoK).
Apurva Kulkarni, Pooja Bassin, Niharika Sri Parasa, Vinu E. Venugopal, Srinath Srinivasa, Chandrashekar Ramanathan
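For the ontology-mediated retrieval described in the data lake abstract, a minimal sketch may help: a hypothetical ontology maps the syntactically different column names used by separate sources onto shared concepts, so that a query by concept finds every semantically connected dataset. The ontology, the catalogue, and all dataset and column names below are illustrative assumptions, not the paper's actual ontology or system.

```python
# Hypothetical ontology: each concept lists the surface forms (synonyms)
# that different data sources might use for it.
ONTOLOGY = {
    "crop_yield": {"yield", "crop_yield", "production_per_hectare"},
    "rainfall": {"rainfall", "precipitation", "rain_mm"},
    "district": {"district", "dist_name", "region"},
}

# Hypothetical data lake catalogue: dataset name -> column names as ingested.
CATALOGUE = {
    "agri_dept_2019.csv": ["dist_name", "production_per_hectare"],
    "imd_weather.csv": ["region", "precipitation"],
    "survey_2020.csv": ["district", "rain_mm", "yield"],
}


def concepts_of(column: str) -> set[str]:
    """Return every ontology concept whose synonym set contains the column name."""
    name = column.lower()
    return {concept for concept, synonyms in ONTOLOGY.items() if name in synonyms}


def find_datasets(concept: str) -> list[str]:
    """Return the datasets holding at least one column that maps to the concept."""
    return sorted(
        dataset
        for dataset, columns in CATALOGUE.items()
        if any(concept in concepts_of(col) for col in columns)
    )


if __name__ == "__main__":
    print(find_datasets("rainfall"))
```

Here a query for "rainfall" retrieves both the weather file and the survey file even though neither uses that word. A production system would more likely hold the ontology in a dedicated format such as OWL than in a Python dictionary.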
Explorations in Active Learning Applied to Image Classification
Abstract
While the development of very large models is at the core of today's artificial intelligence, the cost of training such models is increasingly raised as a concern. In this context, active learning has been pointed to as a method for maximizing model quality while minimizing the resources needed for training. The aim of this contribution is to systematically compare the performance of active learning applied to image classification on three datasets.
Adriana Klimczak, Marcel Wenka, Maria Ganzha, Marcin Paprzycki
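The active learning abstract does not say which query strategies were compared. As an illustration only, the sketch below implements least-confidence uncertainty sampling, one common pool-based strategy: the current model's class probabilities for unlabeled images are scored, and the images it is least sure about are sent for labeling. The image identifiers and probabilities are made up.

```python
def least_confidence(probs: list[float]) -> float:
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def select_queries(pool_probs: dict[str, list[float]], k: int) -> list[str]:
    """Return the k unlabeled images the current model is least confident about.

    In a full active learning loop these images would be labeled by an oracle,
    added to the training set, and the model retrained before the next query.
    """
    ranked = sorted(
        pool_probs,
        key=lambda img: least_confidence(pool_probs[img]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical softmax outputs of the current classifier on the unlabeled pool.
    pool = {
        "img_a": [0.98, 0.01, 0.01],
        "img_b": [0.40, 0.35, 0.25],
        "img_c": [0.70, 0.20, 0.10],
        "img_d": [0.34, 0.33, 0.33],
    }
    print(select_queries(pool, k=2))
```

Comparing such a strategy against randomly selecting the same number of images is the usual way to measure how much labeling effort active learning saves.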
Discovery of Small Signals in Big Backgrounds
Abstract
Discovering new phenomena typically consists of finding a small signal in data from processes that naturally produce similar events at much higher rates; these latter events are called background. The statistical, or data analytics, problem lies in discerning the signal against the large background, i.e., finding the proverbial needle in a haystack. We show how the analysis benefits from high dimensionality, because the background is successively reduced as more dimensions are used.
Milind V. Purohit
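The claim that extra dimensions help can be illustrated with a toy calculation that is not taken from the paper. Assume the background is uniform on [0, 1] in every dimension, the signal is Gaussian around a known point, and the dimensions are independent. A box cut in each added dimension then keeps most of the signal but only a fixed fraction of the background, so the figure of merit S/sqrt(B) grows with the number of dimensions. All numbers are hypothetical.

```python
import math


def significance(n_sig: float, n_bkg: float, dims: int,
                 half_width: float, sigma: float) -> float:
    """Approximate S/sqrt(B) after a box cut of +-half_width around the signal
    peak in each of `dims` independent dimensions.

    Toy assumptions: background is uniform on [0, 1] in every dimension and the
    signal is Gaussian with standard deviation `sigma` in every dimension.
    """
    # Fraction of the Gaussian signal inside +-half_width, per dimension.
    sig_eff = math.erf(half_width / (sigma * math.sqrt(2.0)))
    # Fraction of the uniform background inside a window of width 2 * half_width.
    bkg_eff = min(1.0, 2.0 * half_width)
    s = n_sig * sig_eff ** dims
    b = n_bkg * bkg_eff ** dims
    return s / math.sqrt(b)


if __name__ == "__main__":
    # 100 signal events hidden among 1,000,000 background events.
    for d in range(1, 5):
        print(d, round(significance(100, 1_000_000, d, half_width=0.05, sigma=0.02), 2))
```

With these parameters each extra dimension keeps about 99% of the signal but only 10% of the background, so S/sqrt(B) rises by roughly a factor of three per dimension. Real analyses face correlated variables and must guard against tuning cuts on the data they are evaluated on.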
Neuro-Symbolic Regression with Applications
Abstract
Discovering symbolic models is growing in popularity with the increasing interest in interpretable machine learning. Symbolic regression is the task of learning an analytical form of the models underlying the data. Two machine learning techniques have proven effective for it: the REINFORCE trick and transformer neural networks. This paper discusses both techniques in detail and applies symbolic regression to a simulated data set describing a high-energy physics process.
Nour Makke, Sanjay Chawla

Data Science: Architectures

Frontmatter
Introducing Federated Learning into Internet of Things Ecosystems – Maintaining Cooperation Between Competing Parties
Abstract
In practical realizations of Federated Learning ecosystems, the parties that cooperate during training, and later use the trained (global) model, may be competing institutions. This can create incentives for malicious behavior that infringes on the safety and data privacy of other participants. Moreover, even without foul play, the format of locally stored data and the equipment available for training may differ between participating institutions. This necessitates a flexible and adaptable preprocessing pipeline, including a comprehensive registration and data preparation process. Among other things, it should identify the affiliation of the joining device(s), maintain appropriate data privacy mechanisms, and compensate for the heterogeneity of the devices that participate in model training. In this context, the practical aspects of deploying federated learning solutions in real-life production environments are discussed.
Karolina Bogacka, Anastasiya Danilenka, Katarzyna Wasielewska-Michniewska, Marcin Paprzycki, Maria Ganzha, Eduardo Garro, Lambis Tassakos
Blockchain Based B-Health Prototype for Secure Healthcare Transactions
Abstract
Rapid advances in crypto wallets are redefining privacy around transactions. A crypto wallet consists of software that holds private and public keys and uses a blockchain to send and receive currency. Because the wallet is based on a blockchain, every transaction is recorded and stored. Privacy concerns have emerged from the interaction between ordinary transactions and blockchain transactions in many fields, such as healthcare and defence. To allow transaction details to be checked, the currency itself is not stored in one location; instead, it exists as transaction records on the blockchain. By creating a crypto digital wallet, we aim to solve the problems of traditional wallets. Because these wallets store private and public keys, users can perform various operations such as sending or receiving coins, checking their portfolio balance, and trading cryptocurrency. The wallet also preserves the user's privacy by using a hexadecimal wallet address. However, the address of the currency to be exchanged differs from one service provider to another. A blockchain-based wallet therefore provides the features needed for the safe and secure transfer and exchange of funds between healthcare consumers. It also eases currency exchange across countries, resolving the currency-exchange issue when patients are treated in different countries. A study of well-known papers from various journals found no crypto wallet that provides a personalized point of service to patients. In this paper, we create a blockchain-based crypto wallet for managing healthcare data so that the issues of security and data breaches are eliminated.
Puneet Goswami, Victor Hugo C. de Albuquerque, Lakshita Aggarwal
A Blockchain-Based Approach for Audit Management of Electronic Health Records
Abstract
Maintaining proper health records is essential to patient care. Electronic Health Records (EHR) are now replacing traditional manual health records. Audit logs, or audit trails, record the events and changes made in a system, and the majority of hospitals are required to maintain an audit trail for every EHR. Currently, audit trails are stored in relational databases, where they can easily be modified, and trust can be lost in the process. Third-party audit trails, in turn, are inefficient, costly, and time-consuming. Replicating audit trails on a blockchain would be a viable way to make them secure, transparent, and immutable without third-party intervention. In this manuscript, we propose an Audit Management System in which an immutable audit trail of each EHR is generated on a blockchain. In this system, a physician or other medical authority with access to the audit trail can easily verify, in chronological order, all consultations, procedures, and prescriptions given to the patient.
Rashmi P. Sarode, Yutaka Watanobe, Subhash Bhalla
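The tamper evidence that makes a blockchain attractive for EHR audit trails comes from hash chaining: each entry stores the hash of the previous one, so editing any past entry invalidates every later link. The single-node sketch below shows only that property; a real blockchain adds replication and consensus across nodes, which are not modelled. The AuditTrail class and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditTrail:
    """Append-only audit log in which each entry commits to the previous entry's hash."""
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, record_id: str, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"actor": actor, "action": action, "record_id": record_id,
                "timestamp": timestamp, "prev_hash": prev_hash}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    trail = AuditTrail()
    trail.append("dr_rao", "view", "EHR-001", "2022-12-05T10:00:00Z")
    trail.append("dr_rao", "prescribe", "EHR-001", "2022-12-05T10:05:00Z")
    print(trail.verify())
```

Changing any stored field of an earlier entry, for example its action, makes verify() return False, which is what lets an auditor trust the chronological record without a third party.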
Sentinel: An Enhanced Multimodal Biometric Access Control System
Abstract
Every place, be it a household or an organization, big or small (a bank, for example), has something that must be secured to ensure efficient operation and management. Security is always a concern because what is being protected is valuable. Security systems based on single or multiple biometrics, such as face, voice, iris, fingerprint, and palm, or on items carried in person, such as RFID cards or security keys, are widely used alongside or instead of existing PIN- and password-based locks, because of the uniqueness and added layer of security these features provide. However, implementing these features alone is not sufficient to thwart malicious attempts to gain access to a secure location, given the rise of technology capable of defeating or bypassing such systems. This paper therefore proposes a robust security system that addresses the security requirements of any location containing something valuable and that can be retrofitted to overcome the problems of present systems. The proposed system detects and recognizes a person's face, their emotion from their facial expression, and the liveness of their face to confirm physical presence; it identifies the speaker along with a word or phrase in their speech, and it detects factors in the surrounding environment that may threaten a user. Anyone who wants to enter or access a secure location must pass through all of these security layers working in unison (password, facial recognition, facial emotion recognition, facial liveness recognition, speaker recognition, speaker phrase detection, environmental threat detection, and so on), none of which can be bypassed easily. All sensors for detecting, identifying, and recognizing these biometric features are securely connected to a single security device to achieve this goal.
N. Krishna Khanth, Sharad Jain, Suman Madan

Big Data Analytics in Healthcare Support Systems

Frontmatter
A Programmatic Solution to Stop Heartbleed Bug Attack
Abstract
A flaw known as the Heartbleed vulnerability was found in the OpenSSL cryptography library in April 2014; it resided in OpenSSL's implementation of the heartbeat extension of the Transport Layer Security (TLS) protocol. The bug allowed an attacker to read sensitive data from the memory of victim servers. The vulnerability was present on many web servers and major sites, including Yahoo, and many servers could have suffered significant losses as a result. This research paper discusses the Heartbleed vulnerability and proposes a fix that developers can apply. The objective is a programmatic solution to the Heartbleed vulnerability that protects victims from such losses. The proposed work has a major impact on authenticity and security when using open-source projects. The paper presents a code-level fix that checks the payload length before the data is transferred.
Urvashi Chugh, Amit Chugh, Prabhakar Agarwal, S. Pratap Singh
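The payload-length check described in the Heartbleed abstract can be sketched in Python. The message layout follows the TLS heartbeat message defined in RFC 6520 (1-byte type, 2-byte payload length, payload, at least 16 bytes of padding); the function name and return convention are hypothetical, and the paper's own code is not reproduced. Python slicing cannot read past the end of a buffer, so the sketch shows the logic of the check rather than the memory over-read itself: the vulnerable C code copied payload_length bytes without comparing that value with the number of bytes actually received.

```python
import struct
from typing import Optional

HEARTBEAT_REQUEST = 1
HEARTBEAT_RESPONSE = 2
MIN_PADDING = 16  # RFC 6520: padding must be at least 16 bytes


def handle_heartbeat(record: bytes) -> Optional[bytes]:
    """Return a heartbeat response for a well-formed request, else None.

    Layout: type (1 byte) | payload_length (2 bytes, big-endian) | payload | padding.
    """
    if len(record) < 1 + 2 + MIN_PADDING:
        return None  # too short to be a valid heartbeat message
    msg_type, payload_length = struct.unpack("!BH", record[:3])
    # The fix: compare the *claimed* payload length with the bytes actually
    # received before copying anything. Vulnerable OpenSSL skipped this check
    # and echoed back up to 64 KB of adjacent process memory.
    if 1 + 2 + payload_length + MIN_PADDING > len(record):
        return None  # RFC 6520: discard silently
    if msg_type != HEARTBEAT_REQUEST:
        return None
    payload = record[3:3 + payload_length]
    padding = bytes(MIN_PADDING)  # real implementations send random padding
    return struct.pack("!BH", HEARTBEAT_RESPONSE, payload_length) + payload + padding


if __name__ == "__main__":
    honest = bytes([1]) + (5).to_bytes(2, "big") + b"hello" + bytes(16)
    malicious = bytes([1]) + (0xFFFF).to_bytes(2, "big") + b"hi" + bytes(16)
    print(handle_heartbeat(honest) is not None, handle_heartbeat(malicious))
```

The malicious request claims a 65,535-byte payload but carries only two bytes, so the check rejects it instead of answering with whatever happens to follow in memory.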
A Short Review on Cataract Detection and Classification Approaches Using Distinct Ophthalmic Imaging Modalities
Abstract
Cataract is one of the leading causes of visual impairment, including blindness, worldwide, compared with other major age-related eye diseases such as diabetic retinopathy, age-related macular degeneration, trachoma, and glaucoma. Clouding of the eye's lens leads to increasingly blurred vision; aging and genetics are the leading causes of cataracts. In recent years, researchers have shown interest in developing state-of-the-art methods based on machine learning and deep learning that work on distinct ophthalmic imaging modalities, with the aim of detecting and preventing cataracts at an early stage. This survey highlights advances in state-of-the-art machine learning and deep learning algorithms and techniques applied to cataract detection and classification using slit-lamp, fundus retinal, and digital camera images. It also provides insights into previous works, along with their merits and demerits.
Aakash Garg, Jay Kant Pratap Singh Yadav, Sunita Yadav
Automatic Identification of Cataract by Analyzing Fundus Images Using VGG19 Model
Abstract
Cataracts are among the most prevalent eye conditions and may lead to vision loss. Precise and prompt recognition of cataracts is the best way to prevent or treat them at an early stage. Artificial intelligence-based cataract detection systems have been considered in multiple studies, using different deep learning algorithms to recognize the disease. In this context, it has been established that the VGG19 model has a very low training time compared with other Convolutional Neural Networks. Hence, this research proposes the VGG19 model for automatic cataract identification in fundus images, with the goal of supporting healthy lives. The performance of VGG19 is explored with four optimizers (Adam, AdaDelta, SGD, and AdaGrad) and tested on a collection of 5000 fundus images. The best experimental result reached a classification precision of 98%.
Rakesh Kumar, Vatsala Anand, Sheifali Gupta, Maria Ganzha, Marcin Paprzycki
Sensors Based Advanced Bluetooth Pulse Oximeter System
Abstract
The Arduino-based, Bluetooth-equipped pulse oximeter is a measurement device that uses red and infrared light to measure blood oxygen saturation (SpO2) and pulse rate, and it is designed around the HC-05 Bluetooth module. It can be used through a smart mobile application or through the hardware itself. The oximeter uses a 16×2 LCD display module with an I2C interface; the interface's converter chip works seamlessly with the LCD module, converting I2C data into the parallel data the LCD requires. The portable terminal uses a digital algorithm to determine the oxygen saturation and pulse rate and presents them through the smart mobile app interface. The oximeter can help doctors regularly check a patient's pulse and SpO2 level from anywhere in the hospital via their mobile phones, which is especially helpful for keeping doctors and nurses at a distance from patients during a pandemic. The paper presents a novel model of an Arduino-based Bluetooth pulse oximeter using sensors and a Bluetooth module, along with its applications in various sectors.
Jaspinder Kaur, Ajay Kumar Sharma, Divya Punia

Information Interchange of Web Data Resources

Frontmatter
Saral Anuyojan: An Interactive Querying Interface for EHR
Abstract
Maintaining a lifelong medical record is impossible without proper standards. For an individual, records from different sources must be brought together meaningfully to be of any use. Achieving this requires a set of predefined standards for information capture, storage, retrieval, exchange, and analytics. Electronic health records have been found to enhance the quality and safety of care while improving the management of health information and clinical data. Despite this potential, electronic health records are difficult to use: interacting with the EHR database requires queries written in the Archetype Query Language (AQL), and writing AQL queries is complex and tedious. An interface is needed that speeds up querying and thereby improves efficiency. Considering both the importance of electronic health records and the difficulty of using them, we design Saral Anuyojan, a system consisting of components such as a user interface, a query translator, and an interface manager. The user interface takes input from the user, and the query translator converts it into AQL queries for further processing. The AQL query is then sent to the backend (EHRbase), which stores the health records in a standard format. Finally, the output is returned as a visual interpretation on the user interface. The requirements of clinicians and patients are limited (view and update), so they can be implemented without much complexity. Because the EHR has a complex structure that is difficult for non-technical users, our approach solves this problem with an easy-to-use interface that bypasses the long and complex process of learning AQL. The proposed system, Saral Anuyojan, helps improve the management and efficiency of the healthcare sector.
Kanika Soni, Shelly Sachdeva, Arpit Goyal, Aryan Gupta, Divyanshu Bose, Subhash Bhalla
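The Saral Anuyojan abstract does not describe the query translator's internals. A minimal template-based sketch, under stated assumptions: each user-interface field maps to an archetype ID and a path (the blood-pressure entries below are illustrative and must be checked against the templates actually deployed), and the output is a request body shaped like the ad-hoc query body of the openEHR REST Query API, which EHRbase implements. The names to_aql_request and FIELD_PATHS are hypothetical.

```python
import uuid

# Hypothetical mapping from form fields to (archetype ID, path to the value).
# Real IDs and paths depend on the templates deployed in the EHRbase instance.
FIELD_PATHS = {
    "systolic": (
        "openEHR-EHR-OBSERVATION.blood_pressure.v2",
        "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude",
    ),
    "diastolic": (
        "openEHR-EHR-OBSERVATION.blood_pressure.v2",
        "/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude",
    ),
}


def to_aql_request(ehr_id: str, field: str) -> dict:
    """Translate a 'view <field> for <patient>' form request into an AQL query body."""
    if field not in FIELD_PATHS:
        raise ValueError(f"unsupported field: {field}")
    uuid.UUID(ehr_id)  # raises ValueError if the EHR id is not a valid UUID
    archetype_id, path = FIELD_PATHS[field]
    query = (
        f"SELECT o{path} AS {field}, c/context/start_time/value AS recorded_at "
        f"FROM EHR e CONTAINS COMPOSITION c CONTAINS OBSERVATION o[{archetype_id}] "
        "WHERE e/ehr_id/value = $ehr_id "
        "ORDER BY c/context/start_time/value DESC"
    )
    # The EHR id travels as a query parameter instead of being spliced into the text.
    return {"q": query, "query_parameters": {"ehr_id": ehr_id}}


if __name__ == "__main__":
    print(to_aql_request("7d44b88c-4199-4bad-97dc-d78268e01398", "systolic")["q"])
```

Passing the patient identifier as a parameter rather than string-formatting it into the query keeps user input from altering the AQL, which matters once non-technical users drive the query builder.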
Integrated Transmission and Distribution Modelling Using Multi-agent Based Framework
Abstract
Technological changes that enable the widespread adoption of renewable-energy-based distributed generation, together with a shift in the energy sources used by the transportation sector, are introducing new challenges and opportunities in power generation, transmission, and distribution systems. Conventionally, transmission and distribution systems were analyzed separately, under the assumption that the aggregate load presented by the distribution systems to the transmission system would remain constant. With changing technologies, the deployment of large-scale renewable generation, and the growth of electric vehicle charging stations, however, the load presented by the distribution system at the transmission level has become uncertain, with a possibility of reverse power flow. An adequate method for analyzing integrated transmission and distribution systems therefore needs to be developed; in this article, we propose a multi-agent-based framework for analyzing the integrated transmission and distribution system. The framework is to be used to analyze the IEEE 9-bus test system at the transmission level and the IEEE 13-bus test feeder at the distribution level.
Devesh Shukla, S. P. Singh
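As a hypothetical illustration of why the aggregate load seen by the transmission system is no longer fixed, the sketch below lets each distribution feeder agent report its net load (demand plus EV charging minus distributed generation) and lets a transmission-level agent aggregate the reports and flag reverse power flow. The agent classes, fields, and numbers are assumptions; the paper's framework and its power-flow analysis of the IEEE 9-bus and 13-bus systems are not modelled.

```python
from dataclasses import dataclass


@dataclass
class FeederAgent:
    """Hypothetical distribution-level agent reporting its net load at the substation."""
    name: str
    demand_mw: float
    ev_charging_mw: float
    distributed_gen_mw: float

    def net_load_mw(self) -> float:
        # Positive: the feeder draws power from the transmission system.
        # Negative: local generation exceeds demand, i.e. reverse power flow.
        return self.demand_mw + self.ev_charging_mw - self.distributed_gen_mw


def transmission_bus_load(feeders: list[FeederAgent]) -> tuple[float, list[str]]:
    """Aggregate feeder reports at one transmission bus and list feeders in reverse flow."""
    total = sum(f.net_load_mw() for f in feeders)
    reverse = [f.name for f in feeders if f.net_load_mw() < 0]
    return total, reverse


if __name__ == "__main__":
    # Midday snapshot: feeder B has more rooftop solar than local demand.
    feeders = [
        FeederAgent("A", demand_mw=5.0, ev_charging_mw=1.0, distributed_gen_mw=2.0),
        FeederAgent("B", demand_mw=3.0, ev_charging_mw=0.0, distributed_gen_mw=4.5),
    ]
    print(transmission_bus_load(feeders))
```

In a full co-simulation, the transmission agent would typically return bus voltages to the feeders and the exchange would repeat until both levels converge; that iteration is omitted here.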
Incremental SVD-Based Hybrid Movie Recommendation to Improve Content Delivery Over CDN
Abstract
With the tremendous growth in the number of users watching on-demand movies over the Internet, Content Delivery Networks (CDNs) are used to provide a more efficient network and improve user experience. A CDN accelerates the response to end users after identifying the popularity of movie content. However, the sheer volume of content today makes it difficult to predict each movie's popularity accurately. A recommendation system uses filtering tools to recommend content according to its popularity among users. Recommendation systems now play a vital role in identifying relevant content from massive data sources and offering users options that match their interests. To identify related content, we analyze the algorithms used in popular recommendation systems and develop a hybrid system that combines basic content-based filtering with Incremental SVD-based filtering to improve the recommendation of movie content to end users over a CDN.
Rohit Kumar Gupta, Yugam Shukla, Ankit Mundra, Ritu Dewan
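Folding-in is one widely used way to update an SVD recommender incrementally; whether the CDN paper uses this exact scheme is an assumption. A new user's ratings are projected onto the existing right singular vectors, u_new = r V_k S_k^-1, and the resulting latent vector is blended with a content-based score. Treating missing ratings as 0, the weight alpha, and the content scores are simplifications chosen for the sketch.

```python
import numpy as np


def truncated_svd(ratings: np.ndarray, k: int):
    """Rank-k SVD of a (users x movies) rating matrix; 0 marks a missing rating."""
    u, s, vt = np.linalg.svd(ratings, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def fold_in_user(new_ratings: np.ndarray, s: np.ndarray, vt: np.ndarray) -> np.ndarray:
    """Project a new user's rating row into the existing latent space
    (u_new = r V_k S_k^-1) instead of recomputing the whole SVD."""
    return (new_ratings @ vt.T) / s


def hybrid_scores(user_vec: np.ndarray, s: np.ndarray, vt: np.ndarray,
                  content_sim: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend SVD-predicted ratings with a content-based score for every movie.

    content_sim is assumed to be scaled to the rating range already, and
    alpha is an illustrative weight, not a value from the paper.
    """
    collaborative = (user_vec * s) @ vt  # rank-k reconstruction of the user's ratings
    return alpha * collaborative + (1.0 - alpha) * content_sim


if __name__ == "__main__":
    ratings = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]], dtype=float)
    u, s, vt = truncated_svd(ratings, k=2)
    newcomer = np.array([5, 0, 0, 2], dtype=float)
    vec = fold_in_user(newcomer, s, vt)
    content = np.array([4.0, 3.5, 1.0, 2.0])  # hypothetical genre-match scores
    print(np.argsort(-hybrid_scores(vec, s, vt, content)))  # movies, best first
```

Folding-in is cheap because the factors stay fixed, but for the same reason accuracy drifts as new users accumulate, so such systems usually recompute the full SVD periodically.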

Business Analytics

Frontmatter
Improving Emotional Confusions in SNS Sentiment Analysis by Partial Redistribution of BERT Discrimination Results
Abstract
Most previous research on sentiment analysis of social networking services (SNS) has focused on polarity analysis to probe user tendencies. However, human emotions are complex and changeable, and the results of traditional polarity analysis are difficult to use in real-time application services. Although finer-grained sentiment analysis may provide more detailed results, it suffers from ambiguity in how features are defined across emotions. In this study, we propose a partial redistribution method based on BERT to tackle this problem. The method uses the confusion matrix to identify groups of emotions that are confused with each other, and then uses binary classification models to re-train on the data of each confused emotion group through a redistribution process. In addition, the model makes it possible to re-extract and define features for specific emotions. Finally, the F1 score is used to judge whether each feature correction step has a positive impact on the model. Experimental results demonstrate that the proposed approach is effective in reducing emotional confusion in SNS sentiment analysis.
Yenjou Wang, Qun Jin
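The SNS sentiment abstract describes locating confused emotion groups through the confusion matrix and re-classifying them with binary models. A minimal sketch of those two steps, assuming a single confused pair and a hypothetical binary_classifier callable standing in for a fine-tuned binary BERT model; the labels and counts are made up.

```python
from typing import Callable, Optional


def most_confused_pair(labels: list[str],
                       confusion: list[list[int]]) -> Optional[tuple[str, str]]:
    """Return the pair of emotions with the largest total off-diagonal confusion.

    confusion[i][j] counts samples whose true label is labels[i] but which the
    multi-class model predicted as labels[j].
    """
    best, best_pair = -1, None
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            mixed = confusion[i][j] + confusion[j][i]  # errors in both directions
            if mixed > best:
                best, best_pair = mixed, (labels[i], labels[j])
    return best_pair


def redistribute(predictions: list[tuple[str, str]], pair: tuple[str, str],
                 binary_classifier: Callable[[str], str]) -> list[str]:
    """Re-label only the samples predicted as one of the confused emotions, using a
    dedicated binary model for that pair; all other predictions are kept."""
    return [binary_classifier(text) if label in pair else label
            for text, label in predictions]


if __name__ == "__main__":
    labels = ["joy", "anger", "fear", "sadness"]
    confusion = [[50, 2, 1, 3],
                 [1, 40, 9, 2],
                 [0, 12, 35, 4],
                 [2, 1, 3, 44]]
    print(most_confused_pair(labels, confusion))
```

Restricting the binary model to one pair lets it learn features that separate just those two emotions, which is the point of a partial, rather than global, redistribution.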
A Scientific Perspective of Agnihotra to Curtail Pollutants in the Air
Abstract
Air pollution is harmful to human health, and poor air quality can lead to life-threatening diseases. We have to defend ourselves against viruses entering our immune system. Agnihotra/Yagya is a gift to humanity from ancient Vedic science. It refers to a ritual performed with a fire prepared in a copper pyramid, accompanied by the chanting of Sanskrit mantras. We aim to analyze the effect of Yagya on air pollutants. We propose a novel system that uses the Internet of Things (IoT) and cloud technologies, followed by data analytics, to measure the concentration of pollutants in the air. The data are gathered from IoT-based sensing units and stored on a cloud system. The analytical model is automated through machine learning and data visualization techniques. The pollutant concentration is measured before and after Agnihotra. The amalgamation of Vedic science with computer technology may lead to sustainable development by curtailing pollutants in the air.
Shailee Bhatia, Shelly Sachdeva, Puneet Goswami
Policy Driven Epidemiological (PDE) Model for Prediction of COVID-19 in India
Abstract
The fast spread of COVID-19 has made it a global issue. Despite various efforts, accurately forecasting the spread of COVID-19 remains an open problem. Government lockdown policies play a critical role in controlling the spread of the coronavirus. However, existing prediction models have ignored lockdown policies and focused only on other features such as age, sex ratio, travel history, and daily cases. This work proposes a Policy Driven Epidemiological (PDE) model with temporal, structural, profile, policy, and interaction features to forecast COVID-19 in India and six of its states. The PDE model integrates two models, Susceptible-Infected-Recovered-Deceased (SIRD) and Topical Affinity Propagation (TAP), to predict the spread of infection within a network for a given set of infected users. The performance of the PDE model is compared with a linear regression model, three epidemiological models (Susceptible-Infectious-Recovered (SIR), Susceptible-Exposed-Infectious-Recovered (SEIR), and SIRD), and two diffusion models (Time Constant Cascade Model and Time Decay Feature Cascade Model). Experimental evaluation for India and six Indian states (Maharashtra, Gujarat, Tamil Nadu, Delhi, Rajasthan, and Uttar Pradesh) under different government policies from 15 June to 30 June shows that the PDE model's predictions closely match the observed cases over the considered time frame. Results show that the PDE model predicted COVID-19 cases with up to 94% accuracy and reduced the Normalized Mean Squared Error (NMSE) by up to 50%, 35%, and 42% relative to linear regression, the epidemiological models, and the diffusion models, respectively.
Sakshi Gupta, Shikha Mehta
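Only the SIRD component of the PDE model can be sketched from its abstract; the TAP component, the feature set, and the way policies enter the model are not described. The sketch below integrates the standard SIRD equations with a forward-Euler step and, as an assumption for illustration, lets the infection rate beta depend on the day so that a lockdown appears as a drop in the contact rate. All parameter values are hypothetical.

```python
from typing import Callable


def simulate_sird(n: float, i0: float, beta: Callable[[int], float], gamma: float,
                  mu: float, days: int,
                  dt: float = 0.1) -> list[tuple[float, float, float, float]]:
    """Forward-Euler integration of the SIRD compartments, returning (S, I, R, D) per day.

        dS/dt = -beta(t) S I / N
        dI/dt =  beta(t) S I / N - gamma I - mu I
        dR/dt =  gamma I
        dD/dt =  mu I
    """
    s, i, r, d = n - i0, float(i0), 0.0, 0.0
    history = [(s, i, r, d)]
    steps_per_day = round(1 / dt)
    for step in range(days * steps_per_day):
        day = step // steps_per_day
        new_infections = beta(day) * s * i / n * dt
        new_recoveries = gamma * i * dt
        new_deaths = mu * i * dt
        s -= new_infections
        i += new_infections - new_recoveries - new_deaths
        r += new_recoveries
        d += new_deaths
        if (step + 1) % steps_per_day == 0:
            history.append((s, i, r, d))
    return history


def lockdown_beta(day: int) -> float:
    """Hypothetical policy: contact rate drops when a lockdown starts on day 20."""
    return 0.30 if day < 20 else 0.12


if __name__ == "__main__":
    for day, (s, i, r, d) in enumerate(simulate_sird(1_000_000, 100, lockdown_beta,
                                                     0.1, 0.01, 60)):
        if day % 10 == 0:
            print(day, round(i), round(d))
```

Making beta a function of time is the simplest hook for policy effects; the compartments always sum to N, which is a useful sanity check when fitting the rates to reported case data.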
Backmatter
Metadata
Title
Big Data Analytics in Astronomy, Science, and Engineering
Editors
Shelly Sachdeva
Yutaka Watanobe
Subhash Bhalla
Copyright Year
2023
Electronic ISBN
978-3-031-28350-5
Print ISBN
978-3-031-28349-9
DOI
https://doi.org/10.1007/978-3-031-28350-5
