
2020 | Book

Advances in Computing and Data Sciences

4th International Conference, ICACDS 2020, Valletta, Malta, April 24–25, 2020, Revised Selected Papers

Editors: Mayank Singh, P. K. Gupta, Vipin Tyagi, Jan Flusser, Tuncer Ören, Gianluca Valentino

Publisher: Springer Singapore

Book Series: Communications in Computer and Information Science


About this book

This book constitutes the post-conference proceedings of the 4th International Conference on Advances in Computing and Data Sciences, ICACDS 2020, held in Valletta, Malta, in April 2020. The conference was held virtually due to the COVID-19 pandemic.

The 46 full papers were carefully reviewed and selected from 354 submissions. The papers are centered around topics such as advanced computing, data sciences, distributed systems organizing principles, development frameworks and environments, software verification and validation, computational complexity and cryptography, machine learning theory, database theory, and probabilistic representations.

Table of Contents

Frontmatter

Advanced Computing

Frontmatter
A Computer Vision Based Approach for the Analysis of Acuteness of Garbage

As the population increases rapidly day by day, the pollution level is also rising significantly. Several campaigns, such as Swachh Bharat Abhiyaan (SBA), aim to reduce pollution. Our approach uses a computer vision technique to classify garbage based on its severity; for this, we rated garbage on a scale of 1 to 5, with 5 the cleanest and 1 the dirtiest. To achieve our aim, we used the Faster R-CNN Inception v2 model and procured an accuracy of 89.14% using SVM and 89.68% using CNN in detecting the different classes of garbage.

Chitransh Bose, Siddheshwar Pathak, Ritik Agarwal, Vikas Tripathi, Ketan Joshi
The Moderating Effect of Demographic Factors on the Acceptance of Virtual Reality Learning in Developing Countries in the Middle East

Technological innovation keeps expanding, particularly in the virtual reality sector, sparking competition and transforming the way businesses operate. This has stimulated the acceptance of virtual reality learning in developing nations in the Middle East. Accordingly, this study examines the factors impacting consumer acceptance of virtual reality learning, expanding current knowledge about what motivates individuals to use virtual reality. The study follows a quantitative strategy, and the Unified Theory of Acceptance and Use of Technology (UTAUT) was used to determine the components influencing people's acceptance of virtual reality learning. An online survey was conducted in Middle Eastern developing countries to gather data from a sample obtained through snowball sampling. The 432 valid responses obtained were analyzed using SPSS. Scale reliability, normality, correlation, and multiple linear regression were tested to establish the conceptual model, and model fit was tested against the observed survey results. The results show that a person's intention to accept virtual reality learning was significantly impacted by (in order of influencing strength) Performance Expectancy, Effort Expectancy, Social Influence, Facilitating Conditions, and Personal Innovativeness (PInn). This study also clarifies how demographic factors moderate the acceptance of virtual reality learning services in developing countries, which can contribute to an increased acceptance level of virtual reality learning in these regions. Furthermore, actual use behavior was significantly impacted by Personal Innovativeness (PInn) through behavioral intention.
Hence, educational bodies in the Middle East should consider investing heavily in virtual reality learning and other information technology innovations to support efficient service delivery while expanding virtual reality learning services.

Malik Mustafa, Sharf Alzubi, Marwan Alshare
Table Tennis Forehand and Backhand Stroke Recognition Based on Neural Network

In the last few years, microcontroller producers have started to produce SoC boards that not only collect data from attached sensors but can also run small neural networks. The goal of this paper is to analyze the feasibility of implementing a simple neural network for sports monitoring using state-of-the-art technologies in that field. Sports monitoring devices can be used in most sports; in this paper, a device that recognizes forehand and backhand strokes in table tennis is developed. The task itself is not especially complicated, but the focus is on flexibility and the possibility of using this system for other sports. According to final tests in laboratory conditions, the developed system is 96% accurate in table tennis forehand and backhand stroke recognition. Finally, in our implementation the trained neural network was transferred to the microcontroller, an approach that opens new possibilities for future versions.

Kristian Dokic, Tomislav Mesic, Marko Martinovic
An Effective Vision Based Framework for the Identification of Tuberculosis in Chest X-Ray Images

Tuberculosis is an infection that affects numerous individuals worldwide. While treatment is possible, it requires an accurate diagnosis first. In developing countries in particular, X-ray machines are generally available, yet the radiological expertise to accurately assess the images is often missing. An automated vision-based framework that could perform this task quickly and inexpensively could drastically improve the ability to diagnose, and ultimately treat, the disease. In this paper we propose an image-analysis framework using various machine learning techniques, namely SVM, kNN, Random Forest, and neural networks, for effective identification of tuberculosis. The framework using a neural network classified better than the other classifiers and achieves an accuracy of 80.45% in detecting tuberculosis.

Tejasvi Ghanshala, Vikas Tripathi, Bhaskar Pant
User Assisted Clustering Based Key Frame Extraction

Our study proposes a novel method of key frame extraction, useful for video data. Video summarization means condensing the amount of data that must be examined to retrieve any noteworthy information from a video. Video summarization [1] proves to be a challenging problem, as video content varies significantly from one video to another, and significant human labor is required to summarize video manually. To tackle this issue, this paper proposes an algorithm that summarizes video without prior knowledge. Video summarization is not only useful for saving time but may also reveal features that a human might not catch at first sight. A significant difficulty is the lack of a pre-defined dataset as well as a metric to evaluate the performance of a given algorithm. We propose a modified version of harvesting representative frames of a video sequence for abstraction. The concept is to quantitatively measure the difference between successive frames by computing statistics including the mean, variance, and multiples of the standard deviation; only frames above a predefined standard-deviation threshold are retained. The methodology is further enhanced by making it user-interactive: a user enters a keyword describing the type of frames they desire, matching frames are fetched via the Google Search API, and these are compared with the video frames to obtain the desired frames.
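The frame-difference thresholding described in this abstract can be sketched in a few lines. The toy frames, the difference statistic (mean absolute pixel difference), and the threshold factor `k` are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def extract_key_frames(frames, k=1.0):
    """Select frames whose difference from the previous frame exceeds
    mean + k * std of all successive-frame differences."""
    diffs = np.array([np.mean(np.abs(frames[i].astype(float) - frames[i - 1].astype(float)))
                      for i in range(1, len(frames))])
    threshold = diffs.mean() + k * diffs.std()
    # frame i is a key frame when the change leading into it is above threshold
    return [i for i in range(1, len(frames)) if diffs[i - 1] > threshold]

# synthetic video: mostly static 8x8 frames with one abrupt scene change at frame 5
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(10)]
for f in frames[5:]:
    f[:] = 200
print(extract_key_frames(frames))  # → [5]
```

On real video one would read frames with a decoder and possibly smooth the difference signal, but the thresholding logic stays the same.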

Nisha P. Shetty, Tushar Garg
A Threat Towards the Neonatal Mortality

This paper reviews and analyzes observational and case-control studies to identify threats during the neonatal stage, the critical phase of adaptation to extrauterine life, so that significant risk factors can be deduced with the aim of reducing neonatal mortality. Vulnerabilities related to both mother and neonate threaten the survival of neonates, with economic, social, psychological, and physical impacts. Predictive analytics is performed using machine learning techniques; supervised learning is adopted for the analysis of threats to the neonatal stage. The process splits into two stages: unknown dependencies are estimated from a given data set to train the model, and the system's outputs are then predicted or tested using the estimated dependencies. In the future, the model can be extended to predict threats and diseases.

Kumari Deepika, Santosh Chowhan
Digital Marketing Effectiveness Using Incrementality

Digital marketing is one of the fastest-growing advertising channels and crossed the $330 billion mark in 2019. With exponentially increasing budgets, measuring the impact of marketing investments and driving effectiveness becomes essential for brands. The complexity of the digital ad-tech ecosystem is constantly evolving, with brands running marketing activities across multiple channels, new targeting capabilities, and different formats. Due to this intricacy, traditional digital measurement metrics like cost per click, return on investment, and cost per conversion just scratch the surface, and measuring the actual impact of marketing strategies remains unsettled. We bridge this gap in marketing measurement by using incremental lift as a metric to measure the impact of a marketing strategy. Incrementality testing is a mathematical approach to differentiating between correlation and causation. We formulated the Viewability Lift method by applying the concepts of A/B testing in the digital marketing ecosystem. In this method, we measure the effectiveness of an ad by comparing users who are exposed to an ad against users who are not. Our methodology covers test environment setup, randomization, bias handling, hypothesis testing, primary output, and different ways of using this output. We used this output for digital marketing strategy planning and campaign optimizations, leading to improved campaign efficiency.
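A minimal sketch of the exposed-vs-unexposed comparison the abstract describes, using a standard two-proportion z-test; the conversion counts below are hypothetical, not from the paper:

```python
import math

def incremental_lift(conv_test, n_test, conv_ctrl, n_ctrl):
    """Two-proportion z-test comparing exposed (test) vs unexposed (control)
    users; returns (relative lift, z statistic)."""
    p_t, p_c = conv_test / n_test, conv_ctrl / n_ctrl
    p_pool = (conv_test + conv_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
    lift = (p_t - p_c) / p_c
    return lift, (p_t - p_c) / se

# hypothetical campaign: 3% conversion among exposed users vs 2% among controls
lift, z = incremental_lift(300, 10000, 200, 10000)
print(round(lift, 2), round(z, 2))  # 50% relative lift; z well above 1.96
```

A z statistic beyond ±1.96 rejects the null hypothesis of no causal effect at the 5% level, which is the distinction between correlation and causation the abstract refers to.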

Shubham Gupta, Sneha Chokshi
Explainable Artificial Intelligence for Falls Prediction

With a rapidly ageing population, it is increasingly likely that we will encounter an older adult falling. Falls can cause death, serious injury or harm, loss of confidence, and loss of independence. Falling can happen to any of us; however, adults over 65 years of age are more vulnerable and at increased risk of falling. This paper focuses on applying explainable artificial intelligence techniques, in the form of decision trees, to healthcare data in order to predict the risk of falling in older adults. These decision trees could potentially be introduced for health and social care professionals to help aid their judgements when making decisions.
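A minimal sketch of the idea, assuming scikit-learn; the toy features and labels below are invented for illustration, since the paper's actual healthcare variables are not listed in the abstract:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical features: [age, previous_falls, medications]; label 1 = high fall risk
X = [[72, 2, 5], [68, 0, 1], [80, 3, 7], [66, 0, 2], [85, 1, 6], [70, 0, 0]]
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# the learned rules can be printed in plain text for clinicians to inspect,
# which is what makes the model "explainable"
rules = export_text(tree, feature_names=["age", "previous_falls", "medications"])
print(rules)
print(tree.predict([[78, 2, 4]]))  # risk prediction for a new patient
```

Unlike a black-box model, the printed rule list lets a practitioner verify that each decision path matches clinical judgement.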

Leeanne Lindsay, Sonya Coleman, Dermot Kerr, Brian Taylor, Anne Moorhead
Enhanced UML Use Case Meta-model Semantics from Cognitive and Utility Perspectives

The Unified Modeling Language (UML), formalized by the Object Management Group (OMG) to express analysis and design models, is a general-purpose graphical language for the visualization and documentation of software system artifacts. UML diagrams are interdependent, and hence a change in one diagram at one level introduces changes in all related diagrams. UML divides the system model into functional-requirement capture views modeled by use case diagrams, static structural views modeled by class diagrams, and dynamic behavior views modeled by interaction and state-machine diagrams. As a domain consists of concepts, higher-order views can be formed from the recognized concepts so that the structuring is visible in the initial development efforts. The models are required to be platform-independent so that they can be mapped to any available platform using migrations. From the model semantics, a metalanguage representing the model language can be created so that model transformations can be applied vertically and horizontally. In this regard, an attempt to narrate enhanced semantics for use cases and their relationships has been made.

Mahesh R. Dube
The Impact of Mobile Augmented Reality Design Implementation on User Engagement

With the rapid advances in mobile technologies, one of the main goals of colleges and universities is to reach national and international recognition. Recently many universities have adopted new strategies to stay competitive, including new technologies such as Augmented Reality (AR) to capture students' attention. AR has been used as a useful tool in education, but beyond teaching and learning, the technology may also play an important role in university and student engagement. This research aims to study the effect of using AR on human interaction and engagement. The study focuses on designing an AR prototype framework and a pilot AR application to test its impact on users' interaction and engagement with the university information platform. The researchers assume that applying AR technology will improve two-way communication between universities and their students, employees, and visitors. The findings show that AR UI and UX work together to achieve student engagement. Universities may consider the efficacy of AR implementations within their business strategy to enrich their recognition amid today's intense competition.

Mervat Medhat Youssef, Sheren Ali Mousa, Mohamed Osman Baloola, Basma Mortada Fouda
Intelligent Mobile Edge Computing: A Deep Learning Based Approach

In recent times, researchers across the globe have shown keen interest in advancements in the domain of edge computing. Mobile Edge Computing (MEC) is a new-age computing paradigm wherein cloud services are made accessible at network edges via mobile base stations. It is a promising technology that helps overcome the limitations of mobile cloud computing. MEC facilitates the seamless integration of various application services, providing cloud resources at the edge of the network, in the vicinity of the end user. It can effortlessly be integrated with the upcoming 5G architecture, hence supporting the execution of resource-rich applications that require low network latency. To enhance the level of intelligence at mobile base stations, deep learning algorithms can be implemented over network edges for optimized communication and workload balancing. The paper discusses a conceptual architecture for creating a mobile edge computing environment involving the applicability of deep learning algorithms. It covers the fundamentals of MEC along with specific applications of reinforcement and continual learning in an edge environment, and lists the benefits of MEC together with a discussion of how its amalgamation with deep learning models can prove beneficial in a computation offloading scenario.

Abhirup Khanna, Anushree Sah, Tanupriya Choudhury
Analysis of Clustering Algorithms in Machine Learning for Healthcare Data

Clustering is one of the most popular data analysis techniques in machine learning for precisely evaluating the vast amount of healthcare data from body sensor networks, Internet of Things devices, hospitals, clinical and medical data repositories, electronic health records, etc. Clustering algorithms play a crucial role in predicting diseases by partitioning similar patients' data based on their relevant attributes. A vast number of clustering algorithms have been developed for analyzing healthcare data sets. However, an algorithm presented in the literature may achieve good results with one type of data set but fail or provide poor results with data sets of other types. Many research studies consider specific or multiple data sets for clustering analysis, but only a few use mixed-type data for analyzing and verifying the optimal number of clusters. To alleviate these issues, this paper inspects various clustering algorithms from theoretical and experimental perspectives. The experimental results elucidate the best algorithm in each category using a physiological data set. The efficiency of each clustering algorithm is validated using a number of internal as well as stability measures. Finally, this paper highlights future directions toward a proper clustering algorithm for handling high-dimensional healthcare data sets.
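The internal-validation idea can be sketched with k-means and the silhouette coefficient on synthetic two-group "physiological" data, assuming scikit-learn; the features (heart rate, SpO2) and group parameters are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# two synthetic patient groups, e.g. (heart rate, SpO2)
group_a = rng.normal([70, 97], 1.0, size=(50, 2))
group_b = rng.normal([110, 90], 1.0, size=(50, 2))
X = np.vstack([group_a, group_b])

# internal validation: choose the k with the best silhouette coefficient
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
          for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))  # the two planted groups are recovered
```

The same loop generalizes to the other internal measures the paper mentions (e.g. Davies-Bouldin or Calinski-Harabasz) by swapping the scoring function.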

M. Ambigavathi, D. Sridharan
Securing Mobile Agents Migration Using Tree Parity Machine with New Tiny Encryption Algorithm

Mobile agents are software programs that work autonomously in homogeneous and heterogeneous environments, moving from one host to another to share information among users. Because mobile agents migrate over insecure networks, their security is a major concern during communication and the sharing of data and information. Mobile agent migration faces major security issues: data integrity, data confidentiality and authentication, non-repudiation, denial of service, and access control. In this paper, neural-network-based synchronization key exchange is proposed for encryption and decryption.
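The paper proposes a *new* variant of the Tiny Encryption Algorithm whose details the abstract does not give; for reference, here is a sketch of the standard TEA round function, with the Tree-Parity-Machine-derived shared key shown only as a hard-coded placeholder:

```python
def tea_encrypt(v, key, rounds=32):
    """Standard TEA: encrypt a 64-bit block given as two 32-bit words."""
    v0, v1 = v
    delta, total, mask = 0x9E3779B9, 0, 0xFFFFFFFF
    for _ in range(rounds):
        total = (total + delta) & mask
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & mask
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & mask
    return v0, v1

def tea_decrypt(v, key, rounds=32):
    """Inverse of tea_encrypt: run the rounds backwards."""
    v0, v1 = v
    delta, mask = 0x9E3779B9, 0xFFFFFFFF
    total = (delta * rounds) & mask
    for _ in range(rounds):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & mask
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & mask
        total = (total - delta) & mask
    return v0, v1

# placeholder 128-bit key; in the paper's scheme it would come from TPM synchronization
key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ct = tea_encrypt((0xDEADBEEF, 0xCAFEBABE), key)
print(tea_decrypt(ct, key) == (0xDEADBEEF, 0xCAFEBABE))
```

In the proposed scheme, two Tree Parity Machines synchronize their weights over the public channel and both sides derive the 128-bit key from the synchronized weights, so no key is ever transmitted.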

Pradeep Kumar, Niraj Singhal, K. M. Chaitra
An Approach to Waste Segregation and Management Using Convolutional Neural Networks

Population explosion in India has given rise to some major concerns, one of which is the waste generation and disposal system. India is accountable for producing 12% of the global municipal waste. As a result, waste is collectively dumped irrespective of its type, leading to drainage blockages, pollution, and diseases. In this paper, we propose a model for waste segregation that uses neural networks to classify waste images into three categories: recyclable, non-recyclable, and organic. A training accuracy of 83.77% and a testing accuracy of 81.25% were obtained. Along with the proposed neural network, five standard CNN architectures – VGG-16, DenseNet, InceptionNet, MobileNet, and ResNet – were also tested on the given dataset. The highest test accuracy of 92.65% was obtained with the MobileNet classifier. The paper also proposes an on-site waste management system using an 8051 microcontroller and GSM technology.

Deveshi Thanawala, Aditya Sarin, Priyanka Verma
Open Source Intelligence Initiating Efficient Investigation and Reliable Web Searching

Open Source Intelligence (OSINT) is the collection and processing of information gathered from publicly available, open-source web portals or sites. OSINT has been around for hundreds of years, under one name or another. With the emergence of instantaneous communication and rapid knowledge transfer, a great deal of actionable and analytical data can now be collected from unclassified, public sources. Using OSINT as the base concept, we have attempted to provide solutions for two different use cases. The first is an investigation platform that avoids manual information gathering, saving investigators time and resources by providing only the relevant data in an understandable template format rather than a graphical structure, while demanding minimal input data. The second is a business intelligence solution that allows users to find details about an individual or themselves for business growth, brand establishment, and client tracking, further elaborated in the paper.

Shiva Tiwari, Ravi Verma, Janvi Jaiswal, Bipin Kumar Rai
A Neural Network Based Hybrid Model for Depression Detection in Twitter

Depression is a serious mental illness that leads to social disengagement and affects an individual's professional and personal life. Several studies and research programs have been conducted to understand the main causes of depression and to detect indications of psychological problems in speech and text data generated by human beings. Language is considered to be directly related to an individual's current mental state, which is why researchers use social media networks to detect depression and to help implement intervention programs. We propose a hybrid model combining CNN and LSTM models for detecting depressed individuals from normal conversation-based text data retrieved from Twitter. We apply machine learning classifiers and the proposed method to the Twitter dataset to compare their performance for depression detection. The proposed model provides an accuracy of 92%, compared with a maximum accuracy of 83% for the machine learning techniques.

Bhanu Verma, Sonam Gupta, Lipika Goel
Unleashing the VEP Triplet Count of Virtually Created 3D Bangla Alphabet to Integrate with Augmented Reality Application

This paper demonstrates the process of calculating the VEP triplet count of the Bangla alphabet, created not as traditional handwritten or printed letters but entirely in a 3D virtual environment for effective use in augmented-reality-based applications. This count can be a potential support for 3D modelers, AR application developers, and academic researchers. The challenges we faced and the limitations of our contribution are also mentioned. We expect that future academic researchers and commercial application developers will find strong support for developing a 3D Bangla alphabet in our research.

Apurba Ghosh, Anindya Ghosh, Arif Ahmed, Md Salah Uddin, Mizanur Rahman, Md Samaun Hasan, Jia Uddin
A Hybrid Machine Learning Framework for Prediction of Software Effort at the Initial Phase of Software Development

In the era of software applications, predicting software effort plays an essential role in the success of a software project. Inconsistent, inaccurate, and unreliable predictions lead to failure. As requirements and specifications change according to the software's needs, accurately predicting effort is a difficult task in software development, yet it must be done accurately to avoid unexpected results. Inaccuracy, unreliability, and uncertainty at the early stages of development are the drawbacks of previously developed models. The main aim of this study is to overcome these drawbacks and develop a model for predicting software effort. A combination of regression analysis and a genetic algorithm has been used to develop the model, which is trained and validated using the ISBSG dataset. The proposed model is compared for performance with several baseline models; the results show that it outperforms most of the baseline models on different performance metrics.

Prerana Rai, Shishir Kumar, Dinesh Kumar Verma
Chronic Disease Prediction Using Deep Learning

Nowadays data is growing rapidly in bioscience and health care, and an exact investigation of clinical information can benefit early disease identification, patient care, and community services. Prediction is a significant aspect of the health care domain. In this paper, we establish machine learning and deep learning algorithms for predicting patients' chronic diseases, experimenting with a refitted prediction model on standard available datasets. The objective of this paper is to forecast chronic diseases in individual patients using machine learning methods, k-nearest neighbors and decision trees, and deep learning using a deep sequential network with ReLU (rectified linear) and sigmoid activation functions and Adam as the optimizer. Compared with several ordinary algorithms, the accuracy of the proposed system is enhanced: the deep learning algorithm gives the best accuracy, about 98.3%. These techniques are applied to predict the chronic diseases heart disease, breast cancer, and diabetes.

Jyoti Mishra, Sandhya Tarar
A Deep Learning Based Method to Discriminate Between Photorealistic Computer Generated Images and Photographic Images

The rapid development of multimedia tools has changed the digital world drastically. Consequently, several new technologies like virtual reality, 3D gaming, and VFX (visual effects) have emerged from the concept of computer graphics. These technologies have created a revolution in the entertainment world. However, photorealistic computer generated images can also play damaging roles in several ways. This paper proposes a deep learning based technique to differentiate computer generated images from photographic images. The idea of transfer learning is applied, in which the weights of the pre-trained deep convolutional neural network DenseNet-201 are transferred to train an SVM to classify computer generated images and photographic images. Experimental results on the DSTok dataset show that the proposed technique outperforms other existing techniques.
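A sketch of the final stage of such a transfer-learning pipeline, assuming scikit-learn; random vectors stand in for the DenseNet-201 penultimate-layer features (1920-D in the real network, 64-D here), which would require a deep learning framework to extract:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# stand-in features: CG and photographic images drawn from slightly
# shifted distributions, mimicking separable deep features
cg = rng.normal(0.0, 1.0, size=(200, 64))
photo = rng.normal(0.6, 1.0, size=(200, 64))
X = np.vstack([cg, photo])
y = np.array([0] * 200 + [1] * 200)  # 0 = computer generated, 1 = photographic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
svm = LinearSVC(C=1.0).fit(X_tr, y_tr)
print(round(svm.score(X_te, y_te), 2))
```

In the real pipeline, `cg` and `photo` would be replaced by features from a frozen DenseNet-201 applied to the DSTok images; only the SVM is trained.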

Kunj Bihari Meena, Vipin Tyagi
Load Balancing Algorithm in Cloud Computing Using Mutation Based PSO Algorithm

Cloud computing is a prominent technology that uses dynamic allocation techniques to assign tasks to virtual machines (VMs). Users are charged by the cloud service provider according to usage. A cloud service provider (CSP) faces various challenges, of which load balancing is one of the most significant. Many load balancing algorithms have been proposed, each focusing on different parameters. However, the approaches in the literature experience various issues, such as poor convergence speed, premature convergence, and dependence on the initial randomly chosen solution; none has proven completely sufficient. To solve the problems associated with existing meta-heuristic techniques, this paper discusses a load balancing approach based on mutation-based Particle Swarm Optimization. The load on the data centers is balanced by the proposed algorithm, and parameters such as makespan are minimized while improving the overall fitness function of the algorithm.
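One way to sketch a mutation-based PSO for load balancing: continuous particle positions are mapped to VM assignments and the makespan (maximum VM load) is minimized; the task lengths, swarm parameters, and mutation rate below are illustrative assumptions, not the paper's configuration:

```python
import random

random.seed(1)
tasks = [5, 3, 8, 2, 7, 4, 6, 1, 9, 2]   # task lengths to place on VMs
VMS = 3

def makespan(pos):
    """Fitness: max VM load after mapping each dimension to a VM index."""
    loads = [0.0] * VMS
    for t, p in zip(tasks, pos):
        loads[min(int(p), VMS - 1)] += t
    return max(loads)

n, dim, w, c1, c2, pm = 20, len(tasks), 0.7, 1.5, 1.5, 0.1
swarm = [[random.uniform(0, VMS) for _ in range(dim)] for _ in range(n)]
vel = [[0.0] * dim for _ in range(n)]
pbest = [p[:] for p in swarm]
gbest = min(pbest, key=makespan)

for _ in range(100):
    for i, p in enumerate(swarm):
        for d in range(dim):
            vel[i][d] = (w * vel[i][d]
                         + c1 * random.random() * (pbest[i][d] - p[d])
                         + c2 * random.random() * (gbest[d] - p[d]))
            p[d] = min(max(p[d] + vel[i][d], 0), VMS - 1e-9)
            if random.random() < pm:       # mutation step: escape premature convergence
                p[d] = random.uniform(0, VMS)
        if makespan(p) < makespan(pbest[i]):
            pbest[i] = p[:]
    gbest = min(pbest, key=makespan)

print(makespan(gbest))  # optimal makespan for this instance is 16 (total 47 over 3 VMs)
```

The mutation step is what distinguishes this from plain PSO: randomly resetting a dimension keeps diversity in the swarm after it has clustered around `gbest`.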

Saurabh Singhal, Ashish Sharma
Statistical Model for Qualitative Grading of Milled Rice

Rice is a principal dietary component for most of the world's population and comprises many breeds and qualities. Qualitative grading of rice is essential to determine its cost. This grading is currently conducted via invasive physical and chemical processes; automated, non-invasive grading is pivotal in reducing their drawbacks. This research constructs a method for automated, non-invasive, qualitative grading of milled rice. The percentage of broken rice is one of the factors governing the grade, and the method developed predicts the percentage of broken rice from an image of a given rice sample. Color images of rice were acquired using a cellphone camera, processed by a foreground detector program, and statistically analyzed to extract features for a regression model. The entire method is executed in MATLAB. Because it involves simple regression models, it requires less runtime (4.0567 s) than existing methods of calculating the percentage of broken rice. The process produces low root mean square error (0.69 and 0.977) and high R-squared (0.999 and 0.999) values for the overlapping and non-overlapping grains' datasets respectively.
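The regression step can be sketched with ordinary least squares on synthetic features; the features below (mean grain area, spread of grain lengths) and the ground-truth relation are invented stand-ins, since the abstract does not specify the authors' extracted features:

```python
import numpy as np

rng = np.random.default_rng(7)
# hypothetical per-sample statistics from a foreground detector
mean_area = rng.uniform(200, 400, 60)    # mean grain area (px)
len_std = rng.uniform(5, 30, 60)         # std of grain lengths
# synthetic ground truth: broken percentage rises with length spread
broken_pct = 0.9 * len_std - 0.01 * mean_area + rng.normal(0, 0.5, 60)

# ordinary least squares with an intercept column
A = np.column_stack([len_std, mean_area, np.ones(60)])
coef, *_ = np.linalg.lstsq(A, broken_pct, rcond=None)
pred = A @ coef

rmse = float(np.sqrt(np.mean((pred - broken_pct) ** 2)))
r2 = 1 - np.sum((pred - broken_pct) ** 2) / np.sum((broken_pct - broken_pct.mean()) ** 2)
print(round(rmse, 2), round(r2, 3))
```

RMSE and R-squared computed this way correspond to the quality metrics the abstract reports for the two datasets.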

Medha Wyawahare, Pooja Kulkarni, Abha Dixit, Pradyumna Marathe
Measuring the Effectiveness of Software Code Review Comments

Code review has become a popular technique for finding defects early in source code. Nowadays practitioners have their code peer-reviewed by co-developers to keep the source code clean, and in a distributed or dispersed team, code review of patches is mandatory before merging. Code review can also be a form of validating functional and non-functional requirements. Sometimes reviewers do not write structured comments, which becomes a bottleneck for developers trying to address the findings or suggestions. To make code review participation more effective, structured and efficient review comments are essential. Mining the repositories of five commercial projects, we extracted 15,223 review comments and labelled them. We used 8 different machine learning and deep learning classifiers to train our model; among them, Stochastic Gradient Descent (SGD) achieves the highest accuracy of 80.32%. This study will help practitioners build a structured and effective code review culture among global software developers.

Syeda Sumbul Hossain, Yeasir Arafat, Md. Ekram Hossain, Md. Shohel Arman, Anik Islam
Proposed Model for Feature Extraction for Vehicle Detection

A feature is a prominent interest point in an image that can be used for various image-processing tasks as well as computer-vision-based processes for object recognition. Features can be extracted by mathematical models that detect strong variations in texture, edges, or color. The selected features must have a global definition within the defined problem of vehicle detection. The focus of this paper is on vehicle detection and the extraction of regions of interest whose features are represented globally; the module may produce a model intended for encoding dependencies between image features. We offer a robust, capable method for creating an image feature vector for a vehicle detection system, combining feature extraction with a global feature representation that handles both inter-class similarity and intra-class variation, thus overcoming problems of multiplicity and ambiguity.

Padma Mishra, Anup Girdhar
Analysis of Feature Selection Methods for P2P Botnet Detection

Botnets are one of the major threats today, mainly because of their capability to hide in the network; it is not easy to detect the botmaster who controls the botnets from afar. Different technologies and algorithms are used for detecting a botnet in a network, and some of the prominent techniques are based on machine learning, which has proven to be among the leading approaches for botnet detection. To apply machine learning algorithms, the most important task is to analyze the dataset thoroughly before using it, and feature selection techniques help with this: they can identify the best criteria for detecting a botnet. A dataset contains features of different numbers and types, and these features do not contribute equally to botnet detection, so we need to find the important features that are most useful for building a detection model. Many algorithms have been used for botnet detection in the past, but most used different feature selection methods on different datasets. In this paper, we cover different feature selection methods and analyze them on different botnets; finally, we compare these techniques and recommend the best one for each particular botnet.
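Comparing feature selection methods can be sketched with scikit-learn on a synthetic flow dataset; the feature names and distributions below are illustrative assumptions, not the datasets used in the paper:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

rng = np.random.default_rng(3)
n = 400
# synthetic P2P-flow features; only the first two actually depend on the label
y = rng.integers(0, 2, n)                 # 1 = bot traffic, 0 = benign
pkt_rate  = y * 40 + rng.poisson(20, n)   # informative
dst_ports = y * 15 + rng.poisson(10, n)   # informative
pkt_size  = rng.poisson(500, n)           # noise
ttl       = rng.poisson(60, n)            # noise
X = np.column_stack([pkt_rate, dst_ports, pkt_size, ttl])

# compare two selection criteria: chi-squared vs mutual information
for name, score in (("chi2", chi2), ("mutual_info", mutual_info_classif)):
    sel = SelectKBest(score, k=2).fit(X, y)
    print(name, sorted(np.flatnonzero(sel.get_support()).tolist()))
```

Both criteria should recover the two planted informative features here; on real botnet traces the criteria can disagree, which is exactly the comparison the paper carries out.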

Chirag Joshi, Vishal Bharti, Ranjeet Kumar Ranjan
ELM-MVD: An Extreme Learning Machine Trained Model for Malware Variants Detection

Malware variants are expanding at a fast pace, and detecting them is a critical problem. According to surveys from McAfee, over 50% of newly recognized malware are variants of earlier ones. The huge number of miscellaneous malware variants has compelled researchers to find a better model for detecting them. In this work, we propose an extreme learning machine trained model (ELM-MVD) for malware variants detection. We use a dataset comprising benign and malware executable names along with their features, represented as triplets of system calls, and demonstrate that features in the form of triplet vectors are optimal for training a model. Feature reduction is done using the alternating direction method of multipliers (ADMM) technique. Finally, the ELM-MVD model is trained and achieves 99.3% accuracy and a 0.003 s detection speed.

Pushkar Kishore, Swadhin Kumar Barisal, Alle Giridhar Reddy, Durga Prasad Mohapatra
Real-Time Biometric System for Security and Surveillance Using Face Recognition

This paper is based on real-time security and surveillance, since public security problems have raised widespread concern. Different types of biometrics can be used for security and surveillance, and facial recognition in particular has important applications in biometrics and in numerous systems related to security and surveillance. Many methods have been introduced in this field, but some problems remain unsolved because existing systems are used as biometrics only at small scales. In this paper, we focus on the live security of public and private places, with multiple uses. For this real-time security, IP CCTV cameras are used for surveillance; the system needs a sample image of each person for model training, along with basic details of the people living in the city. Whenever surveillance is needed for any suspicious person or place inside the city, the person's present location can be found, and all incidents are monitored by the IP CCTV cameras. All incidents are updated in a centralized system, so that the live location of a particular person can be found anytime and anywhere within the city. This security system has many real-life applications, such as crime control, terrorist alerts, smart societies, airport security, university campuses, and the surveillance of public places. In the future, the required data could be collected from "The Unique Identification Authority of India" (UIDAI).

Arvind Jaiswal, Sandhya Tarar
An Effective Block-Chain Based Authentication Technique for Cloud Based IoT

Nowadays, cloud computing services and applications provide plenty of storage space to meet demand and are of great utility to human life. However, these cloud services face serious security challenges from various attacks. Cloud computing services rely primarily on confidential data generated by devices connected to cloud accounts containing personal information. These devices typically secure communication with an identity-based encryption scheme. The Identity-based Encryption Technique (IDET) is an essential public-key cryptosystem that uses a user's identity information, such as an email or IP address. IDET uses signature authentication instead of the digital keys used in public-key encryption or a public key infrastructure (PKI); the PKI uses a password for authentication. In contrast, the proposed model, an ID-based Block-chain Authority (IDBA), does not need to maintain a hash for authentication. It thereby addresses key management issues and avoids the functional problem of requiring a secure channel at the key exchange point.

S. Dilli Babu, Rajendra Pamula
Early Detection of Autism Spectrum Disorder in Children Using Supervised Machine Learning

Autism Spectrum Disorder (ASD) is a disorder which arises during the developmental stages of an individual and affects language learning, speech, cognitive, and social skills, impacting around 1% of the population globally [14]. Even though they are diagnosed with ASD, some individuals can display outstanding scholastic, non-academic, and artistic capabilities, which proves challenging for the scientists trying to explain this. At present, standardized tests are the only methods used clinically to diagnose ASD. This not only requires prolonged diagnostic time but also incurs steep medical costs. In recent years, scientists have tried to investigate ASD using advanced technologies like machine learning to improve the precision and time required for diagnosis, as well as the quality of the whole process. Models such as Support Vector Machines (SVM), Random Forest Classifier (RFC), Naïve Bayes (NB), Logistic Regression (LR) and KNN have been applied to our dataset, and predictive models have been constructed based on the outcome. Our objective is thus to determine whether a child is susceptible to neurological disorders such as ASD in their nascent stages, which would help streamline the diagnosis process.

Kaushik Vakadkar, Diya Purkayastha, Deepa Krishnan
Anatomical Analysis Between Two Languages Alphabets: Visually Typographic Test Transformation in Morphological Approaches

There are many different languages and alphabets in the world, and the alphabet of each language has its own unique design. We try to bridge two countries/languages by reflecting the distinctive features of one alphabet in the other, using two of the most widely used languages, English and Arabic. Here, a transformation process that carries the typographic flavor of one language into the other is introduced by replication: characteristics shared by the alphabets of the two languages are identified and reflected in the other language. In this study, we have completed the task of transforming the character flavor of English into Arabic, as well as of Arabic into English. The process consists of finding characteristic patterns and connecting them to the alphabet of the other language to achieve new typographic flavors.

Mizanur Rahman, Md. Salah Uddin, Md. Samaun Hasan, Apurba Ghosh, Sadia Afrin Boby, Arif Ahmed, Shah Muhammad Sadiur Rahman, Shaikh Muhammad Allayear
Auto Segmentation of Lung in Non-small Cell Lung Cancer Using Deep Convolution Neural Network

Segmentation of the lung is the vital first step in the radiologic diagnosis of lung cancer. In this work, we present a deep learning based automated technique that overcomes various shortcomings of traditional lung segmentation and explores the role of adding “explainability” to deep learning models so that trust can be built in these models. Our approach shows better generalization across different scanner settings, vendors, and slice thicknesses. In addition, no seed point initialization is required, making the approach completely automated without manual intervention. A Dice score of 0.98 is achieved for lung segmentation on an independent data set of non-small cell lung cancer.

Ravindra Patil, Leonard Wee, Andre Dekker
Multiwavelet Based Unmanned Aerial Vehicle Thermal Image Fusion for Surveillance and Target Location

A novel image fusion method in the multiwavelet domain is proposed in this paper. The special frequency bands and properties of images in the multiwavelet domain are employed in the image fusion algorithm. Due to the widespread use of digital media applications, multimedia security and fusion have grown incredibly important. In this research work, a low-resolution multispectral thermal image and a high-resolution RGB image are fused by the proposed method in order to find an armed person hidden in deep forest among the surrounding trees. The pictures are acquired from a fixed-wing unmanned aerial vehicle (UAV) at 90 to 100 m distance in low-light surroundings. The texture resolution of the high-resolution RGB image is combined with the thermal image taken by a Dual Sensor Night Vision Goggle (DSNVG) to retrieve a well-fused RGB-thermal (IR) image of the armed person. In this research work, the DSNVG is configured to offer fusion of thermal imagery, providing the benefit of greater positional awareness through improved threat detection under almost all battlefield conditions, compatibility with established weapon system ranges, extended performance from high-light circumstances to total darkness and through battlefield obscurants, and increased capability for municipal work. Here, the multiwavelet transform is compared with the wavelet packet transform for aerial vehicle image fusion, and this work concludes that the multiwavelet transform performs better than the wavelet packet transform.

B. Bharathidasan, G. Thirugnanam
Investigating Movement Detection in Unedited Camera Footage

Digital evidence from CCTVs is an aid in crime scene investigations, and there is a demand for more automation. This paper describes a system that detects motion-induced events within a video clip based on user-defined criteria, such as filtering by colour and size of the moving object, and then extracts features and regions where events have been detected. Post-processing includes finding association rules between objects that appear simultaneously in a clip, based on their colour. All processing techniques follow best practices. The available Wallflower dataset is used for evaluation, and confusion matrices are computed by comparing the results achieved by this system against the ground truth values for each image sequence. Ranges of effective pre-processing parameter values were set for erosion, dilation, and the background subtractor threshold, and the system was tested across a wide array of parameter values. For each combination, measures are extracted and the F1 score is calculated. The lowest and highest F1 scores obtained across all image sequences were 67% and 95% respectively. It is noted that the image quality of clips and the background affect the F1 scores.
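
The erosion and dilation pre-processing steps and the F1 computation mentioned above can be sketched in a few lines. The following is a generic illustration of 3x3 binary morphology and of computing F1 from per-pixel confusion counts, not the paper's implementation (which would typically rely on an image processing library); the toy foreground mask is invented:

```python
def dilate(img):
    """3x3 binary dilation: a pixel becomes 1 if any 8-neighbour (or itself) is 1."""
    h, w = len(img), len(img[0])
    return [[int(any(img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def erode(img):
    """3x3 binary erosion: a pixel stays 1 only if its whole neighbourhood is 1."""
    h, w = len(img), len(img[0])
    return [[int(all(img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def f1_score(pred, truth):
    """F1 from per-pixel confusion counts of two binary masks."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            tp += p and t
            fp += p and not t
            fn += (not p) and t
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# A noisy foreground mask: erosion removes the isolated speck, and the
# following dilation restores the eroded blob (a morphological "opening").
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1]]
opened = dilate(erode(mask))
```

Sweeping the structuring-element size and the background subtractor threshold, as the paper describes, then amounts to recomputing F1 for each parameter combination.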

Samuel Sciberras, Joseph G. Vella
Time Series Forecasting Using Machine Learning

Forecasting is an essential part of any business: as extensive amounts of data become available, one needs to combine statistical models with machine learning to improve accuracy, throughput, and overall performance. In this paper, a time series forecasting approach is used with machine learning techniques to forecast store item demand. A SARIMA(0,1,1)×(0,1,0,12) model is used, with the seasonal parameters (0,1,0,12) capturing the seasonal components of the series, combined with ARIMA(0,1,1) for the trend components. We trained our model on the past four years of store item values and predicted sales for the next year.
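
The differencing implied by a SARIMA(0,1,1)×(0,1,0,12) specification can be illustrated directly: the series is seasonally differenced at lag 12 and then first-differenced, after which only the MA(1) term remains to be fitted (in practice with a statistics library). A minimal sketch of the two differencing operators, using an invented monthly toy series rather than the paper's store data:

```python
def difference(series, lag=1):
    """y'_t = y_t - y_(t-lag): lag=1 removes trend, lag=12 yearly seasonality."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def invert_difference(diffed, history, lag=1):
    """Undo differencing, anchoring on the last `lag` known values of `history`."""
    out = list(history[-lag:])
    for d in diffed:
        out.append(out[-lag] + d)
    return out[lag:]

# Invented monthly demand: linear trend plus a yearly seasonal spike.
series = [100 + 2 * t + (10 if t % 12 == 0 else 0) for t in range(36)]

seasonal = difference(series, lag=12)     # the (0,1,0)_12 seasonal differencing
stationary = difference(seasonal, lag=1)  # the non-seasonal d=1 differencing
# `stationary` is the doubly differenced series the MA(1) term is fitted on.
```

Forecasts produced on the differenced scale are mapped back to the original scale by inverting the two differencing steps in reverse order.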

Ruchi Verma, Joshita Sharma, Shagun Jindal
Improving Packet Queues Using Selective Epidemic Routing Protocol in Opportunistic Networks (SERPO)

Opportunistic networks are an extension of ad hoc networks and a subclass of MANETs, in which the network possesses intermittent connectivity and a store-carry-forward mechanism. During routing, when a packet transmission is initiated by a node, Epidemic routing forwards the packet to all neighboring nodes by producing multiple copies, which exhausts network resources early. The Epidemic routing mechanism creates multiple copies of a message packet in order to inflate the packet delivery ratio, but this also demands additional buffer capacity in the network. In the proposed SERPO approach, packet transmission happens only on a selected route, found with the help of Dijkstra's algorithm, while packet congestion is minimized using Weighted Fair Queuing. In this paper we propose an outline to improve the delivery ratio, transmission delay, and access delay of packets in a network using SERPO. We extensively simulate the proposed scheme in the ONE simulator and compare it with Epidemic variants using the network performance metrics of message transmission (packet delivery), message relaying time (transfer time), and access delay ratio.
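
The route selection step described above relies on Dijkstra's algorithm. A generic, self-contained sketch follows; the contact graph and its link costs are invented for illustration, and this is not the SERPO implementation itself:

```python
import heapq

def dijkstra(graph, src, dst):
    """Shortest path by cumulative link cost; graph: node -> {neighbour: cost}."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path)), dist[dst]

# Invented contact graph: link costs could encode expected contact delay.
contacts = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5},
            "C": {"D": 1}, "D": {"E": 3}}
route, cost = dijkstra(contacts, "A", "E")
```

In a selective scheme of this kind, packets would be relayed only along `route` rather than flooded to every encountered node.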

Tanvi Gautam, Amita Dev
Heart Disease Prediction System Using Classification Algorithms

Heart disease and stroke accounted for 28.1% of total deaths in India in 2016, as compared to 15.2% in 1990. With the rising use of learning algorithms, in this paper we have developed a system that can predict heart disease by using a modified random forest algorithm. The proposed algorithm is trained on a dataset consisting of 303 instances, predicts the occurrence of heart disease with an accuracy of 86.84%, and can be implemented in the medical field to improve the overall diagnosis of heart disease.

Sarthak Vinayaka, P. K. Gupta

Data Sciences

Frontmatter
Graph Database and Relational Database Performance Comparison on a Transportation Network

Facing the problem of structuring irregular data in the big data era, graph databases are a powerful solution that handles link relationships without costly operations and enjoys great flexibility as the data model changes. Though it is well known that graph databases outperform relational databases in certain areas, little effort has been put into investigating the details of these advantages. In this paper, we report a systematic performance study of graph databases and relational databases on a transportation network. We design a database benchmark considering traversal and searching performance to evaluate system performance under different data organizations, initial states, and running modes. Our results show that graph databases outperform relational database systems in tests of three main graph algorithms. Furthermore, we discuss reasonable practices for applications based on graph databases, drawing on our experimental results.

Jinhua Chen, Qingyu Song, Can Zhao, Zhiheng Li
Optimizing Creative Allocations in Digital Marketing

Establishing the best strategy to optimize and test digital advertising campaigns is essential to the success of every marketing campaign. One common “test-and-learn” approach is creative optimization, through which advertisers can generate the highest possible ROI on their advertising spend. Due to the uncertainty in determining the most effective creative prior to a campaign, companies experiment with various strategies. Marketing firms try to distribute their creatives both to explore (sample more information) and to exploit (the current data). The aim is to dynamically explore which creative is best suited to a specific audience by running multiple parallel experiments, and to exploit it in the post-experimentation phase. This explore/exploit trade-off is best captured by Multi-Armed Bandits (MAB), the fundamental pillar of this discourse. MAB relies on reinforcement learning to converge on a solution with the least opportunity cost. Over time, we have tested key model parameters which can help in delivering campaign goals efficiently with improved uplift. We propose a customized MAB solution that, through dynamic creative optimization, has the potential to offer at least 50% uplift in a marketing KPI relative to traditional MAB policies.
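
The explore/exploit trade-off behind MAB can be illustrated with the simplest bandit policy, epsilon-greedy. This is only a toy sketch with invented click-through rates, not the customized MAB solution proposed in the paper:

```python
import random

def epsilon_greedy(ctrs, trials=10000, eps=0.1, seed=42):
    """With probability eps show a random creative (explore); otherwise show
    the creative with the best observed click rate so far (exploit).
    Unseen creatives get an optimistic rate of 1.0 so each is tried early."""
    rng = random.Random(seed)
    shows = [0] * len(ctrs)
    clicks = [0] * len(ctrs)
    for _ in range(trials):
        if rng.random() < eps:
            arm = rng.randrange(len(ctrs))
        else:
            arm = max(range(len(ctrs)),
                      key=lambda a: clicks[a] / shows[a] if shows[a] else 1.0)
        shows[arm] += 1
        clicks[arm] += rng.random() < ctrs[arm]  # simulate a Bernoulli click
    return shows, clicks

# Three creatives with invented true click-through rates; the policy should
# funnel most impressions to the last (best) creative.
shows, clicks = epsilon_greedy([0.02, 0.05, 0.20])
```

More sophisticated policies (e.g. Thompson sampling) replace the greedy argmax with posterior sampling but keep the same show/observe/update loop.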

Shubham Gupta, Anshuman Gupta, Parth Savjani, Rahul Kumar
Big Data Analytics for Customer Relationship Management: A Systematic Review and Research Agenda

In today’s dynamic business scenario, customers have the power to rule the market on their own terms and conditions. Customer Relationship Management (CRM) plays an imperative role by covering all methods and measures for gaining a better understanding of customers, and for making the most of this knowledge in applications like production and marketing. The emergence of big data brings a whole new range of CRM strategies which can support customization of sales, personalization of services, and customer interactions. The paper aims to study the extant state of big data analytics for customer relationship management through a systematic literature review. A thematic analysis of the relevant studies is done, and a framework is proposed as an outcome of the study. This framework can be used to analyze the present state of research in the area of big data analytics and CRM; directions for further research are also provided in the paper.

Sarika Sharma
Agricultural Field Analysis Using Satellite Surface Reflectance Data and Machine Learning Technique

The environmental, social, and economic problems confronting agriculture today are symptoms of agricultural industrialization. In this study, agricultural fields are analyzed using satellite surface reflectance data. This technology facilitates monitoring of crop vegetation through spectral analysis of satellite images of different sites and crops, which can track positive and negative dynamics of crop development. Using this analysis, fields can be sorted into categories rating their potency to grow crops, which gives the user detailed information about the current condition of each field. For the analysis, we have used Landsat 8 data, imported from the ground station using the Google Earth Engine. The indices used in this study are the Normalized Difference Vegetation Index (NDVI), the Modified Soil Adjusted Vegetation Index (MSAVI), and the Normalized Difference Water Index (NDWI), together with average rainfall data. For clustering the data, we have implemented the k-means clustering algorithm. We collected data spanning over 6 years and, by taking mean values, classified the agricultural fields into different categories according to their quality.
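
The NDVI computation and the k-means step can be sketched generically. The formula NDVI = (NIR - RED)/(NIR + RED) is standard; the per-field values below are invented, and a real analysis would cluster multi-index vectors (NDVI, MSAVI, NDWI, rainfall) rather than a single scalar per field:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means: seed centroids across the sorted range, then iterate."""
    vs = sorted(values)
    centroids = [vs[(len(vs) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Invented mean NDVI per field: cluster into poor / fair / good categories.
fields = [0.10, 0.15, 0.12, 0.45, 0.50, 0.48, 0.80, 0.82]
centroids, clusters = kmeans_1d(fields, k=3)
```

Each resulting cluster corresponds to one field-quality category, with the centroid serving as the category's representative index value.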

Medha Wyawahare, Pranesh Kulkarni, Aditya Kulkarni, Ankit Lad, Jayant Majji, Aayush Mehta
Sponsored Data Connectivity at the Network Edge

Sponsored data connectivity enables third parties to pay for specific content traffic used by mobile subscribers. It requires network intelligence for specific traffic detection, usage monitoring, reporting, and credit management. The paper presents an approach to defining Application Programming Interfaces (APIs) for sponsored data connectivity deployed at the edge of the mobile network. The APIs enable setting and updating a chargeable party at session setup or during the session. The API design follows the Representational State Transfer architectural style. A feasibility study is provided which illustrates the practicability of the APIs. The latency introduced by the proposed APIs is evaluated by emulation.

Ivaylo Atanasov, Evelina Pencheva, Ivaylo Asenov, Ventsislav Trifonov
Dynamic Bidding with Contextual Bid Decision Trees in Digital Advertisement

Real-time bidding (RTB) has been one of the most prominent technological advances in online display advertising. Billions of transactions in the form of programmatic advertising auctions happen on a daily basis on ad networks and exchanges, where advertisers compete for ad slots by bidding for them. The question of how much to bid has lingered and troubled many marketers for a long time. Past strategies have mostly been based on targeting users by analyzing their browsing behavior via cookies, to predict the likelihood that they will interact with the ad. But due to growing privacy concerns, with browsers phasing out cookies, and recent regulations like the General Data Protection Regulation (GDPR) in Europe, targeting users has become difficult and these bidding methodologies fail to deliver. This paper presents a novel approach that tackles the dual problem of optimal bidding and finding an alternative to user-based targeting, by focusing on contextual-level targeting using features like site domain, keywords, postcode, browser, and operating system. The targeting is done at the feature combination level in the form of Bid Decision Trees. The framework discussed in the paper dynamically learns and optimizes bid values for the context features based on their performance over a specific time interval, using a heuristic feedback mechanism to optimize the online advertising KPIs: Cost per Acquisition (CPA) and Conversion Rate (CVR). A performance comparison of this context-based tree-bidding framework reveals a 59% lower CPA and 163% higher CVR compared to other targeting strategies within the overall campaign budget, clear indicators of its lucrativeness in a world where user-based targeting is losing popularity.

Manish Pathak, Ujwala Musku
MOOC Performance Prediction by Deep Learning from Raw Clickstream Data

Student performance prediction is a challenging problem in online education. One of the key issues relating to the quality of Massive Open Online Course (MOOC) teaching is how to foretell student performance during the initial phases of a course. While the popularity of MOOCs has been rapidly increasing, there is a growing interest in scalable automated support technologies for student learning. Researchers have implemented numerous machine learning algorithms in order to find suitable solutions to this problem. The main concept has been to manually design features by cumulating daily, weekly, or monthly user log data and to use standard machine learners such as SVM, logistic regression, or MLP. Deep learning algorithms offer new opportunities, as they can be applied directly to raw input data, sparing us the most time-consuming process of feature engineering. Based on our extensive literature survey, recent deep learning publications on MOOC sequences are based on cumulated data, i.e., on finely engineered features. The main contribution of this paper is using raw log-line-level data as input, without any feature engineering, and Recurrent Neural Networks (RNN) to predict student performance at the end of a MOOC course. We used the Stanford Lagunita dataset, consisting of log-level data from 130,000 students, and compared the RNN model based on raw data to standard classifiers using hand-crafted cumulated features. The experimental results presented in this paper indicate the RNN’s dominance, given its dependably superior performance compared with the standard methods. As far as we know, this is the first work to use deep learning to predict student performance from raw log-line-level student clickstream sequences in an online course.

Gábor Kőrösi, Richard Farkas
UDHR - Unified Decentralized Health Repository

The healthcare domain is always upgrading itself in order to provide better care. The use of digital media for medical purposes has been on the rise, and naturally these methods are being used to store and retrieve medical records, widely known as electronic health records. The proposed system, Unified Decentralized Health Repository (UDHR), is inspired by Electronic Health Records (EHR). The project aims to integrate and digitize medical records. It also aims to provide wellness-oriented treatment to patients, whereas earlier the goal of the medical community was to provide traditional illness-oriented treatment. The innovativeness of the system lies within the blockchain, which is modified to cater to the needs of security and data-searching efficiency. Using natural language processing, text summarization is performed on medical records; a summarized document is generated at the end and hashed to attain compression.

Premanand P. Ghadekar, Anant Dhok, Anuj Khandelwal, Ayush Tejwani, Sonica Kulkarni, Srivallabh Mangrulkar
Mining Massive Time Series Data: With Dimensionality Reduction Techniques

A pre-processing step that reduces the volume of the data, while suffering only an acceptable loss of data quality, is needed to decrease the input size before applying data mining algorithms to time series data. Input size reduction is an important step in optimizing time series processing, e.g. in data mining computations. During the last two decades, various time series dimensionality reduction techniques have been proposed. However, no study has been dedicated to gauging these techniques in terms of their effectiveness at producing a reduced representation of the input time series that, when fed to various data mining algorithms, produces good-quality results. In this paper, empirical evidence is given by comparing three reduction techniques on various data sets and applying their output to four different data mining algorithms. The results show that it is sometimes feasible to use these techniques instead of the original time series data. The comparison is evaluated by running the data mining methods over both the original and the reduced sets of data. It is shown that one dimensionality reduction technique managed to generate results of over 83% average accuracy when compared to its benchmark results.
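
Although the abstract does not name the three techniques compared, a widely used time series dimensionality reduction technique is Piecewise Aggregate Approximation (PAA), which serves here as a concrete illustration of the kind of input-size reduction being evaluated (an assumption for illustration, not the paper's chosen method):

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: replace each of `segments`
    (near-)equal-width frames of the series by its mean value."""
    n = len(series)
    reduced = []
    for s in range(segments):
        lo = s * n // segments
        hi = (s + 1) * n // segments
        frame = series[lo:hi]
        reduced.append(sum(frame) / len(frame))
    return reduced
```

A series of 12 points reduced to 3 segments, for example, keeps one mean per frame: `paa(list(range(12)), 3)` yields `[1.5, 5.5, 9.5]`; downstream mining algorithms then run on the shorter representation.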

Justin Borg, Joseph G. Vella
Comparative Analysis of Data Mining Techniques to Predict Heart Disease for Diabetic Patients

The healthcare sector faces many difficulties and challenges in diagnosing diseases. Healthcare organizations are collecting bulk amounts of patient data, and data mining methods are utilized to uncover hidden information that is valuable to healthcare specialists for effective diagnostic decision making. Data mining strategies are utilized in the healthcare industry for various purposes. The objective of this paper is to assess and compare three data mining classification methods, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree, based on their predictive accuracy, in order to determine potential approaches to predicting the possibility of heart disease in diabetic patients.

Abhishek Kumar, Pardeep Kumar, Ashutosh Srivastava, V. D. Ambeth Kumar, K. Vengatesan, Achintya Singhal
Backmatter
Metadata
Title
Advances in Computing and Data Sciences
Editors
Mayank Singh
Dr. P. K. Gupta
Dr. Vipin Tyagi
Prof. Jan Flusser
Prof. Tuncer Ören
Gianluca Valentino
Copyright Year
2020
Publisher
Springer Singapore
Electronic ISBN
978-981-15-6634-9
Print ISBN
978-981-15-6633-2
DOI
https://doi.org/10.1007/978-981-15-6634-9
