2020 | Book

Data Management, Analytics and Innovation

Proceedings of ICDMAI 2019, Volume 1

Editors: Prof. Neha Sharma, Dr. Amlan Chakrabarti, Prof. Valentina Emilia Balas

Publisher: Springer Singapore

Book Series: Advances in Intelligent Systems and Computing

About this book

This book presents the latest findings in the areas of data management and smart computing, big data management, artificial intelligence and data analytics, along with advances in network technologies. It addresses state-of-the-art topics and discusses challenges and solutions for future development. Gathering original, unpublished contributions by scientists from around the globe, the book is mainly intended for a professional audience of researchers and practitioners in academia and industry.

Table of Contents

Frontmatter

Data Management and Smart Informatics

Frontmatter
Empirical Study of Soft Clustering Technique for Determining Click Through Rate in Online Advertising

Online advertising is an industry with the potential for maximum revenue extraction. Displaying the ad which is more likely to be clicked plays a crucial role in generating maximum revenue. A high click through rate (CTR) is an indication that the user finds the ad useful and relevant. For suitable placement of ads online and rich user experience, determining CTR has become imperative. Accurate estimation of CTR helps in placement of advertisements in relevant locations which would result in more profits and return of investment for the advertisers and publishers. This paper presents the application of a soft clustering method namely fuzzy c-means (FCM) clustering for determining if a particular ad would be clicked by the user or not. This is done by classifying the ads in the dataset into broad clusters depending on whether they were actually clicked or not. This way the kind of advertisements that the user is interested in can be found out and subsequently more advertisements of the same kind can be recommended to him, thereby increasing the CTR of the displayed ads. Experimental results show that FCM outperforms k-means clustering (KMC) in determining CTR.

Akshi Kumar, Anand Nayyar, Shubhangi Upasani, Arushi Arora
ASK Approach: A Pre-migration Approach for Legacy Application Migration to Cloud

Legacy application migration is a mammoth task if the migration approach is not well thought out at the very start, i.e. at pre-migration, and supported by robust planning, especially in the pre-migration process area. This paper proposes a mathematical pre-migration approach that helps an enterprise analyse an existing (legacy) application based on the information available about the application and the parameters the enterprise would like to consider. The proposed pre-migration assessment helps in understanding the legacy application’s current state and in unearthing information about the candidate application. It supports a well-informed decision on whether or not to migrate the legacy application. Application migration is often described as a journey that, once started, must reach its destination or risk ending in disaster; hence pre-migration is one of the most important areas of the migration journey.

Sanjeev Kumar Yadav, Akhil Khare, Choudhary Kavita
A Fuzzy Logic Based Cardiovascular Disease Risk Level Prediction System in Correlation to Diabetes and Smoking

Cardiovascular disease (CVD) is one of the major causes of death among people who have diabetes and also smoke. It creates complications for every organ of the human body. Smoking has become fashionable among young people from an early age, which results in premature death. This paper aims to explain the impact of diabetes and smoking, along with high BP, high pulse rate, angina, and family history, on the CVD risk level. The concept used is based on the knowledge-based system. We propose a fuzzy-logic-based prediction system to evaluate the CVD risk among people with diabetes who smoke. The aim is to help experts provide medication and counsel smokers well in advance. This will not only save the individual but also bring immense relief to those concerned. The data set is taken from the UCI Machine Learning Repository. Most researchers have studied the impact of diabetes or smoking on CVD separately, whereas the proposed system demonstrates how drastically the two together affect one’s health condition.

Kanak Saxena, Umesh Banodha
An Integrated Fault Classification Approach for Microgrid System

In this paper, a moving windowing approach-based integrated fault classification algorithm is proposed for microgrid system. In a microgrid system, the nonlinear operation of control devices connected to distributed generation (DG) imposes problem for identifying the exact faulty class. In order to mitigate this issue, an integrated moving window averaging technique (IMWAT) is proposed. The method utilizes current signal at the line end. In this technique, first, the decision of the fault detection unit (FDU) is analyzed and based on that fault class is detected. The FDU uses the conventional moving window averaging technique. Different logics are framed to identify the symmetrical and unsymmetrical faults. The method is tested on a standard microgrid network and obtained results for different fault cases prove the efficacy of the proposed method.

Ruchita Nale, Ruchi Chandrakar, Monalisa Biswal
Role of Data Analytics in Human Resource Management for Prediction of Attrition Using Job Satisfaction

Reputed management publications such as Harvard Business Review (HBR) have started stressing the emergence of data-driven management decisions. Growing investments in data and analytics underline this emergence. According to International Data Corporation, this investment is expected to grow up to $200 billion by 2020. In such a data-led management world, collecting, managing, and analysing human resources-related data becomes key for every organization. Human resource analytics is becoming necessary because strategic personnel planning is the need of the hour, and it helps organizations investigate every aspect of HR metrics. HR analytics is a holistic approach. According to KPMG India’s Annual Compensation Trends Survey 2018–19, the average annual voluntary attrition across sectors is 13.1%. This is a considerably high percentage. Hence, the antecedents leading to attrition need to be explored in order to propose appropriate HR policies, strategies, and practices. In light of these facts, this study proposes a data-driven predictive approach that examines the relationship between attrition (the dependent variable) and demographic and psychographic independent variables (the antecedents). The study found a strong relationship between job satisfaction and attrition. Further, employees with 0–5 years of work experience have a higher probability of leaving the organization. Such data-based outcomes may help HR managers address problems like attrition, which in turn may increase ROI. Thus, this paper underlines the emergence and relevance of analytics with special reference to the human resource management domain.

Neerja Aswale, Kavya Mukul
A Study of Business Performance Management in Special Reference to Automobile Industry

In contemporary times, the automobile is one of the well-paid industries in the Indian market with an annual growth rate of 7.64% in the passenger-car market. The increasing disposable income of the people along with the ever-growing financial sector has led to this expendable growth. In accordance, there has been an increase in sales of passenger cars from 13.35% in –July 2018. This oligopoly market has fierce competition due to new entrants into the Indian markets. Thus, there emerges a need to grasp the knowledge to understand the ever-growing needs of the customers and dynamism in the technology-driven market. Every organization seeks to study consumer buying behavior and measure its business performance by analyzing customer perception toward the product. The understanding of customer’s perception is an ongoing process to survive the cut throat competition. Taking this into consideration, the study provides insights about several attributes which drive a consumer behavior toward buying a product of a brand. The study has used theories and exploratory research design along with analytical tools to identify the major attributes of consumer buying behavior toward sedan cars within Delhi/NCR among high-end consumers. This knowledge helps the car manufacturers in market segmentation which enables the organization to plan the market strategies toward consumer retention and product upgradation.

Gurinder Singh, Smiti Kashyap, Kanika Singh Tomar, Vikas Garg
Secure Online Voting System Using Biometric and Blockchain

Elections play an important role in democracy. If the election process is not transparent, secure and tamper-proof, the reliability and authenticity of the whole process is at stake. In this paper, we discuss an online voting system that fulfills all of the above requirements. We address user authentication through iris recognition and use a One Time Password (OTP) as an additional security check. We also ensure that a valid user cannot cast multiple votes. Blockchain is used as a further security measure to provide decentralized, tamper-proof storage of users’ biometric data, personal details and the votes cast by them. Thus we focus not only on user authenticity but also on data security. The performance of the system has been tested with users from different age groups and backgrounds, and the inferences are presented.

Dipti Pawade, Avani Sakhapara, Aishwarya Badgujar, Divya Adepu, Melvita Andrade
An Approach: Applicability of Existing Heterogeneous Multicore Real-Time Task Scheduling in Commercially Available Heterogeneous Multicore Systems

Interest in design and use of heterogeneous multicore architectures has been increased in recent years due to the fact that the energy optimization and parallelization in heterogeneous multicore architecture are better than that of homogeneous multicore architecture. In heterogeneous multicore architectures, cores have similar Instruction Set Architecture (ISA) but the characteristics of the cores are different with respect to power and performance. Hence, heterogeneous architecture provides new prospects for energy-efficient computation and parallelization. Heterogeneous systems, furnished with different types of cores provide the mechanism to take actions with respect to irregular communication patterns, energy efficiency, high parallelism, load balancing, and unexpected behaviors. However, designing such heterogeneous systems for the different platforms like cloud, Internet of Things (IoT), Smart Devices, and Embedded Systems is still challenging. This paper studies the commercially available heterogeneous multicore architectures and finds out an approach or method to apply the existing work on heterogeneous multicore real-time task scheduling model to commercially available heterogeneous multicore architecture to achieve the parallelism, load balancing, and maximum throughput. The paper shows that the approach can be applied very efficiently to some of the commercially available heterogeneous systems to establish a generic heterogeneous model for the platforms like cloud, Internet of Things (IoT), Smart Devices, Embedded Systems, and other application areas.

Kalyan Baital, Amlan Chakrabarti
Analyzing the Detectability of Harmful Postures for Patient with Hip Prosthesis Based on a Single Accelerometer in Mobile Phone

This research studies the use of a single accelerometer inside a smartphone as a sensor to detect those postures that may be risks for patients with hip surgery to dislocate their joints. Various postures were analyzed using Euclidean distances to determine the feasibility to detect eight postures that were harmful. With the mobile phone attached to the affected upper leg, it was found that there was one harmful posture that could not be detected due to its close similarity with a normal posture. Meanwhile, the other two harmful postures, although indistinguishable based on their measured data, were still detectable with the suitably selected threshold. The distance measure analysis is useful as an indicator as to which posture will be near to missing out in the detection process. This will form a guideline for further design of a practical and more robust detecting system.
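
As an illustration of the distance-based idea described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes each posture is summarized by one 3-axis accelerometer vector; the reference vectors, posture names and threshold are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two 3-axis accelerometer readings (x, y, z)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def detect_posture(reading, references, threshold):
    """Return the name of the nearest reference posture, or None when no
    reference lies within `threshold` of the reading."""
    name, ref = min(
        references.items(),
        key=lambda item: euclidean_distance(reading, item[1]),
    )
    return name if euclidean_distance(reading, ref) <= threshold else None


# Hypothetical reference vectors in units of g; these are NOT from the paper.
REFERENCES = {
    "standing": (0.0, 1.0, 0.0),
    "deep_flexion": (0.0, 0.2, 0.98),
}
```

Two postures whose reference vectors lie closer together than the chosen threshold cannot be told apart by this scheme, which is the kind of limitation the abstract reports.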

Kitti Naonueng, Opas Chutatape, Rong Phoophuangpairoj
Software Development Process Evolution in Malaysian Companies

Global software development (GSD) is a phenomenon mainly associated with the outsourcing of software development projects to offshore companies. Reduced software development cost, increased productivity and the time advantage of multisite development are the main benefits that software development companies (SDCs) get from GSD. Besides these benefits, a number of challenges associated with GSD have also been observed. Consequently, the traditional processes for developing software are evolving and being replaced with a new set of processes that are lightweight and outcome-based. This process evolution has been deeply investigated in the context of companies mostly in Europe, Australia, the USA and other countries in those regions. Limited research has been carried out on Malaysian companies. The present research investigates the process evolution phenomenon in Malaysian companies. The current software development processes and the reasons for the evolution of software processes in Malaysian software companies have been identified. A qualitative approach using structured interviews has been followed for data collection and analysis. The findings show that software processes in most Malaysian companies are evolving or have already evolved. The companies are overwhelmingly adopting agile methods because of their support for GSD. Some of the companies are using ad hoc approaches for software development. The size of the company and project has been found to be one of the main factors behind the use of ad hoc approaches. Mainly small and medium-size companies and projects are involved in this practice.

Rehan Akbar, Asif Riaz Khan, Kiran Adnan
Automated Scheduling of Hostel Room Allocation Using Genetic Algorithm

Due to the rapid growth of the student population in tertiary institutions in many developing countries, hostel space has become one of the most important resources in university. Therefore, the decision of student selection and hostel room allocation is indeed a critical issue for university administration. This paper proposes a hierarchical heuristics approach to cope with hostel room allocation problem. The proposed approach involves selecting eligible students using rank based selection method and allocating selected students to the most suitable hostel room possible via the implementation of a genetic algorithm (GA). We also have examined the effects of using different weight associated with constraints on the performance of the GA. Results obtained from the experiments illustrate the feasibility of the suggested approach in solving the hostel room allocation problem.

Rayner Alfred, Hin Fuk Yu
Evaluation of ASTER TIR Data-Based Lithological Indices in Parts of Madhya Pradesh and Chhattisgarh State, India

The present study was performed in some parts of Madhya Pradesh and Chhattisgarh State, India to compare the different quartz indices, feldspar indices and mafic indices according to Ninomiya (2005) and Guha (2016) using thermal infrared (TIR) bands (band 10, band 11, band 12, band 13, and band 14) of Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data for detecting quartz, feldspar and mafic minerals. Results showed that these indices are equally useful for delineating quartz, feldspar or mafic minerals. It was noticed from the correlation coefficients that Guha’s mafic index (GMI) and Ninomiya’s mafic index (NMI) presented almost the same result. Guha’s quartz index (GQI) was more powerful than Ninomiya’s quartz index (NQI) in identifying quartz content in alkali granites and this GQI was also comparable with the Rockwall and Hofstra’s quartz index (RHQI) in identifying quartz content in alkali granite.

Himanshu Govil, Subhanil Guha, Prabhat Diwan, Neetu Gill, Anindita Dey
Analyzing Linear Relationships of LST with NDVI and MNDISI Using Various Resolution Levels of Landsat 8 OLI and TIRS Data

The present study used the Normalized Difference Vegetation Index (NDVI) and the Modified Normalized Difference Impervious Surface Index (MNDISI) to determine the linear relationship between Land Surface Temperature (LST) distribution and these remote sensing indices under various spatial resolutions. Four multi-date Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) images of parts of Chhattisgarh State of India were used from four different seasons (spring, summer, autumn and winter). The results indicate that LST established moderate to strong negative correlations with NDVI and weak negative to moderate positive correlations with MNDISI at various spatial resolutions (30–960 m). Generally, the coarser resolutions (840–960 m) possess stronger correlation coefficient values due to more homogeneity. The autumn or post-monsoon image represents the strongest correlation for LST–NDVI and LST–MNDISI at any resolution levels. The image of winter season reveals the best predictability of LST distribution with the known NDVI and MNDISI values.
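
The relationship examined in this abstract can be illustrated with a short sketch. It is not the authors' processing chain: NDVI is computed with its standard definition, (NIR - Red) / (NIR + Red), and the linear relationship with LST is summarized by Pearson's correlation coefficient. The pixel values below are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.
    For Landsat 8 OLI, NIR is band 5 and red is band 4."""
    return (nir - red) / (nir + red)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical reflectances and surface temperatures (kelvin) for four pixels.
nir_values = [0.50, 0.40, 0.30, 0.20]
red_values = [0.10, 0.10, 0.15, 0.15]
lst_values = [295.0, 298.0, 303.0, 306.0]

ndvi_values = [ndvi(n, r) for n, r in zip(nir_values, red_values)]
r_lst_ndvi = pearson(lst_values, ndvi_values)  # negative for this toy data
```

Repeating the same calculation on images resampled to coarser pixel sizes would give one coefficient per resolution level, which is the comparison the abstract describes.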

Himanshu Govil, Subhanil Guha, Prabhat Diwan, Neetu Gill, Anindita Dey
Automatic Robot Processing Using Speech Recognition System

Speech recognition is becoming an increasingly useful technology in computer applications, and many interactive speech-aware applications exist in the field. To bring this easy means of communication to computing, speech recognition techniques have to evolve. The computer has to be programmed to accept voice input and then process it to provide the required output, using various speech recognition software. Speech recognition is the process of converting a speech signal to a sequence of words using an appropriate algorithm. It provides an alternative and efficient way to access systems for people who are not well educated or lack sufficient computer knowledge, and in situations where typing is difficult. Speech recognition also reduces the manpower needed to accept and process commands. In our research work, we aim to apply speech recognition in a customer care center, where many queries have to be processed every day. Some of the queries are repeated often, and the responses are largely the same. For such cases, we propose a methodology to automate query processing using speech recognition. How to automate the system and process the queries automatically is explained in our methodology with a suitable algorithm.

S. Elavarasi, G. Suseendran
Banking and FinTech (Financial Technology) Embraced with IoT Device

In recent years, the traditional financial industries have been motivated to adopt financial technology (FinTech) combined with the internet of things (IoT). The requirements of FinTech and IoT need to be integrated into the new business environment. Several companies are affected because of financial-level investments, so there is a need to take the business to the next level. FinTech can introduce new services, tools and products for emerging businesses through the internet of services, which provides ideas linked over the internet. An increasing number of companies now use the IoT to create new added value. The administrators of existing financial organizations are apprehensive about financial innovation. Social innovation is accomplished through new technology. To create an effective business plan and course of action, FinTech and IoT are combined to generate innovative ideas based on the requirements.

G. Suseendran, E. Chandrasekaran, D. Akila, A. Sasi Kumar

Big Data Management

Frontmatter
GRNN++: A Parallel and Distributed Version of GRNN Under Apache Spark for Big Data Regression

Among the neural network architectures for prediction, multi-layer perceptron (MLP), radial basis function (RBF), wavelet neural network (WNN), general regression neural network (GRNN), and group method of data handling (GMDH) are popular. Out of these architectures, GRNN is preferable because it involves single-pass learning and produces reasonably good results. Although GRNN involves single-pass learning, it cannot handle big datasets because a pattern layer is required to store all the cluster centers after clustering all the samples. Therefore, this paper proposes a hybrid architecture, GRNN++, which makes GRNN scalable for big data by invoking a parallel distributed version of K-means++, namely, K-means||, in the pattern layer of GRNN. The whole architecture is implemented in the distributed parallel computational architecture of Apache Spark with HDFS. The performance of the GRNN++ was measured on gas sensor dataset which has 613 MB of data under a ten-fold cross-validation setup. The proposed GRNN++ produces very low mean squared error (MSE). It is worthwhile to mention that the primary motivation of this article is to present a distributed and parallel version of the traditional GRNN.
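
To make the role of the pattern layer concrete, the following is a minimal single-machine sketch of the standard GRNN prediction rule, not the distributed GRNN++ implementation. It assumes the pattern layer already holds a small set of cluster centers, each paired with a representative target value; the smoothing parameter `sigma` and all data values are hypothetical.

```python
import math


def grnn_predict(x, centers, targets, sigma):
    """Standard GRNN output: a Gaussian-kernel-weighted average of the targets
    attached to the pattern-layer centers.

    Note: if x is very far from every center, all weights can underflow to
    zero; a production version would need to guard against that.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * sigma ** 2))
        for c in centers
    ]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)


# Hypothetical one-dimensional centers and targets.
CENTERS = [(0.0,), (1.0,)]
TARGETS = [0.0, 10.0]
```

Because every prediction visits every stored center, the cost grows with the size of the pattern layer, which is why the abstract replaces raw samples with cluster centers before prediction.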

Sk. Kamaruddin, Vadlamani Ravi
An Entropy-Based Technique for Conferences Ranking

Conferences are ranked to guide researchers toward publishing their work in top-level venues. Existing ranking approaches need to be improved with regard to accuracy. In this paper, we propose a technique for ranking conference papers named the influence language model (InLM). It uses entropy to calculate a score for the purpose of ranking, and it identifies the level of a paper as well as the level of a conference based on the entropy score. Experiments have been performed on a larger dataset. The results compare favourably with existing systems.

Fiaz Majeed, Rana Azhar Ul Haq
MapReduce mRMR: Random Forests-Based Email Spam Classification in Distributed Environment

Email is the most widely used message transfer system for communication on the internet. Spam is a serious concern that causes major problems on today’s internet. Spam emails are unsolicited messages that are sent arbitrarily to a large number of recipients. Owing to the growing popularity of email, the volume of unsolicited data has also increased rapidly and has led to many security concerns. Although a sufficient number of spam filtering techniques exist, spammers keep discovering new ways to evade spam filters. Spammers also use this communication channel to spread malware in the form of executable files. These spam emails waste the user’s system memory, computing power, and network bandwidth. Spam emails have progressively damaged the integrity of email and degraded the online experience. Research has shown that classification algorithms used with feature selection return more accurate results than standard classification. In this paper, feature selection is done through minimum redundancy and maximum relevance (mRMR) and the classification is done by means of Random Forests in the MapReduce environment. The performance is compared using various measures, namely sensitivity, correctness, and accuracy, with the Random Forests in the distributed environment using the Spambase dataset.

V. Sri Vinitha, D. Karthika Renuka
The Impact of Sustainable Development Report Disclosure on Tax Planning in Thailand

This paper examines the impact of sustainable development report disclosure (hereafter SDRD) on the tax planning (hereafter TP) of companies listed on the Stock Exchange of Thailand, excluding the financial sector. The data cover only the year 2016, and the sample consists of 337 companies from seven industries. Questionnaires from the Global Reporting Initiative are used to evaluate SDRD. TP is measured by the ratio of total tax expenses to total assets. Overall, this paper finds that SDRD had a statistically negative effect on TP. This indicates that companies with good SDRD practices could have a higher level of TP. Regarding control variables, financial leverage and capital intensity had a statistically positive effect on TP, while profitability and family control had a statistically negative effect on TP. This paper further divides the sample into family and non-family companies to examine whether the effect of SDRD on TP differs between them. The results indicated that the relationship between SDRD and TP was significantly negative for the family companies, whereas it was weak and insignificant for the non-family companies. The results are useful to market regulators, who could adjust rules and regulations or offer incentives to encourage Thai listed companies to adopt better sustainable development practices.

Sathaya Thanjunpong, Thatphong Awirothananon
Clustering and Labeling Auction Fraud Data

Although shill bidding (SB) is a common fraud in online auctions, it is very tough to detect because there is no obvious evidence of it happening. There are limited studies on SB classification because training data are difficult to produce. In this study, we build a high-quality labeled shill bidding dataset based on recently scraped auctions from eBay. Labeling shill bidding instances with multidimensional features is a tedious task but critical for developing efficient classification models. For this purpose, we introduce a new approach to effectively label shill bidding data with the help of the robust hierarchical clustering technique CURE. As illustrated in the experiments, our approach returns remarkable results.

Ahmad Alzahrani, Samira Sadaoui
Big Data Security Challenges and Preventive Solutions

Big data has opened the possibility of making great advancements in many scientific disciplines and has become a very interesting topic in academic world and in industry. It has also given contributions to innovation, improvements in productivity and competitiveness. However, at present, there are various security risks involved in the process of collection, storage and use. The leakage of privacy caused by big data poses serious problems for the users; also the incorrect or false big data may lead to wrong or invalid analysis of results. The presented work analyzes the technical challenges of implementing big data security and privacy protection, and describes some key solutions to address the issues related with big data security and privacy.

Nirmal Kumar Gupta, Mukesh Kumar Rohil
Role and Challenges of Unstructured Big Data in Healthcare

Unprecedented growth in the volume of unstructured healthcare data has immense potential in valuable insight extraction, improved healthcare services, quality patient care, and secure data management. However, technological advancements are required to achieve the potential benefits from unstructured data in healthcare according to the growth rate. The heterogeneity, diversity of sources, quality of data and various representations of unstructured data in healthcare increases the number of challenges as compared to structured data. This systematic review of the literature identifies the challenges and problems of data-driven healthcare due to the unstructured nature of data. The systematic review was carried out using five major scientific databases: ACM, Springer, ScienceDirect, PubMed, and IEEE Xplore. The inclusion of articles in review at the initial stage was based on English language and publication date from 2010 to 2018. A total of 103 articles were selected according to the inclusion criteria. Based on the review, various types of healthcare unstructured data have been discussed from different domains of healthcare. Also, potential challenges associated with unstructured big data have been identified in healthcare for future research directions in the technological advancement of healthcare services and quality patient care.

Kiran Adnan, Rehan Akbar, Siak Wang Khor, Adnan Bin Amanat Ali
Zip Zap Data—A Framework for ‘Personal Data Preservation’

In the era of mobile devices, a huge number of mobile applications produce a vast variety of data in their own formats. There is every chance that most of these data will be lost forever. Technically, any digital data becomes irretrievable if the data format changes, the application interpreting it changes, or a platform feature used by the application changes and the app is not updated to accommodate it. In simple terms, when you upgrade your mobile, there is a real possibility that data from your previous mobile will be lost forever. One reason is that the newer device may have made some platform features obsolete, so some applications no longer run on it because the developer decided to discontinue them. An even more pressing issue is that there is often no way to transfer application data from an old device to a new one, which is very common with mobile apps. The root cause is the lack of backup and retrieval mechanisms in devices and applications. We studied various approaches and research works related to this problem and propose ‘Zip Zap Data’, a framework for effective backup and recovery of personal digital data.

K. Arunkumar, A. Devendran
A Systematic Mapping Study of Cloud Large-Scale Foundation—Big Data, IoT, and Real-Time Analytics

Cloud computing is a unique concept that makes data and analysis easy to manipulate using the large-scale infrastructure available to Cloud service providers. However, it is sometimes difficult to determine a topic for research in terms of Cloud. A systematic map allows the categorization of studies in a particular field using an exclusive scheme, enabling the identification of gaps for further research. In addition, a systematic mapping study can provide insight into the level of research being conducted in any area of interest. The results generated from such a study are presented using a map. The method utilized in this study involved analysis using three categories: research, topic, and contribution facets. Topics were obtained from the primary studies, while the research type (such as evaluation) and the contribution type (such as tool) were utilized in the analysis. The objective of this paper was to carry out a systematic mapping study of the Cloud large-scale foundation. This provided an insight into the frequency of work which has been carried out in this area of study. The results indicated that the most publications were on IoT as it relates to model, with 12.26%; there were more publications on data analytics as it relates to metric, with 2.83%; more articles on big data in terms of tool, with 11.32%, and method, with 9.43%; and more research on data management in terms of process, with 6.6%. This outcome will be valuable to the Cloud research community, service providers, and users alike.

Isaac Odun-Ayo, Rowland Goddy-Worlu, Temidayo Abayomi-Zannu, Emanuel Grant
Studies on Radar Imageries of Thundercloud by Image Processing Technique

Severe atmospheric events can cause huge damage to civilization. Severe thunderstorms are one such weather event. Convective clouds are one of the main reasons for the formation of severe thunderstorms, and analysis of cloud imageries by image processing can be used to forecast them. Analysis of the RGB values of pixels of cloud imageries can be used to show the formation of a severe thunderstorm. Histogram analysis of such cloud imageries can also be used to predict severe thunderstorms. In this study, analysis of RGB values of pixels and histograms of cloud imageries has been used to nowcast severe thunderstorms with a lead time of 6 to 8 h. This lead time is necessary to save life and property from huge damage.

Sonia Bhattacharya, Himadri Bhattacharyya Chakrabarty

Artificial Intelligence and Data Analysis

Frontmatter
PURAN: Word Prediction System for Punjabi Language News

This paper presents an outline of PURAN, a state-of-the-art word prediction system for Punjabi language news. Word prediction systems are used to increase the user's text composition rate while typing. A brief background of the various approaches utilized in the development of word prediction systems is provided, along with a discussion of the factors affecting the development of such systems. This paper also elaborates the word prediction system architecture in detail. The system performance was tested on the Keystroke saving, Hit ratio, Average rank and Average keystrokes benchmark metrics. The paper demonstrates that PURAN achieved the highest Hit ratio in the Regional news genre, followed by the National news genre, with the lowest average keystrokes in these categories of news. The results show that the system achieved an 88.38% Average Hit ratio with 51.42% Average keystroke saving for N = 10.
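
As a rough illustration of two of the metrics named above, the sketch below uses the common textbook definitions of keystroke saving and hit ratio. The abstract does not give the exact formulas used for PURAN, so the function names and definitions here are assumptions.

```python
def keystroke_saving(total_chars: int, keys_pressed: int) -> float:
    """Percentage of keystrokes saved relative to typing every character.

    Assumed definition: 100 * (total_chars - keys_pressed) / total_chars.
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 100.0 * (total_chars - keys_pressed) / total_chars


def hit_ratio(hits: int, total_words: int) -> float:
    """Percentage of words for which the intended word appeared in the
    prediction list (assumed definition)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * hits / total_words
```

For example, typing a 100-character text with 50 key presses corresponds to a keystroke saving of 50% under this definition.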

Gurjot Singh Mahi, Amandeep Verma
Implementation of hDE-HTS Optimized T2FPID Controller in Solar-Thermal System

In this paper, a fair comparison is presented to validate the novelty of the Type-2 Fuzzy PID (T2FPID) controller over the Type-1 or conventional Fuzzy PID (FPID) controller and the PI controller as a secondary frequency controller, and the Heat Transfer Search (HTS) algorithm is adopted to extract the optimum gains of the controllers. The T2FPID controller has the useful property of handling large uncertainties of the system with an extra degree of freedom. The T2FPID controller is implemented in a two-area interconnected thermal-PV power system to enhance the system performance. ITAE is adopted as the objective function of the system to lessen the undershoot, overshoot, and settling time of the frequency and tie-line power deviations. A novel hDE-HTS algorithm is adopted to enhance the system performance by searching for the relevant pair of controller gains. This analysis is executed by applying a step signal (load disturbance) of magnitude 0.1 in area-2 to study the transient behavior of the system. The novelty of this work is the implementation of an hDE-HTS optimized T2FPID controller to enhance the solar-thermal system responses (frequency and tie-line power deviations).
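
ITAE (integral of time-weighted absolute error) is commonly defined as the integral of t·|e(t)| over the simulation horizon. The sketch below approximates it with the trapezoidal rule on sampled data; the function name and the sampled-signal interface are assumptions, and the paper's own objective may combine several deviation signals.

```python
def itae(times, errors):
    """Approximate ITAE = integral of t * |e(t)| dt with the trapezoidal rule.

    times  : increasing sample instants (seconds)
    errors : error samples e(t) at those instants (e.g. a frequency deviation)
    """
    if len(times) != len(errors):
        raise ValueError("times and errors must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        f_prev = times[i - 1] * abs(errors[i - 1])
        f_curr = times[i] * abs(errors[i])
        total += 0.5 * dt * (f_prev + f_curr)
    return total
```

Because the error is weighted by time, deviations that persist late in the response are penalized more heavily than the initial transient, which is why the criterion is often used to reduce settling time.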

Binod Shaw, Jyoti Ranjan Nayak, Rajkumar Sahu
Design of Sigma-Delta Converter Using 65 nm CMOS Technology for Nerves Organization in Brain Machine Interface

In this paper, an overview of present related work in the field of neuroscience is given. The parts of a neural interface using a sigma-delta converter are examined; the overall ADC is driven with a low voltage to improve the control utilization in the nerves organization. In this work, the basic parts of the nervous system are described, and the problem of delivering power and transmitting data in a minimized manner is addressed. For this, the signal is first transmitted through an electrode into the central nervous system of the brain, where the signal is diagnosed using a brain-computer interface by analyzing the data from the analog-to-digital converter. A sigma-delta converter is used for visualizing low-frequency signals. The major advantages of this converter are that, firstly, a clocking circuit need not be designed and, secondly, it provides good accuracy. There is no role for a wirelessly connected digital-to-analog (D/A) converter in the proposed design. A new topology based on an A/D converter operating at a minimum supply voltage, with an inactive integrator to decrease control utilization, is presented, enabling an in-channel advanced converter scheme in a large-scale neural recording implant.

Anil Kumar Sahu, G. R. Sinha, Sapna Soni
Performance Comparison of Machine Learning Techniques for Epilepsy Classification and Detection in EEG Signal

Epilepsy is a neurological disorder that affects around 1% of humankind. Around 10% of the United States population experience at least one seizure in their lifetime. Epilepsy is characterized by a tendency of the brain to produce sudden bursts of abnormal electrical activity that disrupt the normal functioning of the brain. Since seizures generally occur infrequently and are unpredictable, seizure detection systems are proposed for detecting seizures during long-term electroencephalography (EEG). In this research, we use the DWT for feature extraction and compare several kinds of machine learning classifiers, such as SVM, Nearest Neighbor classifiers, Logistic regression, Ensemble classifiers and so on. In this study, the classification accuracy of the Fine Gaussian SVM was recorded as 100%, which is better than that of other existing machine learning approaches.

Rekh Ram Janghel, Archana Verma, Yogesh Kumar Rathore
Novel Approach for Plant Disease Detection Based on Textural Feature Analysis

Image processing is a technique for processing information stored in the form of pixels. Plant disease detection is a technique for detecting disease from the leaf. Plant disease detection algorithms have various steps such as preprocessing, feature extraction, segmentation, and classification. The KNN classifier technique is applied, which can classify input data into certain classes. The performance of the KNN classifier is compared with existing techniques, and the analysis shows that the KNN classifier has higher accuracy and less fault detection than other techniques. This paper presents methods that use digital image processing techniques to detect, quantify, and classify plant diseases from digital images in the visible spectrum. In plant leaf classification, a leaf is classified based on its different morphological features. Some of the classification techniques used are neural networks, genetic algorithms, support vector machines, and principal component analysis. In this paper, results are compared between the KNN classifier and the SVM classifier.

Varinderjit Kaur, Ashish Oberoi
Novel Approach for Brain Tumor Detection Based on Naïve Bayes Classification

Brain tumor detection is an approach for detecting the tumor portion in an MRI image. Various techniques have been proposed in the past to detect a tumor from the image. The technique adopted in this research work is based upon morphological scanning, clustering, and Naïve Bayes classification. The morphological scanning scans the input image, the clustering groups similar and dissimilar patches from the image, and the Naïve Bayes classifier then spots the tumor portion in the magnetic resonance image. The proposed algorithm is implemented in MATLAB, and results are analyzed in terms of PSNR, MSE, accuracy, and fault detection; the overlapping area is also calculated with the Dice coefficient. The proposed method has been tested on a data set with more than 25 slide scanned images. The proposed method achieved an accuracy of 86% for best cell detection.
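
The evaluation measures named above have standard definitions, sketched here in plain Python rather than the MATLAB used by the authors. The function names and the flat-list image representation are assumptions made for illustration only.

```python
import math


def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks
    given as flat sequences of 0/1 values."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * inter / size


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)
```

A Dice value of 1 means the predicted tumor mask overlaps the reference mask exactly, while 0 means no overlap.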

Gurkarandesh Kaur, Ashish Oberoi
Automatic Classification of Carnatic Music Instruments Using MFCC and LPC

With the large collections of digital music available in recent days, the challenge is to organize and access the music efficiently. Research in the field of Music Information Retrieval (MIR) focuses on these challenges. In this paper, we develop a system which automatically identifies the instrument in a given piece of Carnatic music from among ten different types of instruments. We extract the well-known features MFCC and LPC, and analyze the capability of these features in distinguishing different instruments. Then, we apply classification techniques such as Artificial Neural Network, Support Vector Machine, and Bayesian classifiers to those features. We compare the performances of those algorithms along with different features for Carnatic music instrument identification.

Surendra Shetty, Sarika Hegde
Semiautomated Ontology Learning to Provide Domain-Specific Knowledge Search in Marathi Language

In this research work, our goal is to build a self-sustainable, reproducible, and extensive domain-specific ontology for the purpose of creating a knowledge search engine. We have used online data as the primary information store, from which we construct the ontology by identifying concepts (nodes) and relationships between concepts. The project encompasses preestablished ideas gathered from successful NLP trials and presents a new variation on the task of ontology creation. The system for which the ontology is being created is a knowledge search engine in Marathi. It aims at building a semiautomated ontology whose target demographic is primary school children and whose selected domain is science. We use a combination of natural language processing and machine learning methods to automate the ontology learning task. The automatically learned ontology is further modified by language and domain experts to enrich its contents. Unlike standard search engines, our knowledge search engine attempts to provide learned resources directly to the user rather than website links. This approach enables the user to get information directly without having to spend time browsing indexed links.

Neelam Chandolikar, Pushkar Joglekar, Shivjeet Bhosale, Dipali Peddawad, Rajesh Jalnekar, Swati Shilaskar
Identifying Influential Users on Social Network: An Insight

The advancement in the speed of the internet connection on handheld devices has led to an increase in the usage of social media. This drew the attention of advertisers to social media as a platform to promote their products, leading to an increase in sales and brand awareness. To increase the rate of information dissemination within a short period of time, influential users on social media were targeted, who would act as word-of-mouth advertisers of the product. However, there are various parameters on which the influence of a user has to be determined. The parameters can be (1) the connectivity of the user in the network, (2) the knowledge/interest of the user in a particular topic/product/content, and (3) the activity of the user on social media. This survey focuses on the various methods and models for identifying influential nodes and also on the effect of compliance, where a user falsely agrees with the content of another influential user by retweeting, just to gain status or reputation, thus increasing his influence score. Because of this issue, the list of influential nodes of a social network can be manipulated.

Ragini Krishna, C. M. Prashanth
Factex: A Practical Approach to Crime Detection

Crime on roads is a major problem faced today by all modern cities. Road transport is the most common escape route for many criminals. Thefts and many other crimes remain unregistered and unsolved due to lack of evidence. Effective tracking of vehicles and criminals is still a big problem and involves plenty of resources. To address this situation, we have proposed a machine learning-based practical crime detection system using text and face recognition techniques. Such systems will prove useful in parking lots, toll stations, airports, border crossings, etc. In the proposed system, text recognition involves extracting the characters present in Indian number plates, and the predicted output is compared with the registered vehicle database. Simultaneously, the face recognition feature identifies criminal faces based on certain face regions and then maps the respective coordinates to the criminal database. The proposed system aims to deliver improved outcomes with respect to time constraints and accuracy, with more than 85% successful recognitions in normal working conditions. The goal is practical, real-time detection of crime using machine learning algorithms such as KNN, SVM, and face detection classifiers.

Rachna Jain, Anand Nayyar, Shivam Bachhety
Analysis of Classification Algorithms for Breast Cancer Prediction

According to global statistics, breast cancer ranks second among all the fatal diseases that cause death. It will cause an adverse effect when left unnoticed for a long time. However, its early diagnosis allows significant treatment, thus improving the prognosis and the chance of survival. Therefore, accurate classification of the benign tumor is necessary in order to improve people's lives. Thus, precision in the diagnosis of breast cancer has been a significant topic of research. Even though several new methodologies and techniques have been proposed, machine learning algorithms and artificial intelligence concepts lead to accurate diagnosis, consequently improving the survival rate of women. The major intent of this research work is to summarize the various studies on predicting breast cancer and classifying it using data mining techniques.

S. P. Rajamohana, K. Umamaheswari, K. Karunya, R. Deepika
Real-Time Footfall Prediction Using Weather Data: A Case on Retail Analytics

Be it a retailer, producer, or supplier, the weather has a substantial effect on each one of them. Climate variability and weather patterns have become critical success factors in retail these days. As a matter of fact, weather forecasting has become a $3 billion business now. One of the main reasons behind this surge is the capability of forecasters to sell weather-related information to businesses, who then strategize their various decisions regarding inventory, marketing, advertising, etc. accordingly. Hence only those retailers who stay “ahead of the game” will be able to enjoy huge sales, while others who do not would face the consequences. Various studies of the change in consumer behavior due to changes in weather conditions have shown that even a one-degree change in temperature affects a store’s traffic, reflecting the growing importance of predictive analytics in this domain. However, these studies take only historical weather statistics into account. In this paper, we propose our methodology for footfall analytics to see how changes in weather conditions will impact a retail store’s traffic, and thereby the retailing value chain, using real-time weather forecasts and footfall data. This analysis provides a platform for retailers to make evidence-driven decisions and strategize their business plans, which would help them to deepen customer involvement and gain efficiency in the planning process.

Garima Makkar
Normal Pressure Hydrocephalus Detection Using Active Contour Coupled Ensemble Based Classifier

The brain plays an imperative role in the life of a human being, as it manages the communication between sensory organs and muscles. Consequently, any disease related to the brain should be detected at an early stage. Abundant accumulation of cerebrospinal fluid in the ventricle results in a brain disorder termed normal pressure hydrocephalus (NPH). The current study aims to segment the ventricular part from CT brain scans and then perform classification to differentiate between the normal brain and the affected brain having NPH. In the proposed method, a few preprocessing steps are first carried out to enhance the quality of the input CT brain image, and the ventricle region is cropped out. Then an active contour model is employed to perform segmentation of the ventricle. Features are extracted from the segmented region, and an Ensemble classifier is used to classify the CT brain scan into two classes, namely normal and NPH. More than hundreds of CT brain scans were analyzed during this study; the area of the ventricle has been used as a measure for feature extraction. Experimental results disclosed a significant improvement in performance for the ensemble classifier in comparison to the Support Vector Machine.

Pallavi Saha, Sankhadeep Chatterjee, Santanu Roy, Soumya Sen
Question–Answer System on Episodic Data Using Recurrent Neural Networks (RNN)

Data comprehension is one of the key applications of question-answer systems. This involves a closed-domain answering system where a system can answer questions based on the given data. Previously, people have used methods such as part-of-speech tagging and named entity recognition for such problems, but those methods have struggled to produce accurate results since they have no information retention mechanisms. Deep learning, and specifically recurrent neural network based methods such as long short-term memory, have been shown to be successful in creating accurate answering systems. This paper focuses on episodic memory, where certain facts are aggregated in the form of a story, a question is asked related to a certain object in the story, and a single fact present in the story is given as the answer. The paper compares the performance of these algorithms on a benchmark dataset and provides guidelines on parameter tuning to obtain maximum accuracy. High accuracy (80% and above) was achieved on three tasks out of four.

Vineet Yadav, Vishnu Bharadwaj, Alok Bhatt, Ayush Rawal
Convoluted Cosmos: Classifying Galaxy Images Using Deep Learning

In this paper, a deep learning-based approach has been developed to classify the images of galaxies into three major categories, namely, elliptical, spiral, and irregular. The classifier successfully classified the images with an accuracy of 97.3958%, which outperformed conventional classifiers like Support Vector Machine and Naive Bayes. The convolutional neural network architecture involves one input convolution layer having 16 filters, followed by 4 hidden layers, 1 penultimate dense layer, and an output Softmax layer. The model was trained on 4614 images for 200 epochs using NVIDIA-DGX-1 Tesla-V100 Supercomputer machine and was subsequently tested on new images to evaluate its robustness and accuracy.

Diganta Misra, Sachi Nandan Mohanty, Mohit Agarwal, Suneet K. Gupta

Advances in Network Technologies

Frontmatter
Energy-Based Improved MPR Selection in OLSR Routing Protocol

Wireless ad hoc networks consist of wireless nodes that communicate over a wireless medium without any centralized controller, fixed infrastructure, base station, or access point. The networks should be established in a distributed and decentralized way. The performance of a mobile ad hoc network depends on the routing scheme chosen. Extensive research has taken place in recent years, suggesting many proactive and reactive protocols and ways to make them energy efficient. In this work, an attempt is made to make a table-driven routing protocol, the optimized link-state routing protocol (OLSR), more energy efficient, which also helps in prolonging the network lifetime. OLSR is a proactive routing protocol in Mobile Ad hoc Networks (MANETs) which is driven by hop-by-hop routing. The conventional OLSR is hybrid multipath routing, in which link-state information is forwarded only by Multi-Point Relays (MPRs) selected among the one-hop and two-hop neighbor sets of the host. In this work, a novel mechanism is introduced to select MPRs among a node's neighbor set, making the protocol more energy efficient by considering the willingness of the node. The proposed energy-aware MPR selection in MDOLSR is compared with conventional OLSR. Extensive simulations were performed using the NS-2 simulator, and simulation results show improved network parameters such as higher throughput, a higher Packet Delivery Ratio (PDR), and lower end-to-end delay as simulation time progresses.
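
To make the idea of willingness-based MPR selection concrete, the sketch below loosely follows the greedy heuristic of the OLSR specification: pick the one-hop neighbor with the highest willingness, breaking ties by how many uncovered two-hop neighbors it reaches. It is not the authors' MDOLSR mechanism; the mapping from residual energy to willingness and all names are hypothetical.

```python
def willingness_from_energy(residual_fraction: float) -> int:
    """Hypothetical mapping of residual energy (0.0..1.0) to OLSR willingness
    (0 = never relay, 7 = always relay)."""
    return min(7, max(0, int(residual_fraction * 7)))


def select_mprs(two_hop_via, willingness):
    """Greedy MPR selection.

    two_hop_via : dict mapping each one-hop neighbor to the set of two-hop
                  neighbors reachable through it
    willingness : dict mapping each one-hop neighbor to its willingness (0..7)
    Returns the chosen MPRs in selection order.
    """
    # Neighbors with willingness 0 are never used as relays.
    candidates = {n: set(c) for n, c in two_hop_via.items()
                  if willingness.get(n, 0) > 0}
    uncovered = set().union(*candidates.values())
    mprs = []
    while uncovered:
        # Only neighbors that still reach an uncovered two-hop node are useful.
        useful = [n for n in sorted(candidates) if candidates[n] & uncovered]
        best = max(useful, key=lambda n: (willingness[n],
                                          len(candidates[n] & uncovered)))
        mprs.append(best)
        uncovered -= candidates[best]
    return mprs
```

Under this sketch, a node with low residual energy reports low willingness and is chosen as a relay only when no better-provisioned neighbor covers the same two-hop nodes.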

Rachna Jain, Indu Kashyap
A Novel Approach for Better QoS in Cognitive Radio Ad Hoc Networks Using Cat Optimization

Cognitive Radio is a wireless communication methodology that allows the user to communicate without having a fixed preassigned radio spectrum. Routing is one of the serious constraints in Cognitive Radio Networks (CRNs). Ad hoc networks are non-centralized wireless networks that can be constructed without any preexisting infrastructure. Here, every node can work as a router. In this paper, the authors explain Cognitive Radio Networks (CRNs), which are gaining a lot of attention and whose principal focus is the dynamic assignment of channels to wireless devices. Nowadays, almost all networks rely on fixed allocations in a licensed or unlicensed frequency band. This paper discusses a literature review related to CRNs and an optimization algorithm to enhance the overall performance of TE under CRN. A swarm intelligence technique is used in the paper. The swarm approach combines decentralized behavior to obtain the best feasible solutions. The motivation often comes from nature, more often than not from natural systems. One of the effective approaches, known as Cat swarm, has been used to obtain a high rate of accuracy and much lower error rates, which improves the lifespan of the network. The results are obtained using the CSO (Cat Swarm Optimization) algorithm, and parameters such as energy consumption, congestion, overhead consumption, and number of routing rules are used to analyze the overall performance of the algorithm.

Lolita Singh, Nitul Dutta
(T-ToCODE): A Framework for Trendy Topic Detection and Community Detection for Information Diffusion in Social Network

The increased use of social networks generates a huge amount of data. Extracting useful information from this huge volume of available data is the need of today. Study and analysis of this data provide insight into the behavior of customers or users and will thus be beneficial for increasing the sales of products or understanding customers. To achieve this, we propose a novel framework which will extract trendy topics, identify communities related to these trendy topics, and also identify influential or seed nodes in communities. The framework intends first to find the list of topics which are popular, second to find trend-driven communities, and third to find, from these trend-driven communities, the nodes which act as seed nodes and thus dominate the spread of information in the community. Analysis of real-world data is done and results are compared with baseline approaches.

Reena Pagare, Akhil Khare, Shankar Chaudhary
ns-3 Implementation of Network Mobility Basic Support (NEMO-BS) Protocol for Intelligent Transportation Systems

In an Intelligent Transportation System for a Smart City, seamless connectivity is essential for each user during mobility for efficient data communication. For a group of mobile users in a vehicle (bus/train/flight), due to high mobility, implementing a protocol in order to manage handoffs smoothly is a challenging task. The Network Mobility Basic Support (NEMO-BS) protocol was proposed to comply with this requirement. It is an extended version of Mobile IPv6 (MIPv6). However, the MIPv6 implementation in ns-3, which is the most widely used open-source simulator, has not yet been extended to support network mobility. In this work, we have implemented the functionality of the NEMO-BS protocol in ns-3.25 by modifying the existing MIPv6 module to enrich the ns-3 library.

Prasanta Mandal, Manoj Kumar Rana, Punyasha Chatterjee, Arpita Debnath
Modified DFA Minimization with Artificial Bee Colony Optimization in Vehicular Routing Problem with Time Windows

An NP-hard problem, vehicular routing is a combinatorial optimization problem. The vehicular routing problem with time windows refers to vehicular routing with specified start and end times. There are “n” vehicles starting from the depot to cater to the needs of “m” customers. In this paper, the Gehring and Homberger benchmark problems are considered, wherein the number of customers is taken to be 1000. The Artificial Bee Colony Optimization algorithm is executed on these 60 datasets, and the number of vehicles along with the total distance covered is recorded. The modified version of Deterministic Finite Automata is applied along with the Artificial Bee Colony Optimization, and the results produce 25.55% efficient routes and 15.42% efficient distance compared to the simple Artificial Bee Colony Optimization algorithm.

G. Niranjani, K. Umamaheswari
Coverage-Aware Recharge Scheduling Scheme for Wireless Charging Vehicles in the Wireless Rechargeable Sensor Networks

Recent advances in wireless power transfer technology have motivated the development of wireless rechargeable sensor networks (WRSNs). In WRSNs, the formation of an optimal recharging schedule for each wireless charger vehicle is a well-known NP-complete problem. To determine the optimal recharging schedule for each wireless charger vehicle, this paper presents a coverage-aware recharge scheduling scheme (CRS) in which an ACO-based metaheuristic algorithm is employed. In order to provide fast recharging in a WRSN, the proposed scheme employs multiple wireless charger vehicles to perform the charging task. Performance analysis of the proposed scheme confirms its superiority in terms of charging latency.

Govind P. Gupta, Vrajesh Kumar Chawra
A Transition Model from Web of Things to Speech of Intelligent Things in a Smart Education System

Several terms have been used to describe the Internet of Things; Web of Things (WoT) is a term which can be used interchangeably, and it refers to the capability of devices to interconnect to the World Wide Web and share information and data with one another. WoT has been mentioned in the literature as improving interconnection between devices at all times. In WoT, the two modes of communication generally mentioned in previous studies are person-to-thing (or thing-to-person) and thing-to-thing. This paper presents an architecture for transitioning from WoT to a speech-enabled WoT known as Speech of Intelligent Things (SoIT). The system employs a combination of technologies such as system design, server-side scripting, speech-based system tools, and data management in developing the SoIT prototype system as a third mode of communication. This paper illustrates a scenario in which remote monitoring and control of WoT devices within the university campus might be difficult to manage using only the modes discussed in the literature. An evolution of WoT to SoIT was realized using speech technology to provide a prototype system. Technical implications involve using a telephone to dial an object telephone number (OTN), connect to WoT objects, and establish a control mechanism. The research limitation is mainly the cost of dialing an OTN. The contribution of this paper is to favor and encourage the use of speech technology to enhance the convenience of communication between WoT devices within the school campus.

Ambrose A. Azeta, Victor I. Azeta, Sanjay Misra, M. Ananya
Intrusion Detection and Prevention Systems: An Updated Review

The evolution of Information Technology (IT), cutting across several divides in our daily endeavors, allows us to interact with all forms of data at different OSI model layers, from application to physical. These data are susceptible to intrusion aimed at compromising their integrity; thus, the need to protect these data and maintain their integrity, confidentiality, and availability cannot be overemphasized. An Intrusion Detection and Prevention System (IDPS) is a device or software application designed to monitor a network or system. It detects vulnerabilities, reports malicious activities, and enacts preventive measures to keep up with the advancement of computer-related crimes using several response techniques. This paper presents an updated review of IDPSs, given that the most recent review found on the subject was done in 2016. It also discusses the use of IDPSs to identify vulnerabilities in the various channels through which data is accessed on a network or system, and the prevention mechanisms applied to mitigate intrusion.

Nureni Ayofe Azeez, Taiwo Mayowa Bada, Sanjay Misra, Adewole Adewumi, Charles Van der Vyver, Ravin Ahuja
Simulation-Based Performance Analysis of Location-Based Opportunistic Routing Protocols in Underwater Sensor Networks Having Communication Voids

Recently, Underwater Wireless Sensor Networks (UWSNs) have emerged as a prominent research area in the networking domain due to their wide range of applications in submarine tracking, disaster detection, oceanographic data collection, pollution detection, and underwater surveillance. Because of unique characteristics such as continuous movement of sensor nodes, limited bandwidth, and high energy utilization, efficient routing and data transfer in UWSNs have remained a challenging task for researchers. Almost all the protocols proposed for terrestrial sensor networks are inefficient and do not perform well in an underwater environment. Recently, Location-Based Opportunistic Routing Protocols have been observed to perform well in UWSN environments. However, it is also observed that these protocols suffer from performance degradation in UWSNs with communication voids. The objective of this research paper is to discuss the working of major Location-Based Opportunistic Routing Protocols in UWSNs with communication voids and to highlight their issues and drawbacks. We analyzed the Quality of Service parameters, namely packet delivery ratio, end-to-end delay, throughput, and energy efficiency, of two major Location-Based Opportunistic Routing Protocols, i.e., Vector-Based Forwarding (VBF) and Hop-by-Hop VBF (HH-VBF), in UWSNs with communication voids using the NS-2 simulator with the Aqua-Sim extension. Simulation results show that both the VBF and HH-VBF protocols suffered from performance degradation in UWSNs with communication voids. In addition, the paper highlights open issues for UWSNs to assist researchers in designing efficient routing protocols for UWSNs having multiple communication voids.

Sonali John, Varun G. Menon, Anand Nayyar
A Hybrid Optimization Algorithm for Pathfinding in Grid Environment

Grid computing has been highly effective in the areas of life sciences, financial analysis, research collaboration, and engineering. This paper is a study of existing Swarm Intelligence (SI) algorithms such as Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC–PSO), and Parallel Particle Swarm Optimization (PPSO) for choosing the optimal path in a grid computing environment. These algorithms were used to solve the complex optimization problem of effectively finding the path from a source node to a destination node. Nature-inspired computing techniques based on the study of the collective behavior of ants, particle swarms, and bees are used to find the optimal path and to improve the optimization methods and scalability in a set of representative problems. The hybridization of a grid computing environment and nature-inspired computing algorithms such as ACO, PSO, ABC–PSO, and PPSO has resulted in a class of solutions that differ in structure and design from peer-to-peer network algorithms, and the evaluated results showed their effectiveness on the pathfinding problem. ACO is implemented on a dynamic grid computing environment to demonstrate scalability and a solution for pathfinding. A class of four algorithms is used to find an optimal path, improve the optimization methods, and shorten the computational time in a grid computing environment.

B. Booba, A. Prema, R. Renugadevi
Dynamic Hashtag Interactions and Recommendations: An Implementation Using Apache Spark Streaming and GraphX

The hashtag, which started with Twitter, is a keyword with the prefix “#” and is now used for most communication on social media. It has been identified as very powerful and effective in organizing communications according to topic and trend. Hashtags can further help in various analyses, as they link users with their topics of interest. Hashtags aid in building communities of similar interests. With hashtags, we can follow current trends and interests on Twitter, which can help us in analyzing multiple factors, e.g., the sensitivity of an ongoing trend, its spread, the people affected, its effect on business, and so on. Traditionally available approaches help us in analyzing batch data and finding interests and trends in it. Advancements in technology now help us analyze a large amount of online data within seconds. In this paper, we explore dynamic hashtag interactions to find correlations among them and propose a methodology which can successfully find relevant hashtags based on the interest in focus. We propose our methodology for analyzing and exploring tweets in real time with the aim of converting the information we get from Twitter into knowledge.
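
One simple way to picture hashtag interactions is a weighted co-occurrence graph, where two hashtags are linked each time they appear in the same tweet. The sketch below builds such a graph in plain Python on a small batch; it is only an illustration under that assumption and does not use Apache Spark Streaming or GraphX as the paper does. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def cooccurrence(tweets):
    """Count how often each pair of hashtags appears in the same tweet.
    Keys are alphabetically ordered, lowercased (tag_a, tag_b) pairs."""
    edges = Counter()
    for text in tweets:
        tags = sorted({t.lower() for t in HASHTAG.findall(text)})
        edges.update(combinations(tags, 2))
    return edges


def recommend(edges, tag, k=3):
    """Return up to k hashtags most often seen together with `tag`,
    ordered by co-occurrence count, then alphabetically."""
    scores = Counter()
    for (a, b), weight in edges.items():
        if a == tag:
            scores[b] += weight
        elif b == tag:
            scores[a] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]
```

In a streaming setting the same counts would be updated incrementally per micro-batch instead of being recomputed over the full list of tweets.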

Sonam Sharma
Backmatter
Metadata
Title
Data Management, Analytics and Innovation
Editors
Prof. Neha Sharma
Dr. Amlan Chakrabarti
Prof. Valentina Emilia Balas
Copyright Year
2020
Publisher
Springer Singapore
Electronic ISBN
978-981-329-949-8
Print ISBN
978-981-329-948-1
DOI
https://doi.org/10.1007/978-981-32-9949-8