2021 | Book

Distributed Computing and Internet Technology

17th International Conference, ICDCIT 2021, Bhubaneswar, India, January 7–10, 2021, Proceedings

About this book

This book constitutes the proceedings of the 17th International Conference on Distributed Computing and Internet Technology, ICDCIT 2021, held in Bhubaneswar, India, in January 2021.

The 13 full papers presented together with 4 short papers were carefully reviewed and selected from 99 submissions. The papers were organized in topical sections named: invited talks, cloud computing and networks, distributed algorithms, concurrency and parallelism, graph algorithms and security, social networks and machine learning, and short papers.

Table of Contents

Frontmatter

Invited Talks

Frontmatter
The Bloom Clock for Causality Testing
Abstract
Testing for causality between events in distributed executions is a fundamental problem. Vector clocks solve this problem but do not scale well. The probabilistic Bloom clock can determine causality between events with lower space, time, and message-space overhead than vector clocks; however, its predictions suffer from false positives. We give the protocol for the Bloom clock based on Counting Bloom filters and study its properties, including the probabilities of a positive outcome and of a false positive. We show the results of extensive experiments to determine how these probabilities vary as a function of the Bloom timestamps of the two events being tested, and to determine the accuracy, precision, and false positive rate for a slice of the execution containing events in temporal proximity of each other. Based on these experiments, we make recommendations for setting the Bloom clock parameters. We postulate the causality spread hypothesis from the application’s perspective to indicate whether Bloom clocks will be suitable for correct predictions with high confidence. The Bloom clock design can serve as a viable space-, time-, and message-space-efficient alternative to vector clocks if false positives can be tolerated by an application.
Anshuman Misra, Ajay D. Kshemkalyani
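To make the mechanism concrete, the following is a minimal Python sketch of a Counting-Bloom-filter clock with the element-wise comparison used for the causality test; the array size m, the number of hash functions k, the SHA-256-based indexing, and the (pid, seq) event naming are illustrative assumptions, not the paper's recommended parameters.

```python
# A minimal sketch, assuming a Counting Bloom filter of m counters and k hash
# functions; the hashing scheme and event naming are illustrative choices.
import hashlib

class BloomClock:
    def __init__(self, m=8, k=2):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _indexes(self, pid, seq):
        # Derive k counter positions by hashing (pid, seq, i).
        for i in range(self.k):
            digest = hashlib.sha256(f"{pid}:{seq}:{i}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def tick(self, pid, seq):
        # On a local or send event, increment the k hashed counters.
        for idx in self._indexes(pid, seq):
            self.counters[idx] += 1

    def merge(self, other):
        # On receive, take the element-wise maximum with the piggybacked clock.
        self.counters = [max(a, b) for a, b in zip(self.counters, other.counters)]

def happened_before(bx, by):
    # A positive outcome: bx <= by element-wise. This may be a false positive,
    # which is exactly the probability the paper studies.
    return all(a <= b for a, b in zip(bx.counters, by.counters))
```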
Model Development in the Tool USE: Explorative, Consolidating and Analytic Steps for UML and OCL Models
Abstract
This contribution concentrates on the development process for descriptive and prescriptive UML and OCL models. We have decided to concentrate on three not necessarily disjoint techniques that we have labeled explorative, consolidating and analytic. We assume an imaginary, prototypical development process in that (1) informal ideas and requirements are first realized by exploring initial formal descriptions through interaction with a modeling tool, (2) stated ideas are consolidated with more detailed descriptions for structural and behavioral specifications, and (3) achieved descriptions are analyzed with respect to questions about stakeholder expectations. The contribution uses a running example for demonstration purposes.
Martin Gogolla
ReLink: Open Information Extraction by Linking Phrases and Its Applications
Abstract
Recently, many Open IE systems have been developed that use deep linguistic features such as dependency-parse features to overcome the limitations of older Open IE systems, which use only shallow information like part-of-speech tags or chunking. Even though these newer systems have clear advantages in their extractions, they also exhibit several issues that do not exist in the old systems. In this paper, we analyze the outputs of several popular Open IE systems to identify their strengths and weaknesses. We then introduce ReLink, a novel Open IE system for extracting binary relations from open-domain text. Its working model is based on identifying correct phrases and linking them in the most appropriate way to reflect their relationship in a sentence. After establishing connections, it can easily extract relations using several pre-defined patterns. Despite using only shallow linguistic features for extraction, it does not share the weaknesses of older systems, and it also avoids many similar issues arising in recent Open IE systems. Our experiments show that ReLink achieves a larger area under the precision-recall curve compared with ReVerb and Ollie, two well-known Open IE systems.
Xuan-Chien Tran, Le-Minh Nguyen
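As a flavor of shallow, pattern-based extraction, this toy Python sketch groups POS-tagged tokens into noun and relation phrases and links adjacent NP-REL-NP runs into triples; the tag sets and the flat state machine are simplifications and do not reproduce ReLink's actual phrase-linking model.

```python
# A toy sketch of shallow-feature extraction: group POS-tagged tokens into
# noun phrases (NP) and relation phrases (REL), then link adjacent NP REL NP
# runs into binary relations. Tag sets and patterns are simplified stand-ins.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}
REL_TAGS = {"VB", "VBD", "VBZ", "VBP", "VBN", "VBG", "IN"}

def extract(tagged):
    spans, cur, kind = [], [], None
    for tok, tag in tagged + [(None, None)]:   # sentinel flushes the last span
        k = "NP" if tag in NP_TAGS else "REL" if tag in REL_TAGS else None
        if k != kind and cur:
            spans.append((kind, " ".join(cur)))
            cur = []
        if k:
            cur.append(tok)
        kind = k
    return [(spans[i][1], spans[i + 1][1], spans[i + 2][1])
            for i in range(len(spans) - 2)
            if [s[0] for s in spans[i:i + 3]] == ["NP", "REL", "NP"]]

print(extract([("Obama", "NNP"), ("was", "VBD"), ("born", "VBN"),
               ("in", "IN"), ("Hawaii", "NNP")]))
# [('Obama', 'was born in', 'Hawaii')]
```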

Cloud Computing and Networks

Frontmatter
Energy-Efficient Scheduling of Deadline-Sensitive and Budget-Constrained Workflows in the Cloud
Abstract
Due to the rapid advancement of Cloud computing, more and more users are running their scientific and business workflow applications in the Cloud. The energy consumption of these workflows is high, which negatively affects the environment and also increases the operational costs of Cloud providers. Moreover, most workflows are associated with budget constraints and deadlines prescribed by Cloud users. Thus, one of the main challenges of workflow scheduling is to make it energy-efficient for Cloud providers while preventing budget and deadline violations for Cloud users. To address these issues, we consider a heterogeneous Cloud environment and propose an energy-efficient scheduling algorithm for deadline-sensitive workflows with budget constraints. Our algorithm ensures that the workflow is scheduled within the budget while reducing energy consumption and deadline violations. It utilizes Dynamic Voltage and Frequency Scaling (DVFS) to adjust the voltage and frequency of the virtual machines (VMs) executing the tasks of the workflow, which yields significant energy savings. Extensive simulation using real-world workflows and comparison with state-of-the-art approaches validate the effectiveness of our proposed algorithm.
Anurina Tarafdar, Kamalesh Karmakar, Sunirmal Khatua, Rajib K. Das
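The energy win behind DVFS can be seen in a few lines. The sketch below assumes the common model where dynamic power scales roughly with f^3 (voltage tracking frequency), so a fixed amount of work run at a lower frequency costs about c * work * f^2 energy; the constant c, the frequency levels, and the slack value are illustrative, not the paper's cost model.

```python
# A minimal sketch of the DVFS intuition under the cubic dynamic-power model.
def task_energy(work, f, c=1.0):
    return (c * f ** 3) * (work / f)    # power * time  =>  c * work * f^2

def slowest_feasible_frequency(work, time_budget, f_levels):
    # Pick the lowest discrete VM frequency that still meets the deadline.
    for f in sorted(f_levels):
        if work / f <= time_budget:
            return f
    return max(f_levels)                # no level fits: run at full speed

f = slowest_feasible_frequency(work=10.0, time_budget=8.0, f_levels=[1.0, 1.6, 2.0])
print(f, task_energy(10.0, f), task_energy(10.0, 2.0))  # 1.6, 25.6 vs 40.0 at f=2.0
```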
An Efficient Renewable Energy-Based Scheduling Algorithm for Cloud Computing
Abstract
The global growth of cloud computing services is witnessing a continuous surge, from storing data to sharing information with others. This compels cloud service providers (CSPs) to efficiently utilize the existing resources of datacenters in order to increase adaptability and minimize the unexpected expansion of datacenters. These datacenters consume enormous amounts of energy generated from fossil fuels (i.e., non-renewable energy (NRE) sources) and emit substantial amounts of carbon and heat, which drastically impacts the environment. As a result, CSPs have pledged to decarbonize their datacenters by adopting renewable energy (RE) sources, such as solar, wind, hydro and biomass. However, these CSPs have not completely abandoned fossil fuels, as RE sources are subject to inconsistent atmospheric conditions. Recent studies have suggested that CSPs use both NRE and RE sources to meet user requirements. However, these studies have not considered the flexible duration, nodes and utilization of the user requests (URs) with respect to datacenters. Therefore, we consider these properties of URs and propose a RE-based scheduling algorithm (RESA) to efficiently assign the URs to the datacenters. The proposed algorithm determines both the earliest completion time and the energy cost, and takes their linear combination to decide a suitable datacenter for each UR. We conduct extensive simulations with 1000 to 16000 URs and 20 to 60 datacenters. Our simulation results are compared with other algorithms, namely round-robin (RR) and random, and show that RESA reduces the overall completion time (i.e., makespan (M)), energy consumption (EC), overall cost (OC) and the number of used RE (|URE|) resources.
Sanjib Kumar Nayak, Sanjaya Kumar Panda, Satyabrata Das, Sohan Kumar Pande
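A hedged sketch of the linear-combination rule named in the abstract: score each datacenter by alpha times its normalized earliest completion time plus (1 - alpha) times its normalized energy cost, and pick the minimum. The field names, min-max normalization, and alpha value are assumptions for illustration.

```python
# A sketch of selecting a datacenter by a weighted mix of completion time
# and energy cost; cost model and weights are illustrative, not RESA's.
def pick_datacenter(request, datacenters, alpha=0.5):
    ect = {d["id"]: d["ready_time"] + request["work"] / d["speed"]
           for d in datacenters}
    cost = {d["id"]: request["work"] * d["energy_price"] for d in datacenters}

    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {k: 0.0 if hi == lo else (v - lo) / (hi - lo)
                for k, v in scores.items()}

    n_ect, n_cost = norm(ect), norm(cost)
    return min(datacenters,
               key=lambda d: alpha * n_ect[d["id"]] + (1 - alpha) * n_cost[d["id"]])

dcs = [{"id": "dc1", "ready_time": 2.0, "speed": 4.0, "energy_price": 0.9},
       {"id": "dc2", "ready_time": 0.0, "speed": 2.0, "energy_price": 0.4}]
print(pick_datacenter({"work": 8.0}, dcs)["id"])   # dc2: equal ECT, cheaper energy
```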
A Revenue-Based Service Management Algorithm for Vehicular Cloud Computing
Abstract
Vehicular cloud computing (VCC) is an emerging research area among business and academic communities due to its dynamic computing capacity, on-road assistance, infotainment services, and emergency and traffic services. Using roadside units (RSUs), it can mitigate the hindrances faced by the existing infrastructure, which relies on cellular networks. Moreover, the existing cellular networks cannot provide good quality of service due to the influx of vehicles. In VCC, RSUs can prefetch the content and data required by the vehicles and provide them as services while the vehicles reside within the communication range of the RSUs. However, RSUs suffer from limited communication range and data rate. Therefore, it is quite challenging for the RSUs to select suitable vehicles to serve such that their revenue is maximized. In this paper, we propose a revenue-based service management (RBSM) algorithm to tackle these challenges. RBSM is a two-phase algorithm that finds the data rate zones of the vehicles and selects a suitable vehicle at each time slot to maximize the total revenue of the RSUs, the total download by the vehicles and the total number of completed requests. We assess the performance of RBSM and compare it with an existing algorithm, namely RSU resource scheduling (RRS), under various traffic scenarios. The comparison results show that RBSM performs 87%, 90% and 170% better than RRS in terms of total revenue, download and number of completed requests, respectively.
Sohan Kumar Pande, Sanjaya Kumar Panda, Satyabrata Das
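A heavily simplified sketch of the per-slot selection step: among vehicles currently in range, serve the one yielding the highest revenue for this slot, with the achievable data rate falling off by distance zone. The zone rates, price, and 300 m range are made-up illustrative values, not the paper's model.

```python
# A hedged sketch of zone-aware vehicle selection at one RSU time slot.
RANGE_M = 300.0
ZONE_RATE_MBPS = {0: 27.0, 1: 18.0, 2: 9.0}     # near, mid, far zones (made up)

def zone(distance):
    return min(2, int(3 * distance / RANGE_M))

def pick_vehicle(vehicles, price_per_mb=0.01, slot_s=1.0):
    in_range = [v for v in vehicles
                if v["distance"] <= RANGE_M and v["remaining_mb"] > 0]
    def revenue(v):
        # Revenue for this slot: price times the megabytes actually deliverable.
        mb = min(v["remaining_mb"], ZONE_RATE_MBPS[zone(v["distance"])] * slot_s / 8)
        return price_per_mb * mb
    return max(in_range, key=revenue, default=None)

vehicles = [{"id": "v1", "distance": 250.0, "remaining_mb": 50.0},
            {"id": "v2", "distance": 80.0, "remaining_mb": 1.0}]
print(pick_vehicle(vehicles)["id"])   # v1: far zone, but more data left to sell
```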
Interference Reduction in Directional Wireless Networks
Abstract
In a wireless network using directional transmitters, a typical problem is to schedule a set of directional links to cover all the receivers in a region such that an adequate data rate and coverage are maintained while minimizing interference. We can model the coverage area of a directional transmitter as a unit triangle and each receiver as a point in the plane. Motivated by this, we first consider a minimum ply covering (MPC) problem. We propose a 2-approximation algorithm for the MPC problem running in \(O((opt+n)m^{14opt+1}(\log opt))\) time, where m is the number of transmitters, n is the number of receivers given in the plane, and \(opt\) is the maximum number of triangles covering a point in the plane in the optimal solution. We also show that the MPC problem is NP-hard and is not \((2-\epsilon )\)-approximable for any \(\epsilon >0\) unless P = NP. We also study channel allocation in directional wireless networks by posing it as a colorable covering problem, namely, the 3-colorable unit triangle cover (3CUTC). We propose a simple 4-approximation algorithm running in \(O(m^{30}n^2)\) time for this problem.
Manjanna Basappa, Sudeepta Mishra
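To fix intuition for the ply objective, here is a brute-force reference on tiny instances, with coverage regions abstracted to sets of the receivers they cover and ply measured at receivers only (a proxy; the paper defines ply over all points of the plane and gives an approximation algorithm, not this exhaustive search).

```python
# Brute force over all subsets: find a cover of all receivers minimizing the
# maximum number of chosen regions stacked over any receiver (the ply).
from itertools import combinations

def min_ply_cover(regions, receivers):
    best = None
    for r in range(1, len(regions) + 1):
        for combo in combinations(range(len(regions)), r):
            covered = set().union(*(regions[i] for i in combo))
            if covered >= receivers:
                ply = max(sum(p in regions[i] for i in combo) for p in receivers)
                if best is None or ply < best[0]:
                    best = (ply, combo)
    return best

regions = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
print(min_ply_cover(regions, receivers={1, 2, 3, 4}))   # (1, (0, 2)): ply-1 cover
```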

Distributed Algorithms, Concurrency and Parallelism

Frontmatter
Automated Deadlock Detection for Large Java Libraries
Abstract
Locating deadlock opportunities in large Java libraries is a subject of much research, as the Java execution environment (JVM/JRE) does not provide means to predict or prevent deadlocks. Researchers have used static and dynamic approaches to analyze the problem.
Static approaches: With very large libraries, such analyses face a typical accuracy/feasibility trade-off. If they employ a detailed model of the library, the analysis grows too large; if their model is coarse-grained, the results contain too many false positives. Since they do not generate deadlocking test cases, manually creating deadlocking code based on their predictions is impractical for large libraries.
Dynamic approaches: Such analyses produce concrete results in the form of actual test cases that demonstrate the reachability of the identified deadlock. Unfortunately, for large libraries, generating the seed test execution paths covering all possible classes to trigger the dynamic analysis becomes impractical.
In this work we combine a static approach (Stalemate) and a dynamic approach (Omen) to detect deadlocks in large Java libraries. We first run Stalemate to generate a list of potentially deadlocking classes, which we feed as input test cases to Omen. When a deadlock occurs, its details are logged for subsequent reproduction. The process is fully automated and needs no manual intervention.
We subjected the entire JRE v1.7.0_79 library (rt.jar) to our implementation of this approach and successfully detected 113 deadlocks. We reported a few of them to Oracle as defects, and they were accepted as bugs.
R. Rajesh Kumar, Vivek Shanbhag, K. V. Dinesha
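A minimal illustration (in Python for brevity; the paper targets Java) of the lock-inversion pattern such analyses hunt for: two threads taking the same pair of locks in opposite orders. The hardcoded lock-order edges at the end mimic the cycle check a static phase performs before a dynamic phase tries to actually realize the deadlock.

```python
import threading, time

a, b = threading.Lock(), threading.Lock()

def t1():                # acquires a, then b
    with a:
        time.sleep(0.1)  # widen the race window
        with b:
            pass

def t2():                # acquires b, then a: inverted order
    with b:
        time.sleep(0.1)
        with a:
            pass

# Running t1 and t2 concurrently can hang forever. Statically, the acquisition
# edges a->b and b->a form a cycle in the lock-order graph: a deadlock warning.
order_edges = {("a", "b"), ("b", "a")}
print("cycle:", ("a", "b") in order_edges and ("b", "a") in order_edges)
```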
DNet: An Efficient Privacy-Preserving Distributed Learning Framework for Healthcare Systems
Abstract
Medical data held in silos by institutions makes it challenging to predict new trends and gain insights, as sharing individual data leaks user privacy and is restricted by law. The Federated Learning framework [11] addresses this problem by facilitating on-device training while preserving privacy. However, the presence of a central server has its own inherent problems, including a single point of failure and of trust. Moreover, data may be prone to inference attacks. This paper presents a Distributed Net algorithm called DNet to address these issues, which poses its own set of challenges in terms of communication latency, performance, and efficiency. Four different networks are discussed and compared for computation, latency, and precision. Empirical analysis has been performed on chest X-ray images and a COVID-19 dataset. The theoretical analysis proves our claim that the algorithm has lower communication latency, and provides an upper bound on it.
Parth Parag Kulkarni, Harsh Kasyap, Somanath Tripathy
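A hedged sketch of the serverless idea: each hospital takes a local gradient step, then averages parameters with its ring neighbours instead of reporting to a central server. The logistic-regression step, the ring topology, and the absence of any privacy mechanism are simplifying assumptions, not DNet's actual protocol.

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    # One gradient step of binary logistic regression on the local silo.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return w - lr * X.T @ (p - y) / len(y)

def ring_average(weights):
    # Each node averages its model with its two ring neighbours.
    n = len(weights)
    return [(weights[(i - 1) % n] + weights[i] + weights[(i + 1) % n]) / 3.0
            for i in range(n)]

rng = np.random.default_rng(0)
silos = [(rng.normal(size=(40, 5)), rng.integers(0, 2, 40)) for _ in range(4)]
weights = [np.zeros(5) for _ in silos]
for _ in range(50):                     # alternate local training and gossip
    weights = [local_step(w, X, y) for w, (X, y) in zip(weights, silos)]
    weights = ring_average(weights)
```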
Memory Optimized Dynamic Matrix Chain Multiplication Using Shared Memory in GPU
Abstract
The number of multiplications needed for Matrix Chain Multiplication of \(n\) matrices depends not only on the dimensions but also on the order in which the chain is multiplied. The problem is to find the optimal order of multiplication. Dynamic programming solves this problem in \(O(n^3)\) time and \(O(n^2)\) space. Nowadays, the Graphics Processing Unit (GPU) is very useful to developers for parallel programming using the CUDA computing architecture. The main contribution of this paper is a new memory-optimized technique to solve the Matrix Chain Multiplication problem in parallel on a GPU, mapping the diagonals of the calculation tables \(m[][]\) and \(s[][]\) into a single combined calculation table of size \(O(n^2)\) for better memory coalescing on the device. Besides optimizing the memory requirement, a technique for utilizing shared memory in blocks of threads is suggested to minimize the time for accessing the dimensions of the matrices on the GPU. Our experiments show the best speedup reported so far compared to a sequential CPU implementation when run on large problem sizes.
Girish Biswas, Nandini Mukherjee
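For reference, this is the standard O(n^3)-time, O(n^2)-space dynamic program the paper starts from, where matrix i has shape dims[i-1] x dims[i]. The GPU contribution additionally packs the diagonals of m[][] and s[][] into one combined table for memory coalescing; this plain CPU sketch keeps the two tables separate.

```python
def matrix_chain_order(dims):
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][j]: min multiplications
    s = [[0] * (n + 1) for _ in range(n + 1)]   # s[i][j]: optimal split index
    for length in range(2, n + 1):              # each length is one diagonal
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m[1][n], s

print(matrix_chain_order([10, 20, 5, 30])[0])   # 2500: (A1 A2) first, then A3
```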

Graph Algorithms and Security

Frontmatter
Parameterized Complexity of Defensive and Offensive Alliances in Graphs
Abstract
In this paper we study the problem of finding small defensive and offensive alliances in a simple graph. Given a graph \(G=(V,E)\) and a subset \(S\subseteq V(G)\), we denote by \(d_S(v)\) the degree of a vertex \(v\in V\) in G[S], the subgraph of G induced by S. A non-empty set \(S\subseteq V\) is a defensive alliance in \(G=(V,E)\) if \(d_S(v)+1\ge d_{S^c}(v)\) for all \(v\in S\). A non-empty set \(S\subseteq V\) is an offensive alliance in G if \(d_S(v)\ge d_{S^c}(v)+1\) for all \(v\in N(S)\). It is known that the problems of finding small defensive and offensive alliances are NP-complete. We enhance our understanding of the problems from the viewpoint of parameterized complexity by showing that (1) the problems are FPT when parameterized by neighbourhood diversity of the input graph, (2) the offensive alliance problem is FPT when parameterized by domino treewidth of the input graph, and (3) the defensive and offensive alliance problems are polynomial time solvable for graphs with bounded treewidth.
Ajinkya Gaikwad, Soumen Maity, Shuvam Kant Tripathi
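The two definitions translate directly into checkers. The sketch below uses plain adjacency sets and takes N(S) as the vertices outside S adjacent to some vertex of S: S is a defensive alliance if \(d_S(v)+1\ge d_{S^c}(v)\) for every \(v\in S\), and an offensive alliance if \(d_S(v)\ge d_{S^c}(v)+1\) for every \(v\in N(S)\).

```python
def d_in(adj, S, v):
    return sum(1 for u in adj[v] if u in S)     # d_S(v)

def is_defensive(adj, S):
    return all(d_in(adj, S, v) + 1 >= len(adj[v]) - d_in(adj, S, v) for v in S)

def is_offensive(adj, S):
    boundary = {u for v in S for u in adj[v]} - S          # N(S) outside S
    return all(d_in(adj, S, v) >= len(adj[v]) - d_in(adj, S, v) + 1
               for v in boundary)

# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(is_defensive(adj, {0, 1, 2}), is_offensive(adj, {0, 1, 2}))  # True True
```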
A Reconstructive Model for Identifying the Global Spread in a Pandemic
Abstract
With existing tracing mechanisms, we can quickly identify people potentially infected with a virus by choosing everyone who has come in contact with an infected person. In the presence of abundant resources, that is the most sure-fire way to contain the viral spread. In the case of a new virus, however, testing methods and resources may not be readily available in ample quantity. We propose a method to determine the most susceptible persons so that, under limited testing capacity, we can identify the spread of the virus in a community. We determine highly suspected persons (represented as nodes in a graph) by choosing paths between the infected nodes in an underlying contact graph (acquired from location data). We vary parameters such as the infection multiplier, false positive ratio, and false negative ratio, and show their relationship with the test positivity ratio (the ratio of the number of infected nodes to the number of suspected nodes). We observe that our algorithm is robust enough to handle different infection multipliers and false results while producing suspected nodes. We show that the suspected nodes identified by the algorithm result in a high test positivity ratio compared to the real world. Based on the availability of test kits, we can run our algorithm several times to obtain more suspected nodes. We also show that our algorithm takes a finite number of iterations to determine all the suspected nodes.
Debasish Pattanayak, Dibakar Saha, Debarati Mitra, Partha Sarathi Mandal
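A hedged reading of the core idea in a few lines: mark as "suspected" the vertices lying on contact-graph paths between pairs of known infected nodes. Using one shortest path per pair is a simplification; the paper's iterative algorithm and its infection-multiplier and false-rate parameters are not modeled here.

```python
import networkx as nx
from itertools import combinations

def suspected_nodes(G, infected):
    suspects = set()
    for u, v in combinations(infected, 2):
        if nx.has_path(G, u, v):
            suspects.update(nx.shortest_path(G, u, v))
    return suspects - set(infected)

G = nx.path_graph(6)                            # contact chain 0-1-2-3-4-5
print(suspected_nodes(G, infected=[0, 3, 5]))   # {1, 2, 4}
```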
Cost Effective Method for Ransomware Detection: An Ensemble Approach
Abstract
In recent years, ransomware has emerged as a new malware epidemic that creates havoc on the Internet. It infiltrates a victim system or network and encrypts all personal files or the whole system using a variety of encryption techniques. Such techniques prevent users from accessing the files or the system until the demanded ransom is paid. In this paper, we introduce an efficient yet effective classification scheme, called ERAND (Ensemble RANsomware Defense), to defend against ransomware. ERAND operates on an optimal feature space to yield the best possible accuracy for the ransomware class as a whole as well as for each variant of the family.
Parthajit Borah, Dhruba K. Bhattacharyya, J. K. Kalita
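A generic scikit-learn sketch of an ensemble malware classifier; the base learners, soft voting, and synthetic data are placeholders, since ERAND's concrete members and feature-selection step are not given in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a benign-vs-ransomware feature matrix.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="soft")                       # average predicted probabilities
print(ensemble.fit(Xtr, ytr).score(Xte, yte))
```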

Social Networks and Machine Learning

Frontmatter
Exploring Alzheimer’s Disease Network Using Social Network Analysis
Abstract
Alzheimer’s is a degenerative disease in which changes occur in different regions of the brain at different rates, resulting in progressive deterioration. Much functional brain connectivity is altered, yet the process itself is insufficiently understood. In this work, an attempt is made to understand the progressive deterioration of the brain by locating the regions that show significant changes in connectivity in the five lobes of the brain at different stages of the disease. Methods from social network analysis, such as community and maximal clique analysis, along with degree distributions and centrality measures, are used to observe the network evolution of these regions at different stages. Networks of four diagnostic stages, i.e., Normal, Early MCI (eMCI), Late MCI (lMCI), and Alzheimer’s Disease (AD), taken from the ADNI (Alzheimer’s Disease Neuroimaging Initiative) database, are used for this study. Nine Regions of Interest (ROIs) from the five lobes are identified, and a higher degree of change is observed in the connections of regions from the temporal and frontal lobes. There is a surge of new connections in the eMCI stage for all regions except those from the frontal lobe. We also observe more rearrangement among the left-hemisphere nodes than among the right-hemisphere nodes, and an overall loss of edges between the normal and AD stages. This confirms that the study is able to identify the regions affected by the progression of the disease.
Swati Katiyar, T. Sobha Rani, S. Durga Bhavani
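A minimal sketch of the SNA toolkit the study applies to ROI networks: communities, maximal cliques, degree distribution, and centrality. The karate-club graph is only a stand-in for the ADNI-derived brain connectivity networks.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()                          # placeholder ROI network
communities = list(greedy_modularity_communities(G))
max_clique_size = max(len(c) for c in nx.find_cliques(G))
degree_hist = nx.degree_histogram(G)                # degree distribution
central = max(nx.betweenness_centrality(G).items(), key=lambda kv: kv[1])
print(len(communities), max_clique_size, central[0])
```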
Stroke Prediction Using Machine Learning in a Distributed Environment
Abstract
With our changing lifestyles, certain biological dimensions of human lives are changing, making people more vulnerable to stroke. Stroke is a medical condition in which parts of the brain do not get blood supply, and it can be fatal. As stroke cases are increasing at an alarming rate, there is a need to analyze the factors affecting their growth and to design an approach that predicts whether a person will be affected by stroke. This paper analyses different machine learning algorithms for better prediction of stroke. The algorithms used for the analysis include Naive Bayes, Logistic Regression, Decision Tree, Random Forest and Gradient Boosting. We use a dataset consisting of 11 features such as age, gender, and BMI (body mass index). These features are analyzed using univariate and multivariate plots to observe the correlations between them. The analysis also shows that some features such as age, gender, and smoking status are important factors, while features like residence type are of less importance. The proposed work is implemented using Apache Spark, a distributed general-purpose cluster-computing framework. The Receiver Operating Characteristic (ROC) curve of each algorithm is compared, showing that the Gradient Boosting algorithm gives the best results, with a ROC area score of 0.90. After fine-tuning certain parameters of the Gradient Boosting algorithm, namely the learning rate, the depth of the trees, the number of trees and the minimum sample split, the ROC area score rises to 0.94. The Accuracy, Precision, Recall and F1 score values before fine-tuning are 0.867, 0.8673, 0.866 and 0.8659 respectively, and after fine-tuning they are 0.9449, 0.9453, 0.9449 and 0.9448 respectively.
Maihul Rajora, Mansi Rathod, Nenavath Srinivas Naik
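A hedged PySpark sketch of the pipeline the paper describes: assemble features, train a gradient-boosted tree classifier, and evaluate the area under the ROC curve. The file name, column names, and hyperparameter values are illustrative assumptions, not the paper's exact tuned settings.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("stroke").getOrCreate()
df = spark.read.csv("stroke.csv", header=True, inferSchema=True)  # hypothetical path

cols = ["age", "bmi", "avg_glucose_level", "hypertension", "heart_disease"]
df = VectorAssembler(inputCols=cols, outputCol="features").transform(df)
train, test = df.randomSplit([0.8, 0.2], seed=42)

gbt = GBTClassifier(labelCol="stroke", featuresCol="features",
                    maxIter=50, maxDepth=5, stepSize=0.1)   # the tuning targets
model = gbt.fit(train)
evaluator = BinaryClassificationEvaluator(labelCol="stroke",
                                          metricName="areaUnderROC")
print(evaluator.evaluate(model.transform(test)))
```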
Automated Diagnosis of Breast Cancer with RoI Detection Using YOLO and Heuristics
Abstract
Breast Cancer (specifically Ductal Carcinoma) is widely diagnosed by Fine Needle Aspiration Cytology (FNAC). Deep Learning techniques like Convolutional Neural Networks (CNNs) can automatically diagnose this condition by processing images captured from FNAC. However, CNNs are typically trained on manually sampled RoI (Region of Interest) patches or hand-crafted features. Using a Region Proposal Network (RPN) can automate RoI detection and save time and effort. In this study, we propose the use of the YOLOv3 network as an RPN, supplemented with image-based heuristics, for RoI patch detection and extraction from cytology images. The extracted patches were used to train three CNNs (VGG16, ResNet-50 and Inception-v3) for classification. YOLOv3 identified 164 RoIs in 26 out of 27 images, and we achieved 96.6%, 98.8% and 98.9% classification accuracies with VGG16, ResNet-50 and Inception-v3 respectively.
Ananya Bal, Meenakshi Das, Shashank Mouli Satapathy, Madhusmita Jena, Subha Kanta Das

Short Papers

Frontmatter
An Efficient Approach for Event Prediction Using Collaborative Distance Score of Communities
Abstract
An effective technique for prediction of events can capture the evolution of communities and help understand the collaborative trends in massive dataset applications. One major challenge is to find the derived features that can improve the accuracy of ML models in efficient prediction of events in such evolutionary patterns.
It is often observed that a group of researchers associates with another set of researchers having similar interests to pursue common research goals. A study of such associations forms an essential basis for assessing collaboration trends and predicting evolving topics of research. A well-established co-authorship dataset such as DBLP plays a vital role in identifying collaborative relationships among researchers based on their academic interests.
The association between researchers can be quantified by computing their collaborative distance. The Refined Classical Collaborative Distance (RCCD) proposed in this paper extends the existing Classical Collaborative Distance (CCD).
The computed RCCD score is then used as a derived feature, along with other community features, for effective prediction of events in the DBLP dataset.
The experimental results show that the existing CCD method predicts events with an accuracy of 81.47%, whereas our proposed RCCD method improves this to 84.27%. Thus, RCCD is instrumental in enhancing the accuracy of ML models in the effective prediction of events such as trending research topics.
B. S. A. S. Rajita, Bipin Sai Narwa, Subhrakanta Panda
A Distributed System for Optimal Scale Feature Extraction and Semantic Classification of Large-Scale Airborne LiDAR Point Clouds
Abstract
Airborne LiDAR (Light Detection and Ranging), or aerial laser scanning (ALS), technology can capture large-scale point cloud data representing the topography of large regions. The raw point clouds need to be managed and processed at scale for exploration and contextual understanding of the topographical data. One of the key processing steps is feature extraction from pointwise local geometric descriptors for object-based classification. The state of the art involves finding an optimal scale for computing the descriptors, determined using descriptors across multiple scales, which becomes computationally intensive for big data. Hence, we propose the use of a widely used big data analytics framework, an integration of Apache Spark and Cassandra, for extracting features at the optimal scale, semantic classification using a random forest classifier, and interactive visualization. The visualization involves real-time updates to the selection of regions of interest and display of feature vectors upon a change in the computation of descriptors. We show the efficacy of our proposed application through our results on the DALES aerial LiDAR point cloud dataset.
Satendra Singh, Jaya Sreevalsan-Nair
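A sketch of what pointwise multi-scale geometric descriptors look like: eigenvalues of the local 3-D covariance yield linearity, planarity, and scattering, and an "optimal" neighbourhood radius can be chosen by minimizing eigenentropy, a common criterion in the local-descriptor literature. The radii are illustrative, and the paper's distributed Spark/Cassandra pipeline is not reproduced here.

```python
import numpy as np

def eigen_features(points, center, radius):
    nbrs = points[np.linalg.norm(points - center, axis=1) <= radius]
    if len(nbrs) < 3:
        return None
    lam = np.linalg.eigvalsh(np.cov(nbrs.T))[::-1]          # l1 >= l2 >= l3
    lam = np.clip(lam, 1e-12, None)
    lam = lam / lam.sum()
    entropy = -np.sum(lam * np.log(lam))                    # eigenentropy
    linearity = (lam[0] - lam[1]) / lam[0]
    planarity = (lam[1] - lam[2]) / lam[0]
    scattering = lam[2] / lam[0]
    return entropy, linearity, planarity, scattering

def optimal_scale(points, center, radii=(0.5, 1.0, 2.0, 4.0)):
    feats = {r: eigen_features(points, center, r) for r in radii}
    usable = [r for r in radii if feats[r] is not None]
    return min(usable, key=lambda r: feats[r][0])           # lowest entropy wins
```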
Load Balancing Approach for a MapReduce Job Running on a Heterogeneous Hadoop Cluster
Abstract
Hadoop MapReduce has become the de facto standard in today’s Big Data world for processing large data sets on a distributed cluster of commodity hardware. Today, computing nodes in a commodity cluster do not have the same hardware configuration, which leads to heterogeneity; heterogeneous clusters have become common in industry, research institutes, and academia. Our study shows that the current rules for calculating the required number of Reduce tasks (Reducers) for a MapReduce job are inaccurate, leading to significant overutilization of computing resources and degrading the performance of MapReduce jobs running on heterogeneous Hadoop clusters. However, there is no definite answer to the question: what is the optimal number of Reduce tasks required for a MapReduce job to get Hadoop’s best performance on a heterogeneous cluster? We propose a new rule that accurately decides the required number of Reduce tasks for a MapReduce job running on a heterogeneous Hadoop cluster. The proposed rule balances the load among the heterogeneous nodes in the Reduce phase of MapReduce. It also minimizes overutilization of computing resources and improves MapReduce job execution time by an average of 18% and 28% for the TeraSort and PageRank applications, respectively, running on a heterogeneous Hadoop cluster.
Kamalakant Laxman Bawankule, Rupesh Kumar Dewang, Anil Kumar Singh
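A hedged sketch of the general idea of capacity-aware Reducer placement: rather than a flat rule such as "0.95 times the total reduce slots", weight each node by a measured capacity score so faster nodes receive proportionally more Reduce tasks. The capacity figures are illustrative; the paper derives its own rule.

```python
def reducers_per_node(nodes, total_reducers):
    cap = sum(n["capacity"] for n in nodes)
    return {n["host"]: round(total_reducers * n["capacity"] / cap)
            for n in nodes}

nodes = [{"host": "fast1", "capacity": 4.0},
         {"host": "slow1", "capacity": 1.0},
         {"host": "slow2", "capacity": 1.0}]
print(reducers_per_node(nodes, total_reducers=12))
# {'fast1': 8, 'slow1': 2, 'slow2': 2}: balanced finish times across nodes
```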
Study the Significance of ML-ELM Using Combined PageRank and Content-Based Feature Selection
Abstract
Scalable big data analysis frameworks are of paramount importance in the modern web society, which is characterized by a huge number of resources, including electronic text documents. Hence, choosing an adequate subset of features that provides a complete representation of a document while discarding the irrelevant ones is of utmost importance. Aiming in this direction, this paper studies the suitability and importance of a deep learning classifier called Multilayer ELM (ML-ELM) by proposing a combined PageRank and content-based feature selection (CPRCFS) technique over all the terms present in a given corpus. The top \(k\%\) terms are selected to generate a reduced feature vector, which is then used to train different classifiers, including ML-ELM. Experimental results show that the proposed feature selection technique is better than or comparable with the baseline techniques, and that the performance of Multilayer ELM can outperform state-of-the-art machine learning and deep learning classifiers.
Rajendra Kumar Roul, Jajati Keshari Sahoo
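A hedged sketch of the combined scoring idea: mix a PageRank score computed on a term co-occurrence graph with a content score (IDF here), then keep the top k% of terms. The mixing weight beta and the use of IDF as the content score are assumptions for illustration; CPRCFS's exact formulation is not shown.

```python
import math
from collections import Counter
from itertools import combinations
import networkx as nx

def select_terms(docs, top_frac=0.2, beta=0.5):
    G, df = nx.Graph(), Counter()
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        G.add_edges_from(combinations(terms, 2))   # co-occurrence edges
    pr = nx.pagerank(G)
    idf = {t: math.log(len(docs) / df[t]) for t in df}

    def norm(d):
        hi = max(d.values()) or 1.0
        return {k: v / hi for k, v in d.items()}

    pr, idf = norm(pr), norm(idf)
    score = {t: beta * pr.get(t, 0.0) + (1 - beta) * idf[t] for t in df}
    k = max(1, int(top_frac * len(score)))
    return sorted(score, key=score.get, reverse=True)[:k]

docs = [["spark", "cluster", "hadoop"], ["pagerank", "graph", "cluster"],
        ["graph", "hadoop", "elm"]]
print(select_terms(docs, top_frac=0.5))
```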
Backmatter
Metadata
Title
Distributed Computing and Internet Technology
Editors
Diganta Goswami
Truong Anh Hoang
Copyright Year
2021
Electronic ISBN
978-3-030-65621-8
Print ISBN
978-3-030-65620-1
DOI
https://doi.org/10.1007/978-3-030-65621-8
