About this book

This book constitutes the refereed post-conference proceedings of the Third IFIP TC 12 International Conference on Computational Intelligence in Data Science, ICCIDS 2020, held in Chennai, India, in February 2020.

The 19 revised full papers and 8 revised short papers presented were carefully reviewed and selected from 94 submissions. The papers are organized in the following topical sections: computational intelligence for text analysis; computational intelligence for image and video analysis; and data science.

Table of Contents


Computational Intelligence for Text Analysis


Emotion Recognition in Sentences - A Recurrent Neural Network Approach

This paper presents a long short-term memory (LSTM) model to automate the tagging of sentences with a dominant emotion. LSTM is a type of artificial recurrent neural network (RNN) architecture. A labeled corpus of news headlines [1] and textual descriptions [3] is used to train our model. The data set is annotated for a set of six basic emotions: Joy, Sadness, Fear, Anger, Disgust and Surprise, and also takes into consideration a seventh emotion, Neutral, to adequately represent sentences with incongruous emotions. Our model takes into account that one sentence can represent a combination of emotions and resolves all such conflicts to bring out one dominant emotion into which the sentence can be categorized. Furthermore, this can be extended to categorize an entire paragraph into a particular emotion. Our model gives an accuracy of 85.63% for the prediction of emotion when trained on the above-mentioned data set and an accuracy of 91.6% for the prediction of the degree of emotion for a sentence. Additionally, every sentence is associated with a degree of the dominant emotion, that is, the extent to which the emotion is emphasized. Although more than one sentence may convey the same emotion, the amount of emphasis can vary depending on the context. Our model also determines this degree of emotion.
N. S. Kumar, Mahathi Amencherla, Manu George Vimal

Context Aware Contrastive Opinion Summarization

Model-based approaches for context-sensitive contrastive summarization depend on hand-crafted features for producing a summary. Deriving these hand-crafted features using machine learning algorithms is computationally expensive. This paper presents a deep learning approach to provide an end-to-end solution for context-sensitive contrastive summarization. A hierarchical attention model referred to as Contextual Sentiment LSTM (CSLSTM) is proposed to automatically learn the representations of context, feature and opinion words present in the review documents of each entity. The resultant document context vector is a high-level representation of the document. It is used as a feature for context-sensitive classification and summarization. Given a set of summaries from the positive class and the negative class of two entities, the summaries which have a high contrastive score are identified and presented as context-sensitive contrastive summaries. Experimental results on a restaurant dataset show that the proposed model achieves better performance than the baseline models.
S. K. Lavanya, B. Parvathavarthini

Tamil Paraphrase Detection Using Encoder-Decoder Neural Networks

Detecting paraphrases in Indian languages requires critical analysis of the lexical, syntactic and semantic features. Since the structure of Indian languages differs from that of other languages like English, the usage of lexico-syntactic features varies between the Indian languages and plays a critical role in determining the performance of the system. Instead of using various lexico-syntactic similarity features, we aim to apply a complete end-to-end system using deep learning networks with no lexico-syntactic features. In this paper we exploit the encoder-decoder model of deep neural networks to analyze and classify paraphrase sentences in the Tamil language. In this encoder-decoder model, LSTM, GRU units and gNMT are used as layers along with an attention mechanism. Using this end-to-end model, there is an increase in F1-measure of 0.5% for subtask-1 when compared to the state-of-the-art systems. The system was trained and evaluated on the DPIL@FIRE2016 Shared Task dataset. To our knowledge, ours is the first deep learning model which validates the training instances of both the subtask-1 and subtask-2 datasets of the DPIL shared task.
B. Senthil Kumar, D. Thenmozhi, S. Kayalvizhi

Trustworthy User Recommendation Using Boosted Vector Similarity Measure

An online social network (OSN) is crowded with people and their huge number of posts, and hence filtering truthful content and/or truthful content creators is a great challenge. An online recommender system helps to get such information from the OSN and suggests valuable items or users. But in reality, people place more belief in recommendations from the people they trust than from untrusted sources. A system that obtains recommendations from trusted people derived from a social network is called a Trust-Enhanced Recommender System (TERS). A Trust-Boosted Recommender System (TBRS) is proposed in this paper to address the challenge of identifying trusted users from a social network. The proposed recommender system is a fuzzy multi-attribute recommender system using a boosted vector similarity measure, designed to predict trusted users from social networks with reduced error. Performance analysis of the proposed model in terms of accuracy measures such as precision@k and recall@k and error measures, namely MAE, MSE and RMSE, is discussed in this paper. The evaluation shows that the proposed system outperforms other recommender systems, with minimum MAE and RMSE.
Dhanalakshmi Teekaraman, Sendhilkumar Selvaraju, Mahalakshmi Guruvayur Suryanarayanan
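
The evaluation measures named in this abstract have standard definitions, which can be sketched as follows. This is a minimal illustration and not the authors' evaluation code; the function names and the sample user identifiers in the usage note are hypothetical.

```python
import math


def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and recall@k for one ranked recommendation list.

    ranked   -- recommended items, best first (hypothetical user ids)
    relevant -- set of items that are actually trusted/relevant
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def error_measures(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted scores."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)
```

For example, `precision_recall_at_k(["u1", "u2", "u3", "u4"], {"u1", "u3", "u9"}, 2)` counts one hit among the top two recommendations, giving a precision of 0.5 and a recall of one third.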

Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets

Twitter is the most popular social platform where common people reflect their personal, political and business views, which indirectly builds an active online repository. The data presented by users on social networking sites usually include sensitive or private data that have high potential for cyber threats. The most frequently presented sensitive private data are analyzed by collecting real-time tweets based on benchmarked cyber-keywords under personal, professional and health categories. This research work aims to generate a Topic Keyword Extractor by adapting the Automatic Acronym - Abbreviation Replacer, which is specially developed for social media short texts. The feature space is modeled using the Latent Dirichlet Allocation technique to discover topics for each cyber-keyword. The user’s context and intentions are preserved by replacing the internet jargon and abbreviations. The originality of this research work lies in identifying sensitive keywords that reveal a tweeter’s Personally Identifiable Information through the novel Topic Keyword Extractor. The potentially sensitive topics in which social media users frequently exhibit personal information and unintended information disclosures are discovered for the benchmarked cyber-keywords by adapting the proposed qualitative topic-wise keyword distribution approach. This experiment analyzed cyber-keywords and the identified sensitive topic keywords as bi-grams to predict the most common sensitive information leaks happening on Twitter. The results showed that the most frequently discussed sensitive topic was ‘weight loss’, with the cyber-keyword ‘weight’ of the health tweet category.
R. Geetha, S. Karthika

Sentiment Analysis of Bengali Tweets Using Deep Learning

Sentiment analysis is the research area that deals with the analysis of sentiments expressed in social media texts written by internet users. Sentiments of the users are expressed in various forms such as feelings, emotions and opinions. Tweet sentiment polarity detection is an important sentiment analysis task, which is to classify an input tweet into one of three classes: positive, negative and neutral. In this study, we compare various deep learning methods that use LSTM, BiLSTM and CNN for the sentiment polarity classification of Bengali tweets. We also present a comparative study of the Bengali tweet sentiment polarity classification performance of traditional machine learning methods and deep learning methods.
Kamal Sarkar

Social Media Veracity Detection System Using Calibrate Classifier

In the last decade, social media has grown extremely fast, with tens of millions of users online at any time. Social media is a powerful tool to share information online in the form of articles, images, URLs, and videos. At the same time, it also spreads rumors. To fight against rumors, media users need a verification tool to verify fake posts on Twitter. The main motivation of this research work is to find out which classification model helps in detecting rumor messages. The proposed system adopts three feature extraction techniques, namely Term Frequency-Inverse Document Frequency, Count-Vectorizer and Hashing-Vectorizer. The authors propose a Calibrate Classifier model to detect rumor messages on Twitter, and this model has been tested on real-time event #gaja tweets. The proposed calibrate model shows better results for rumor detection than the other ensemble models.
P. SuthanthiraDevi, S. Karthika

A Thesaurus Based Semantic Relation Extraction for Agricultural Corpora

Semantic relations exist between two concepts present in a text. Semantic relation extraction is an essential part of building efficient Natural Language Processing (NLP) applications such as Question Answering (QA) and Information Retrieval (IR) systems. Automatic semantic relation extraction from text increases the efficiency of these systems by aiding in retrieving more accurate information for the user query. In this research work, we propose a framework that extracts agricultural entities and finds the semantic relations that exist between the entities. Entity extraction is done using a Parts Of Speech (POS) tagger, word suffixes and a thesaurus, without using any external domain-specific knowledge bases such as an ontology or WordNet. The semantic relations that exist between entities are identified using a Multinomial Naïve Bayes (MNB) classifier. This paper extracts two entities, namely disease and treatment, and focuses on two semantic relations, namely “Cure” and “Prevent”. The “Cure” semantic relation expresses the remedial measure for the diseases that prevail in the crops, and the “Prevent” semantic relation shows the precautionary measures that could prevent the crop from being affected. The proposed approach has been trained with 2281 sentences, tested against 553 sentences, and then evaluated using standard metrics.
R. Srinivasan, C. N. Subalalitha

Irony Detection in Bengali Tweets: A New Dataset, Experimentation and Results

Irony detection is a difficult task because the intended meaning of a sentence differs from the literal meaning or sentiment of that sentence. Most existing work on this subject has focused on irony detection in the English language. Since no public dataset is available for this task in the Bengali domain, we have created a Bengali irony detection dataset that contains a total of 1500 labeled Bengali tweets. This paper presents the description of the Bengali irony detection dataset developed by us and reports some results obtained on our Bengali irony dataset using several widely used machine learning algorithms such as Naïve Bayes, Support Vector Machine, K-Nearest Neighbor and Random Forest.
Adhiraj Ghosh, Kamal Sarkar

Computational Intelligence for Image and Video Analysis


Bat Algorithm with CNN Parameter Tuning for Lung Nodule False Positive Reduction

Lung cancer, an uncontrolled development of abnormal cells in one or both lungs, has been one of the primary causes of cancer-related deaths worldwide. Detecting it at an early stage is the only way to reduce lung cancer deaths. The most common tests to look for cancerous cells include X-ray, CT scan, sputum cytology and biopsy. The CT scan is recognized as one of the effective tools for recognizing it at an early stage. Detecting lung nodules (lesions) can be very difficult in Computer Aided Detection (CAD) systems. Because lung nodules have similar contrast to other structures, numerous false positives may be generated. The performance of a Convolutional Neural Network (CNN) mainly depends on the hyperparameters selected for a problem. The main motive of the proposed work is to use the Bat algorithm to optimize network hyperparameters such as the number of filters in the convolution layers, the number of neurons and the filter size in the CNN to enhance the network performance, thereby eliminating the need for a manual search for optimal hyperparameters. The methodology is validated using important performance metrics such as accuracy, sensitivity and specificity. The results show that a CNN in conjunction with the Bat algorithm provides better results in the classification of nodules and non-nodules, with a minimal false positive rate.
R. R. Rajalaxmi, K. Sruthi, S. Santhoshkumar

A Secure Blind Watermarking Scheme Using Wavelets, Arnold Transform and QR Decomposition

In recent years the amount of digitally stored content available as images, videos, documents, etc., has increased exponentially. With the invention of public storage such as clouds, the security and privacy of digital data are of extreme importance. With the availability of powerful editing tools, modification of digital data is no longer a challenging task. Content modification can be done either with positive intentions, like image and video enhancement, or with malicious intentions, like image and video morphing, video piracy, etc. To detect malicious activities, ownership of digital content needs to be established. One possible solution is to embed owner information during the content generation process. So, a secure watermarking (WMG) scheme is proposed in this article using the Wavelet transform, the Arnold transform (AT) and QR factorization. The novelty of this technique is the unique way of generating the watermark (WM), which makes the WMG secure. The technique is analyzed using images from the Signal and Image Processing Institute (SIPI), Break Our Watermarking System (BOWS), and Copydays datasets. The experimental results of the proposed scheme are promising.
Ayesha Shaik
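
Of the three building blocks named in this abstract, the Arnold transform is the easiest to illustrate in isolation. The sketch below shows only the standard Arnold cat map on a square image and its inverse; how the paper combines it with wavelets and QR factorization to generate the watermark is not described in the abstract and is not reproduced here. The function names and the list-of-rows image representation are assumptions made for illustration.

```python
def arnold_transform(image, iterations=1):
    """Scramble a square N x N image (a list of rows) with Arnold's cat map."""
    n = len(image)
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Forward map: (x, y) -> (x + y, x + 2y) mod N
                scrambled[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = scrambled
    return image


def inverse_arnold_transform(image, iterations=1):
    """Undo arnold_transform applied with the same number of iterations."""
    n = len(image)
    for _ in range(iterations):
        restored = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Inverse map: (x, y) -> (2x - y, y - x) mod N
                restored[(2 * x - y) % n][(y - x) % n] = image[x][y]
        image = restored
    return image
```

Because the map is a bijection on the pixel grid, scrambling only permutes pixel positions, and the iteration count can act as a simple key: only a receiver who knows it can restore the original arrangement without trial and error.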

Role of Distance Measures in Approximate String Matching Algorithms for Face Recognition System

This paper addresses the recognition of faces using string matching. Approximate string matching is a method for finding an approximate match of a pattern within a string. Exact matching is impractical for large amounts of data because it takes more time. This issue can be addressed by finding an approximate match rather than an exact match. This paper examines the performance of approximate string matching approaches using various distance measures such as Edit distance, Longest Common Subsequence (LCSS), Hamming distance, Jaro distance, and Jaro-Winkler distance. The algorithms generate a near-optimal solution for a face recognition system with reduced computational complexity. This paper deals with the conversion of face images into strings and the matching of those image strings using an approximate string matching algorithm that determines the distance and classifies a face image based on the minimum distance. Experiments have been performed with the FEI and ORL face databases for the evaluation of approximate string matching algorithms, and the results demonstrate the utility of distance measures for the face recognition system.
B. Krishnaveni, S. Sridhar
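
Two of the distance measures listed in this abstract, together with minimum-distance classification, can be sketched as below. The abstract does not say how face images are converted to strings, so the strings and labels used here are purely hypothetical placeholders, and the function names are illustrative rather than the authors' own.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def edit_distance(a, b):
    """Levenshtein edit distance using a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def classify(query, gallery, distance=edit_distance):
    """Return the label of the gallery string closest to the query.

    gallery -- list of (string, label) pairs (hypothetical face strings)
    """
    return min(gallery, key=lambda item: distance(query, item[0]))[1]
```

For instance, `classify("abcd", [("abce", "A"), ("wxyz", "B")])` selects label `"A"`, since one substitution separates the query from the first gallery string while four separate it from the second.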

Detection of Human Faces in Video Sequences Using Mean of GLBP Signatures

Machine analysis of face detection is an active research topic in human-machine interaction today. Existing studies reveal that discovering the position and scale of the face region is difficult due to significant illumination variation, noise and appearance variation in unconstrained scenarios. Our work is an automatic and robust method to identify the location of the face area, evaluated on the recently developed YouTube video face database. A normalization technique is applied to each frame. The frame is then separated into overlapping regions. Gabor signatures are extracted from each region by Gabor filters with different scales and orientations. The Gabor signatures are averaged, and then local binary pattern histogram signatures are extracted. The Gabor local binary pattern signatures are passed to a Gentle Boost classifier, together with the face and non-face signatures of the gallery images, to identify the face region. Our experimental results on the YouTube video face database are promising and demonstrate a significant performance improvement when compared to the existing techniques. Furthermore, our method is insensitive to head pose and robust to variations in illumination, appearance and noise.
S. Selvi, P. Ithaya Rani, S. Muhil Pradahnji

Malware Family Classification Model Using User Defined Features and Representation Learning

Malware is very dangerous for system and network users. Malware identification is an essential task in effectively detecting malware and preventing a computer system from being infected, protecting it from potential information loss and system compromise. Commonly, 25 malware families exist. Traditional malware detection and anti-virus systems fail to classify new variants of unknown malware into their corresponding families. With the development of malicious code engineering, it is possible to understand the malware variants and their features for new malware samples which carry variability and polymorphism. Existing detection methods can hardly detect such variants, yet it is important in the cyber security field to analyze and detect large-scale malware samples more efficiently. Hence it is proposed to develop an accurate malware family classification model using a contemporary deep learning technique. In this paper, malware family recognition is formulated as a multi-class classification task, and an appropriate solution is obtained using representation learning based on binary arrays of malware executable files. Six families of malware have been considered here for building the models. The feature dataset with 690 instances is applied to a deep neural network to build the classifier. In the experiments, the model trained on this dataset of 690 malware files from 6 malware families provides an accuracy of over 86.8% in discriminating between malware families. The techniques provide better results for classifying malware into families.
T. Gayathri, M. S. Vijaya

Fabric Defect Detection Using YOLOv2 and YOLO v3 Tiny

The paper aims to classify the defects in a fabric material using deep learning and neural network methodologies. Six classes of defects are considered, namely Rust, Grease, Hole, Slough, Oil Stain, and Broken Filament. The YOLOv2 model and the YOLOv3 Tiny model are implemented separately on the same fabric data set, which was collected for this research and contains these six types of defects, using convolutional weights pre-trained on the ImageNet dataset. The success rate of both models in detecting the defects in the fabric material is observed and documented.
R. Sujee, D. Shanthosh, L. Sudharsun

Automatic Questionnaire and Interactive Session Generation from Videos

In this paper, we present a tool that interleaves lengthy lecture videos with questionnaires at optimal moments. This is done to keep students’ attention by making the video interactive. At regular intervals, the student is presented with MCQ-type questions based on the topics covered so far in the video. The questions are generated from the transcript of the video lecture using machine learning and natural language processing techniques. In order to maintain continuity and a proper flow of teaching, an LDA-based (Latent Dirichlet Allocation) model is proposed to insert the generated questions at appropriate points called logical points.
V. C. Skanda, Rachana Jayaram, Viraj C. Bukitagar, N. S. Kumar

Effective Emotion Recognition from Partially Occluded Facial Images Using Deep Learning

Effective expression analysis depends heavily on the accurate representation of facial features. Proper identification and tracking of different facial muscles, irrespective of pose, face shape, illumination, and image resolution, is essential for this purpose. However, extraction and analysis of facial and appearance-based features fail with improper face alignment and occlusions. The few existing works on these problems mainly determine the facial regions which contribute towards discrimination of expressions based on the training data. However, in these approaches, the positions and sizes of the facial patches vary according to the training data, which inherently makes it difficult to conceive a generic system. This paper proposes a novel facial landmark detection technique as well as a salient patch based facial expression recognition framework based on ACNN, with significant performance at different image resolutions.
Smitha Engoor, Sendhilkumar Selvaraju, Hepsibah Sharon Christopher, Mahalakshmi Guruvayur Suryanarayanan, Bhuvaneshwari Ranganathan

Driveable Area Detection Using Semantic Segmentation Deep Neural Network

Autonomous vehicles use road images to detect roads and to identify lanes, objects around the vehicle and other important pieces of information. This information retrieved from the road data helps in making appropriate driving decisions for autonomous vehicles. Road segmentation is one such technique, which segments the road from the image. Many deep learning networks developed for semantic segmentation can be fine-tuned for road segmentation. The paper presents details of the segmentation of the driveable area from the road image using a semantic segmentation network. The network segments the road into the driveable area and the alternately driveable area separately. The driveable area and the alternately driveable area on a road are semantically different, but differentiating between them is a difficult computer vision task since they are similar in texture, color, and other important features. However, the development of advanced Deep Convolutional Neural Networks and road datasets has made the differentiation possible. Results achieved in detecting the driveable area using a semantic segmentation network, DeepLab, on the Berkeley DeepDrive dataset are reported.
P. Subhasree, P. Karthikeyan, R. Senthilnathan

Data Science


WS-SM: Web Services - Secured Messaging Framework with Pluggable APIs

Dynamic composition of web services is important in B2B applications where user requirements and business policies change and new services get added to the service registry frequently. In a dynamic composition environment, ensuring the security of messages communicated among the web services becomes challenging, since several attacks are possible on SOAP messages in the public network due to their standardized interfaces. Most of the existing works on web services security provide solutions to ensure basic security features such as confidentiality, integrity, authentication, authorization, and non-repudiation. The few existing works that provide solutions such as schema validation and schema hardening for attacks on web services do not provide attack-specific solutions. The web services security standard and all the existing works have addressed only the security of messages between a client and a single web service, but not the security of messages between two services, which is quite challenging. Hence, a security framework for secured messaging among web services has been proposed to provide attack-specific solutions. Since new types of web service attacks are evolving over time, the proposed security solutions are implemented as APIs that are pluggable in any server where the web service is deployed. The proposed framework has been tested for compliance with WSI-BP to demonstrate its interoperability and subjected to vulnerability testing, which proved its immunity to attacks. The stress testing results revealed that the throughput decreased by only 35%, achieving a good trade-off between performance and security.
Kanchana Rajaram, Chitra Babu

Online Product Recommendation System Using Multi Scenario Demographic Hybrid (MDH) Approach

The recommendation system plays a very important role in the e-commerce domain by recommending relevant products and services based on end-user preferences or interests. This helps end users to easily make a buying decision from the vast range of products and brands available in the market. A lot of research has been done on recommendation systems that aim to provide relevant products to the end user by referring to the end user's past purchase history, transaction details, etc. In our Multi Scenario Demographic Hybrid (MDH) approach, important demographic influence factors such as the user's age group and location are considered. The products are also ranked by associated age group category. The experimental results show that the proposed recommendation system is better than the existing systems in terms of prediction accuracy of relevant products.
R. V. Karthik, Sannasi Ganapathy

Simulation of Path Planning Algorithms Using Commercially Available Road Datasets with Multi-modal Sensory Data

Road datasets for computer vision tasks involved in advanced driver assist systems and autonomous driving are publicly available for the technical community for the development of machine learning aided scene understanding using computer vision systems. All the perceived data from multiple sensors mounted on the vehicle must be fused to generate an accurate state of the vehicle and its surroundings. The paper presents details of the simulation implementation of local path planning for an autonomous vehicle based on multi-sensory information. The simulation is carried out with sensory inputs from RGB camera, LIDAR and GPS. The data is obtained from the KITTI dataset. A variant of the D-star algorithm is utilized to demonstrate global and local path-planning capabilities in the simulation environment.
R. Senthilnathan, Arjun Venugopal, K. S. Vishnu

Implementation of Blockchain-Based Blood Donation Framework

Existing blood management systems in India function as Information Management systems that lack dynamic updates of blood usage and detailed blood trail information, starting from donation to consumption. There exists no communication platform for surplus blood in one region to be requested from another region where blood is scarce, leading to wastage of blood. Lack of transparency and proper blood quality checks have led to several cases of blood infected with diseases such as HIV being used for transfusion. This paper aims at mitigating these issues using a blockchain-based blood management system. The issue of tracking the blood trail is modelled as a supply-chain management issue. The proposed system, implemented in the Hyperledger Fabric framework, brings more transparency to the blood donation process by tracking the blood trail and also helps to curb unwarranted wastage of blood by providing a unified platform for the exchange of blood and its derivatives between blood banks. For ease of use, a web application is also built for accessing the system.
Sivakamy Lakshminarayanan, P. N. Kumar, N. M. Dhanya

IoT Based Crop-Field Monitoring and Precise Irrigation System Using Crop Water Requirement

Existing practices of crop irrigation are manual and based on generic traditional recommendations. Crops that are provided with too little water show reduced growth and reduced uptake of calcium. Excessive irrigation leads to root death and water wastage. Hence, irrigating crops with a precise amount of water becomes an important problem. Towards this objective, an IoT-based crop field monitoring and precise irrigation system is proposed that monitors the crop field and computes the precise crop water requirement based on the crop's life cycle and climatic conditions. Using this computed crop water requirement, a pump motor is operated automatically whenever soil moisture decreases below the permanent wilting point. The motor is shut down once the required water is pumped out to the crops. The proposed system is installed in a crop-field of brindle plant and the crop is irrigated for 6 months. It is observed that 53% of water has been saved from wastage.
Kanchana Rajaram, R. Sundareswaran

A Machine Learning Study of Comorbidity of Dyslexia and Attention Deficiency Hyperactivity Disorder

Neurodevelopmental disorders in children, like dyslexia and ADHD, must be diagnosed at early stages, as the children need to be provided with the necessary aid. Comorbidity of dyslexia and ADHD is very high. Children with comorbidity of dyslexia and ADHD face comparatively more difficulty than children with just one of the disorders. Since all three, dyslexia, ADHD and comorbid cases, share many similar characteristics, it is hard to distinguish between cases which have only dyslexia or ADHD and those which have both. Manual analysis to differentiate based on standard scores of the psycho analysis tests provided inconsistent results. In this paper, we have applied the standard machine learning techniques Random Forest, Support Vector Machine and Multilayer Perceptron to the diagnosis test results to classify between ADHD and comorbid cases, and between dyslexia and comorbid cases. Analysis using the different individual psycho analysis tests is also done. Application of machine learning techniques provides better classification than the manual analysis.
Junaita Davakumar, Arul Siromoney

Program Synthesis: Synthesizing Operators for Integer Manipulation

We describe a language to synthesize a linear sequence of arithmetic operations for integer manipulation. Given an input-output example, our language synthesizes a set of operators to be applied to the input integers to obtain the given output. The sequence is generated using Microsoft PROSE, a program synthesis framework, and a genetic algorithm. Our approach generates a set of ranked solutions that can be made unique by supplying additional consistent input-output examples.
Jayasurya Seenuvasan, Shalini Sai Prasad, N. S. Kumar
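
The idea of recovering an operator sequence from input-output examples can be illustrated with a small brute-force search. This sketch is not the authors' method: it substitutes exhaustive enumeration for Microsoft PROSE and the genetic algorithm, does no ranking, and assumes (hypothetically) that operators are applied left to right over the input integers and are drawn only from addition, subtraction and multiplication.

```python
import operator
from itertools import product

# Hypothetical operator set; the paper's actual language may differ.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def apply_ops(ops, inputs):
    """Fold the inputs left to right: ((i0 op0 i1) op1 i2) ..."""
    result = inputs[0]
    for symbol, value in zip(ops, inputs[1:]):
        result = OPS[symbol](result, value)
    return result


def synthesize(examples):
    """Return every operator sequence consistent with all examples.

    examples -- list of (inputs, output) pairs with equal-length inputs
    """
    arity = len(examples[0][0])
    return [
        ops
        for ops in product(OPS, repeat=arity - 1)
        if all(apply_ops(ops, ins) == out for ins, out in examples)
    ]
```

With the single example `((2, 2, 1), 4)` two sequences fit, `('+', '*')` and `('*', '*')`. Adding the consistent example `((1, 3, 2), 8)` leaves only `('+', '*')`, which mirrors how additional examples narrow the candidate set to a unique solution.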

Location Based Recommender Systems (LBRS) – A Review

Recommender systems play a vital role in everyday life as technology advances. Location based recommender systems are the current trend on mobile devices, meeting users' needs in a timely, effective and efficient manner. The services provided by location based recommender systems include Geo-tagged data based services, which use the Global Positioning System and incorporated sensors to accumulate user information. The Bayesian network model is widely used in geo-tagged data based services to provide a solution to the cold start problem. Point Location based services consider user check-ins and auxiliary information to provide recommendations. Region based recommendation can be considered for improving accuracy in Point Location based services. Trajectory based services use the travel paths of the user to find places of interest along with similar user behaviours. Context based information can be incorporated into these services to provide better recommendations. This article provides an overview of Geo-tagged media based services and Point Location based services and discusses possible research issues and future work.
R. Sujithra @ Kanmani, B. Surendiran

