
2022 | Book

Artificial Intelligence and Speech Technology

Third International Conference, AIST 2021, Delhi, India, November 12–13, 2021, Revised Selected Papers


About this book

This volume constitutes selected papers presented at the Third International Conference on Artificial Intelligence and Speech Technology, AIST 2021, held in Delhi, India, in November 2021.
The 36 full papers and 18 short papers presented were thoroughly reviewed and selected from the 178 submissions. They discuss the application of Artificial Intelligence tools to speech analysis, representation and models, spoken language recognition and understanding, affective speech recognition, interpretation and synthesis, speech interface design and human factors engineering, speech emotion recognition technologies, audio-visual speech processing, and several other topics.

Table of Contents

Frontmatter

Speech and Natural Language Processing

Frontmatter
A Critical Insight into Automatic Visual Speech Recognition System

This paper investigates the robustness of Automatic Visual Speech Recognition (AVSR) systems for acoustic models based on GMMs and DNNs. It surveys most of the recent literature, showing how research and product development on AVSR robustness in noisy acoustic conditions has progressed over the last 30 years. Various categories of languages are covered, varying in coverage, development process, corpus, and granularity. The key advantage of deep-learning tools, including deep convolutional neural networks, bi-directional long short-term memory networks, 3D convolutional neural networks, and others, is that they readily address problems inseparably linked to feature extraction and complex audio-visual fusion. The survey is intended to serve as a representative overview of AVSR.

Kiran Suryavanshi, Suvarnsing Bhable, Charansing Kayte
Speaker Independent Accent Based Speech Recognition for Malayalam Isolated Words: An LSTM-RNN Approach

Automatic speech recognition (ASR) has been a very active area of research for the past few decades. Though there have been great advancements in ASR for many languages, accent-based speech recognition is an area that is yet to be explored in many of them. Speech recognition comes intuitively to humans, so making computers automatically recognize human speech is a tough process. Although speech recognition has achieved promising results for many languages, speech recognition for the Malayalam language is still in its infancy, and the scarcity of datasets makes it difficult for researchers to conduct experiments. In this paper, we experiment with Long Short-Term Memory (LSTM), a Recurrent Neural Network (RNN) variant, for recognizing accent-based isolated words in Malayalam. The datasets used here were constructed manually in a natural recording environment. We used Mel Frequency Cepstral Coefficients (MFCC) to extract features from the audio signals. An LSTM-based RNN is used to train and build the model, since this technique significantly outperforms feed-forward deep neural networks and other statistical methodologies.
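The MFCC front end mentioned in this abstract begins with pre-emphasis and framing of the waveform; a minimal pure-Python sketch of those first two steps is given below (the 0.97 coefficient, 25 ms frames, and 10 ms hop are conventional defaults, not values reported by the authors):

```python
def preemphasize(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames (truncating the tail)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop:i * hop + frame_len] for i in range(n_frames)]

# 1 s of a dummy 8 kHz signal -> 25 ms frames (200 samples) with a 10 ms hop
signal = [float(n % 32) for n in range(8000)]
frames = frame_signal(preemphasize(signal), frame_len=200, hop=80)
```

Each frame would then be windowed and passed through the mel filterbank and DCT stages to produce the MFCC vectors fed to the LSTM.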

Rizwana Kallooravi Thandil, K. P. Mohamed Basheer
A Review on Speech Synthesis Based on Machine Learning

Speech synthesis is a growing research area in which text input is converted into an acoustic output. Speech synthesis systems are particularly advantageous to physically impaired people. In practice, complications arise from surrounding noise and communication style; to suppress such unwanted noise, various machine learning techniques are employed. In this paper, we describe techniques adopted to improve the naturalness and quality of synthesized speech. The main contribution of this paper is to elaborate and compare the characteristics of techniques utilized in speech synthesis for different languages, including the support vector machine, Artificial Neural Network, Gaussian mixture model, Generative Adversarial Network, Deep Neural Network, and Hidden Markov Model.

Ruchika Kumari, Amita Dev, Ashwni Kumar
Hindi Phoneme Recognition - A Review

A review of Hindi phoneme recognition is presented to address Hindi speech recognition. Different issues related to Hindi phonemes, such as Hindi speech characteristics, features used in phoneme recognition, and classification methods, are highlighted. Related work is also presented covering feature extraction, classification, and distinct features. Earlier reviews mostly addressed speech recognition technologies, so this work is an early research study dedicated to Hindi phoneme recognition. A phoneme-based system is used to overcome the constraint of the large training samples required by word-based models; phoneme-based systems are widely used for large-vocabulary speech recognition. Issues related to consonants and vowels are also included. A comparative analysis of different feature extraction and classification techniques is presented with recognition scores. The research helps by presenting issues related to phoneme recognition.

Shobha Bhatt, Amita Dev, Anurag Jain
Comparison of Modelling ASR System with Different Features Extraction Methods Using Sequential Model

Speech recognition refers to a device’s ability to respond to spoken instructions. It facilitates hands-free use of various gadgets and appliances (a godsend for many incapacitated persons), as well as supplying input for automatic translation and ready-to-print dictation. Many industries, including healthcare, military telecommunications, and personal computing, use speech recognition programmes. In this paper, we compare different feature extraction methods (BFCC, GFCC, MFCC, MFCC Delta, MFCC Double Delta, LFCC and NGCC) using neural networks.
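Several of the compared features (MFCC and its delta variants) are built on the mel frequency scale; a small sketch of the standard Hz-to-mel conversion follows (the 2595 and 700 constants are the common convention, not parameters from the paper):

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale mapping: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping from mels back to Hertz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The mel scale compresses high frequencies so that filterbank spacing roughly matches human pitch perception; gammatone-based features such as GFCC use a different auditory scale but follow the same filterbank-then-cepstrum pattern.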

Aishwarya Suresh, Anushka Jain, Kriti Mathur, Pooja Gambhir
Latest Trends in Deep Learning for Automatic Speech Recognition System

Deep Learning is one of the latest developments in Machine Learning and Intelligent Systems research, and one of the trendiest areas of study right now. Computer vision and pattern recognition have benefited greatly from the dramatic advances made possible by deep learning techniques. New deep learning approaches continue to be proposed, offering performance that matches or surpasses the current state of the art. There has been significant advancement in this area in the last few years, and deep learning is developing at such an accelerated rate that it is difficult for new investigators to keep pace with its many variants. In this article, we briefly cover developments in Deep Learning over the last several years.

Amritpreet Kaur, Rohit Sachdeva, Amitoj Singh
Deep Learning Approaches for Speech Analysis: A Critical Insight

The main objective of speaker recognition is to identify the voice of an authenticated and authorized individual by extracting features from their voice. Many of the published speaker recognition techniques are text-dependent; text-independent speaker recognition, on the other hand, appears more advantageous, since the user can interact freely with the system. Several scholars have suggested a variety of strategies for identifying speakers, although these systems were complicated and inaccurate. Relying on WOA and Bi-LSTM, this research proposes a text-independent speaker identification algorithm. Sample signals, subject to various degradations and voice effects, were obtained from an available dataset. MFCC features are then extracted from these signals, and only the most important characteristics are chosen from the available features using WOA to build a single feature set. The Bi-LSTM network receives this feature set and uses it for training and testing. The proposed model’s performance is assessed in MATLAB simulation software and compared with that of the standard model. Various metrics, such as accuracy, sensitivity, specificity, precision, recall, and F-score, were used to evaluate the simulated outputs. The findings show that the suggested model is more efficient and precise at recognizing speaker voices.

Alisha Goyal, Advikaa Kapil, Sparsh Sharma, Garima Jaiswal, Arun Sharma
Survey on Automatic Speech Recognition Systems for Indic Languages

For the past few decades, Automatic Speech Recognition (ASR) has gained a wide range of interest among researchers. From identifying digits for a single speaker to authenticating the speaker, the field has a long history of improvisation and experiment. Recognizing human speech has been a fascinating problem for speech and natural language processing researchers. Speech is the most vital and indispensable way of transferring information among human beings, and numerous research works have been carried out in the field of speech processing and recognition in the last few decades. Accordingly, a review of various speech recognition approaches and techniques suitable for text identification from speech is presented in this survey. The chief motivation of this review is to examine the prevailing speech recognition approaches and techniques in such a way that researchers in this field can incorporate all the essential parameters into their speech recognition systems, helping them overcome the limitations of existing systems. This review discusses the various challenges involved in the speech recognition process and possible future directions for researchers in this field. Typical speech recognition trials were considered to determine which metrics should be included in a system and which can be disregarded.

Nandini Sethi, Amita Dev
Analytical Approach for Sentiment Analysis of Movie Reviews Using CNN and LSTM

With the rapid growth of technology and easier access to the internet, several forums like Twitter, Facebook, Instagram, etc., have come up, providing people with a space to express their opinions and reviews about anything and everything happening in the world. Movies are widely appreciated and criticized art forms. They are a significant source of entertainment and have led to web forums like IMDB and Amazon reviews where users give their feedback about movies and web series. These reviews and feedback draw considerable attention from scientists and researchers seeking to capture the vital information in the data. Although this information is unstructured, it is very crucial. Deep learning and machine learning have grown into powerful tools for examining the polarity of the sentiments communicated in a review, known as ‘opinion mining’ or ‘sentiment classification.’ Sentiment analysis has become the most dynamic research area in NLP (natural language processing), as text frequently conveys rich semantics useful for analysis. With ongoing advancement in deep learning, the capacity to analyze this content has improved significantly. Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) are widely implemented as powerful deep learning techniques in Natural Language Processing tasks. This paper presents an exhaustive study of sentiment analysis of movie reviews using CNN and LSTM, elaborating the approaches, datasets, results, and limitations.

Arushi Garg, Soumya Vats, Garima Jaiswal, Arun Sharma
Analysis of Indian News with Corona Headlines Classification

With the advent of the World Wide Web, the world has seen an explosion in the amount of information that is available online. People stay informed about national and international affairs through online news, which is readily available and portable, allowing ease of access. These news pieces tend to shape people’s thoughts and provoke emotions, which may be positive, neutral, or negative, without readers realizing their effect. The objective of this work is to create a hybrid model that can analyze the overall effect of digital news content in India. The hybrid approach to sentiment analysis encompasses lexicon and machine learning algorithms, as well as a self-created scored corpus of corona-related words, to classify all sorts of headlines. The labelled dataset is used to train decision tree and random forest algorithms, which are evaluated based on their accuracy scores, classification reports, and confusion matrices. The results show that both algorithms perform well on the dataset and that the Indian media highlighted neutral news the most. This finding can be very useful for Indian news agencies, since they can alter their reporting strategies to create an impact of their choice on readers’ minds and thus increase readership.
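The scored-corpus half of the hybrid approach can be sketched as a simple lexicon lookup that sums word scores into a polarity label; the words and scores below are illustrative placeholders, not the authors' corona corpus:

```python
# Illustrative scored corpus of corona-related words (placeholder values,
# not the self-created corpus described in the paper).
SCORED_LEXICON = {
    "recovery": 2, "vaccine": 1, "hope": 2,
    "death": -2, "lockdown": -1, "surge": -2,
}

def classify_headline(headline, lexicon=SCORED_LEXICON):
    """Sum word scores and map the total to a polarity label."""
    score = sum(lexicon.get(w, 0) for w in headline.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In the hybrid pipeline, labels produced this way would feed the decision tree and random forest classifiers as training data.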

Janhavi Jain, Debadrita Dey, Bhavika Kelkar, Khyati Ahlawat
Feature Extraction and Sentiment Analysis Using Machine Learning

The role of social networks has brought a tremendous change to opinion analysis. Understanding people’s sentiments or opinions helps a business or organization better understand its customers. There are several platforms where people can easily post their views about a service or product, such as Facebook and Twitter. Feature (or aspect) extraction is important, since one needs to know the qualities a product or service has. In this research, we analyze hotel reviews by applying n-grams for feature extraction. As the dataset is noisy, basic preprocessing steps are applied before extraction. The extracted features are trained and tested with basic machine learning classifiers; various algorithms such as KNN, SVM, and random forest are used to analyze performance. Evaluation measures are calculated at the end to validate the results, and a k-fold cross-validation scheme is also applied to the dataset to improve the overall accuracy of the results.
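The n-gram feature extraction step described here can be sketched in a few lines of plain Python (whitespace tokenization is a simplification of the preprocessing the authors apply):

```python
def ngrams(text, n):
    """Return the list of word n-grams in a lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# A toy hotel review: unigrams and bigrams become classifier features.
review = "The room was very clean"
unigrams = ngrams(review, 1)
bigrams = ngrams(review, 2)
```

Counts of these n-grams, vectorized per review, would form the feature matrix passed to KNN, SVM, or random forest.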

Neha Vaish, Nidhi Goel, Gaurav Gupta
A Neural Network Based Approach for Text-Level Sentiment Analysis Using Sentiment Lexicons

Forums, e-commerce sites, product review sites, and social media host many discussions that help users exchange opinions and thoughts through free expression. The Internet and Web 2.0 are overflowing with user-generated data, which provides a good source of sentiments, reviews, and evaluations. Opinion mining, more popularly known as sentiment analysis, classifies a text document based on the positive or negative sentiment it holds. This is an open research domain, and this paper puts forth a model called the Artificial Neural Network Based Model (ANNBM). The model is trained and tested using Information Gain as well as three popular lexicons to extract sentiments. It is a new approach that best utilizes the ANNBM model and the subjectivity knowledge available in sentiment lexicons. Experiments were conducted on mobile phone and car reviews, showing that this approach successfully performs sentiment-based classification of text while simultaneously reducing dimensionality.

Gaurav Dubey, Pinki Sharma
Cross Linguistic Acoustic Phonetic Study of Punjabi and Indian English

Punjabi and English are languages that do not belong to the same family: English belongs to the West Germanic languages, while Punjabi is part of the Indo-Aryan family. Regional languages have an impact on English through borrowing and code-mixing, because borrowed words undergo a make-over caused by the phonetic features of the regional languages. In India, English is the medium of media, science, and technology, and its influence on the native languages of the country is significant. This study shows how the regional dialects of the country have a significant influence on the way English is pronounced. It is an effort to define the phonemic changes in Punjabi and Indian English. The purpose of this study was to see whether the sound pattern of Indian English varies depending on the speakers’ native languages or is the same irrespective of them, and also to identify words that differ in pronunciation from Standard English and are clearly marked by the influence of the first language of native Punjabi speakers of India.

Amita Dev, Shweta A. Bansal, Shyam S. Agrawal
Prashn: University Voice Assistant

University websites are the best source of information about a university. But each university implements its website differently, and there is no common layout for finding specific information. This can be difficult for users, especially potential students who are trying to compare universities to decide where to seek admission. Natural Language Human-Computer Interaction (HCI) is trending these days due to its ease of use. Thus, implementing a chatbot on university websites can reduce much of the time students spend looking for information they need, such as office timings, address, admission procedure, accommodation information, etc. Voice assistants are the future of natural language HCI. Prashn is a web application that allows users to get all the necessary information about Indira Gandhi Delhi Technical University for Women, Delhi, either through a chat interface or through speech recognition.

Priya Sharma, Sparsh Sharma, Pooja Gambhir
Spectrogram Analysis and Text Conversion of Sound Signal for Query Generation to Give Input to Audio Input Device

The world is being reshaped by Natural Language Processing. Audio inputs are used in modern electronics: different types of people supply input to a system in their native language, and the system accepts the person's speech, processes it, and responds accordingly. Cooking is a huge problem for a variety of people, including the elderly, those who are confined to their beds, and those who have a specific sort of handicap, such as being unable to use their hands, and who require assistance at all times. To help these people reach their full potential, an audio input device for giving cooking instructions to a cooking system is proposed in this paper. The gadget takes the user's spoken English as input, converts it to text using deep learning algorithms, and generates instructions with the help of context-aware words extracted from the recorded audio, sending the instructions to the cooking device. Analyzing the audio signal for user authentication is a challenging task due to gaps and pauses between spoken characters and noise in the environment. As a result, the audio input device developed for kitchen systems must analyze the audio input signal to create a more secure environment for authenticated users. The objective of this paper is therefore to analyze the audio input signal captured in real time and process the accepted signal, converting it into text to generate instructions for a larger system. The sound signals captured in the real environment are analyzed with Mel spectrograms, MFCC spectrograms, and the PRAAT software, and processed with the help of a natural language toolkit to generate instructions.
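The Mel and MFCC spectrograms mentioned above both start from a short-time magnitude spectrum; a naive pure-Python sketch of that first stage is shown below (a real pipeline would use an FFT library and add mel filtering on top):

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Frame the signal, apply a Hann window, and take a naive DFT of
    each frame, keeping only the non-negative frequency bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = []
    for frame in frames:
        x = [s * w for s, w in zip(frame, window)]
        bins = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len)))
                for k in range(frame_len // 2 + 1)]
        spec.append(bins)
    return spec

# 256-sample test tone at exactly 8 cycles per 64-sample frame,
# so the spectral peak should land in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(tone)
```

The time-frequency grid this produces is what gets rendered as a spectrogram image or passed through mel filters and a DCT to obtain MFCCs.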

Kavita Sharma, S. R. N. Reddy
A Contrastive View of Vowel Phoneme Assessment of Hindi, Indian English and American English Speakers

Acoustic, phonetic, and accent variations play an important role in the pronunciation assessment of different languages uttered by speakers. In this paper, we consider the pronunciation assessment of vowels of North Indian Hindi, Indian English, and American English uttered by varying subjects. Indian English is spoken in India as a second language and is highly influenced by a variety of native Indian languages as well as by usage and cultural, regional, social, and educational background. It is observed that Indian speakers tend to speak English phonemes close to the articulation of their native language rather than that of American English. This paper contributes a contrastive study of vowel triangles and the closeness of the cardinal vowel space (ɑ, i, u) of Hindi, Indian English, and American English using the distance formula.
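The distance formula referred to in this abstract, applied to vowels as points in formant space, is the ordinary Euclidean distance; a sketch with illustrative F1/F2 values (placeholders, not the measurements from the study):

```python
import math

def formant_distance(v1, v2):
    """Euclidean distance between two vowels as (F1, F2) points in Hz."""
    return math.sqrt((v1[0] - v2[0]) ** 2 + (v1[1] - v2[1]) ** 2)

# Illustrative cardinal vowel positions (F1, F2) in Hz -- placeholder
# values, not formants reported by the authors.
hindi_a = (750.0, 1200.0)
american_a = (710.0, 1100.0)
dist = formant_distance(hindi_a, american_a)
```

Comparing such distances across the (ɑ, i, u) cardinal points for each language pair quantifies how closely the vowel triangles overlap.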

Pooja Gambhir, Amita Dev, S. S. Agrawal
Text-Based Analysis of COVID-19 Comments Using Natural Language Processing

In dialectology, Natural Language Processing is the process of recognizing the various ontologies of words generated in human language. Various techniques are used for analyzing corpora of content naturally generated by users on various platforms. The analysis of textual content collected during COVID-19 has become a goldmine for marketing experts as well as for researchers, making social media comments available on platforms like Facebook, Twitter, YouTube, etc., a popular area of applied artificial intelligence. Text-based analysis is regarded as one of the most challenging tasks in Natural Language Processing (NLP). The chief objective of this paper is to work on a corpus that generates relevant information from web-based statements made during COVID-19. The findings of the work may give useful insights to researchers working on text analytics and to authorities concerned with the current pandemic. To achieve this, an NLP pipeline is discussed that extracts relevant information and comparatively computes the morphology of publicly available data, thus extracting the knowledge behind the corpus.

Kanchan Naithani, Y. P. Raiwani, Rajeshwari Sissodia
Indian Languages Requirements for String Search/comparison on Web

Document formats and protocols based on character data are central to the web. These formats and protocols can be accessed as resources containing text files that combine syntactic content and natural language content in some structural markup language. Processing these types of data requires various string-based operations such as searching, indexing, sorting, and regular expressions. The documents examined here inspect the text variations of different types and the preferences of users for string processing on the web. For this purpose, the W3C has developed two documents, Character Model: String Matching and Searching, that act as building blocks for these problems on the web and define rules for string manipulation, i.e., string matching and searching. These documents also address the different types of text variation in which the same orthographic text uses different character sequences and encodings. The rules defined in these documents act as a reference for authors, developers, and others for consistent string manipulation on the web. This paper covers the different types of text variation seen in Indian languages, taking Hindi as the initial language; it is important that these types of variation be reflected in the W3C documents for proper and consistent string manipulation of Indian languages on the web.
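One of the text variations discussed, where the same orthographic text uses different character sequences, can be handled by Unicode normalization before comparison; a sketch using a Devanagari example follows (the example pair is illustrative, not drawn from the W3C documents themselves):

```python
import unicodedata

def match_strings(a, b, form="NFC"):
    """Compare two strings after normalizing to a common Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Devanagari QA: precomposed U+0958 vs. KA (U+0915) + NUKTA (U+093C).
# Same orthographic letter, different code point sequences.
precomposed = "\u0958"
decomposed = "\u0915\u093C"
```

A naive code-point comparison treats the two strings as different, while normalization makes searching and matching consistent, which is the behavior the W3C rules aim to standardize.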

Prashant Verma, Vijay Kumar, Bharat Gupta
Dictionary Vectorized Hashing of Emotional Recognition of Text in Mutual Conversation

Emotion detection is a subset of sentiment classification that deals with emotion processing and analysis. The state of being emotional is frequently associated with sensible qualitative stimulation of feelings or with environmental influence. With the increase in social media usage, people tend to have frequent conversations through several applications; even police departments analyze the victims of suicide cases through their personal chat conversations. Machine learning can be used for emotion detection through text processing of personal conversations. A text conversation dataset with 7480 conversations from the Kaggle warehouse is used in this analysis to detect emotions. The dataset is preprocessed by removing stop words, and tokens are extracted from the text using the N-gram method. Emotion labels are assigned to the tokens, and the machine is trained to identify emotions during testing. The emotion labels are converted into features to form a corpus for classifying the emotions in the conversations. The corpus is split into training and testing datasets and fitted with a Dictionary Vectorizer, Feature Hashing, Count Vectorizer, and Hash Vectorizer to extract the important features from the text conversations. The extracted features are then passed to all the classifiers to analyze emotion prediction performance. The scripts are written in Python and run in Spyder within the Anaconda Navigator IDE, and the experimental results show that the random forest classifier with the dictionary vectorizer achieves 99.8% accuracy in predicting emotions from personal conversations.
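The feature-hashing step named in this abstract can be sketched as mapping tokens into a fixed-length count vector; the bucket count and hash function below are illustrative choices, not the authors' configuration:

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Map tokens to a fixed-length count vector via a stable hash."""
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % n_buckets] += 1
    return vec

# Two occurrences of "happy" land in the same bucket; "sad" may or may
# not collide with it, which is the usual trade-off of the hashing trick.
vec = hash_features(["happy", "happy", "sad"])
```

Unlike a dictionary vectorizer, the hashed representation needs no vocabulary, at the cost of occasional bucket collisions.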

M. Shyamala Devi, D. Manivannan, N. K. Manikandan, Ankita Budhia, Sagar Srivastava, Manshi Rohella
Comparative Analysis of NLP Text Embedding Techniques with Neural Network Layered Architecture on Online Movie Reviews

In the NLP world, there is a need to convert text data into numerical form through smart text embedding paired with a machine learning architecture. In this research, the text embedding methods Binary Term Frequency, Count Vector, Term Frequency–Inverse Document Frequency, and Word2Vec are compared for converting text to meaningful vector representations containing numerical values. To analyze the performance of the various text embedding techniques, a Neural Network Layered Architecture is designed for movie review polarity classification, comprising an input layer and dense layers followed by ReLU (Rectified Linear Unit) activation layers and a Sigmoid activation function, with classifications made on the basis of training and testing performance. Word2Vec scored the highest training and testing accuracy among all the text embedding techniques, with 89.75% and 86.94% respectively (±1.0 error) on the online movie reviews.
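The Term Frequency–Inverse Document Frequency embedding compared in this paper can be sketched in plain Python (the smoothing used here is one common variant, not necessarily the authors' exact formula):

```python
import math

def tf_idf(docs):
    """Compute smoothed TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    df = {}  # document frequency of each term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        weights.append({t: tf[t] * math.log((1 + n) / (1 + df[t]))
                        for t in tf})
    return weights

# Toy tokenized reviews: terms appearing in fewer documents weigh more.
docs = [["great", "movie"], ["bad", "movie"], ["great", "acting"]]
w = tf_idf(docs)
```

Binary Term Frequency and Count Vector differ only in the per-document term statistic, while Word2Vec replaces these sparse counts with dense learned vectors.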

Hemlata Goyal, Amar Sharma, Ranu Sewada, Devansh Arora, Sunita Singhal
Current State of Speech Emotion Dataset-National and International Level

Research on emotion extraction from human speech is transitioning from a phase of exploratory study to one with the potential for significant applications, particularly in human-computer interaction. To achieve higher accuracy in human-computer interaction, the computer system must be provided with good-quality data that covers every aspect required for the interaction, and the establishment of relevant databases is critical to progress in this area. This research discusses the scope, naturalness, context, and descriptors of a dataset as the four primary challenges that must be addressed while constructing databases of emotion-embedded speech. Furthermore, the current state of the art is examined to assess the status of available datasets for internationally spoken languages such as English, Dutch, French, and Chinese, and for Indian spoken languages.

Surbhi Khurana, Amita Dev, Poonam Bansal
Context-Aware Emoji Prediction Using Deep Learning

Emojis are a succinct and visual way to express feelings, emotions, and thoughts during text conversations. Owing to the increase in the use of social media, the usage of emojis has increased drastically. There are various techniques for automating emoji prediction, which use contextual information, temporal information, and user-based features. However, the problem of personalised and dynamic recommendations of emojis persists. This paper proposes personalised emoji recommendations using the time and location parameters. It presents a new annotated conversational dataset and investigates the impact of time and location for emoji prediction. The methodology comprises a hybrid model that uses neural networks and score-based metrics: semantic and cosine similarity. Our approach differs from existing studies and improves the accuracy of emoji prediction up to 73.32% using BERT.
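The cosine-similarity score used in the hybrid model can be sketched with toy embeddings (the three-dimensional vectors below are illustrative placeholders, not BERT outputs):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_emojis(context_vec, emoji_vecs):
    """Rank candidate emojis by similarity to a context embedding."""
    return sorted(emoji_vecs,
                  key=lambda e: cosine_similarity(context_vec, emoji_vecs[e]),
                  reverse=True)

# Toy 3-d embeddings for a "happy" context and three candidate emojis.
context = [0.9, 0.1, 0.0]
candidates = {"grin": [1.0, 0.0, 0.0],
              "cry": [0.0, 1.0, 0.0],
              "party": [0.5, 0.5, 0.0]}
ranking = rank_emojis(context, candidates)
```

In the paper's hybrid setup, such score-based metrics would be combined with the neural network's output and with time and location features to personalize the final recommendation.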

Anushka Gupta, Bhumika Bhatia, Diksha Chugh, Gadde Satya Sai Naga Himabindu, Divyashikha Sethia, Ekansh Agarwal, Depanshu Sani, Saurabh Garg
Development of ManiTo: A Manipuri Tonal Contrast Dataset

Tone recognition plays a vital role in understanding speech for tonal languages. Integrating tonal information from a robust tone recognition system can improve the performance of Automatic Speech Recognition (ASR) for such languages. The tonal recognition approaches adopted so far have focused on Asian, African and Indo-European languages. In India, there are very few works on tonal languages, especially those spoken in its North-Eastern part, from which the Manipuri language is largely unexplored. This paper presents the development of a Tonal Contrast dataset for Manipuri, a low resource language. It also presents an initial analysis of the recorded data.

Thiyam Susma Devi, Pradip K. Das
Deep Neural Networks for Spoken Language Identification in Short Utterances

This work presents the elements of language identification (LID) on small segments created from short-duration utterances. For low-resourced languages, the availability of data is itself a challenge; this paper applies DNNs to low-resourced languages. It presents a feed-forward deep neural network (FF-DNN) for language identification using acoustic features of short-time utterances. Two network topologies for the DNN were checked for their performance on the LID task. The experimental findings are compared to a well-established technique based on an i-vector system, which uses MFCC-SDC features to represent the acoustic characteristics, with a back end implemented using a support vector machine (SVM) as the classifier. These mechanisms were put in place to identify Hindi and Punjabi, two widely spoken Indian languages. The speech utterances are divided into short segments of 5 s, 10 s, 20 s, and 35 s duration. The system’s efficiency is measured in EER (%); for short time segments, a relative improvement of 3% is achieved by the DNN system, and the average error rate over all the utterances was decreased by 2% using the DNN.
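The EER metric used to report the system's efficiency can be sketched as a threshold sweep over genuine (target-language) and impostor scores (the score lists below are toy values, not results from the paper):

```python
def equal_error_rate(genuine, impostor):
    """Sweep a threshold over the observed scores and return the
    (FAR + FRR) / 2 value where |FAR - FRR| is smallest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy language-ID scores: higher means "target language" (illustrative).
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.4, 0.3, 0.65, 0.2]
eer = equal_error_rate(genuine, impostor)
```

At the EER operating point, false acceptances and false rejections are equally likely, which is why it is a standard single-number summary for LID and speaker verification systems.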

Shweta Sinha, S. S. Agrawal

AI Techniques

Frontmatter
A Lightweight Deep Learning Approach for Diabetic Retinopathy Classification

In the present time, the chances of suffering from diabetes have drastically increased due to genetic predisposition, lack of physical activity, high blood pressure, and modern lifestyle-related problems. Diabetic Retinopathy (DR) is a serious condition that affects the blood vessels in the retina. Early detection of DR can avoid severe eye damage. Several machine-learning and deep-learning techniques have been used for DR detection and classification. However, these techniques are complex and time consuming, and require millions of parameters for training and deploying the DR classifier. In this paper, a lightweight dual-branch CNN architecture is proposed for DR classification. The proposed architecture involves 84,645 (0.084 M) parameters for training and deploying the model. The APTOS dataset has been used for analysis.

Ruchika Bala, Arun Sharma, Nidhi Goel
Machine Hearing a Cognitive Service for Aiding Clinical Diagnosis

Auscultation is used for screening and monitoring respiratory diseases and is performed with a stethoscope. Auscultation, the detection of abnormal respiratory sounds, requires skilled medical professionals or clinical experts for diagnosis, and an early diagnosis always gives a higher probability of cure and recovery. Respiratory diseases are among the most common ailments and carry high morbidity, and the biggest challenges faced are the scarcity of clinical experts and their non-availability in rural and geographically challenged regions. Auscultation is an essential part of the physical examination; it is real-time and very informative, but it is based on the auditory perception of lung sounds. This demands considerable expertise from the clinician, and perceptual variability may lead to misidentification of respiratory sounds. In this work, we propose an objective evaluation approach using deep learning techniques to address the limitations of the existing approach: a machine hearing technique to aid clinical decisions. Breath sounds are used for analysis; wheezes and crackles are indicators of underlying ailments such as Pneumonia, Bronchiectasis, Chronic Obstructive Pulmonary Disease (COPD), Upper Respiratory Tract Infection (URTI), Lower Respiratory Tract Infection (LRTI), Bronchiolitis, and Asthma. The sounds were analyzed to classify eight categories: these seven pulmonary diseases plus healthy. CNN and RNN architectures were used for the classification of respiratory diseases. Mel Frequency Cepstral Coefficients were extracted from the breath sounds and used as features for training the CNN and RNN architectures. Data augmentation techniques such as time stretching and shifting were applied to handle the imbalance in the data set. The CNN architecture gave the better accuracy of 0.89, with the RNN slightly lower at 0.833. The proposed approach proves to be a successful solution for the classification task.
The accuracy can be further improved with real-time data. In the future, this can be extended into a machine-hearing decision support system for clinical experts.
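The time-stretch and shift augmentation mentioned above can be sketched in a few lines of numpy. This is an illustrative implementation only; the ±20% stretch range and ±10% shift range are assumptions, not values from the paper:

```python
import numpy as np

def time_stretch(signal: np.ndarray, rate: float) -> np.ndarray:
    """Resample a 1-D breath sound to 1/rate of its original duration.

    rate > 1 shortens (speeds up) the signal, rate < 1 lengthens it.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.linspace(0, len(signal) - 1, num=len(signal))
    new_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(new_idx, old_idx, signal)

def time_shift(signal: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift a signal by `shift` samples."""
    return np.roll(signal, shift)

def augment(signal: np.ndarray, n_copies: int, rng) -> list:
    """Generate perturbed copies of a minority-class recording."""
    copies = []
    for _ in range(n_copies):
        rate = rng.uniform(0.8, 1.2)                      # assumed range
        shift = int(rng.integers(-len(signal) // 10, len(signal) // 10))
        copies.append(time_shift(time_stretch(signal, rate), shift))
    return copies
```

In practice each augmented copy would then be passed through the same MFCC extraction as the original recordings.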

Arun Gopi, T. Sajini
Critical Insights on Cancer Detection Using Deep Learning

Cancer is a group of many diseases that can originate in any organ of the human body. It is defined by the uncontrolled growth of damaged or abnormal cells that form a cancerous tumor. After originating, these cells can spread to other parts of the body and cause tumors to grow there too, a process called metastasis. Cancerous tumors are also known as malignant tumors. To prepare an effective treatment for cancer patients, the disease must be detected in its early phase. Machine learning and deep learning algorithms may assist in automating this task; one of the most commonly used deep learning techniques is the Convolutional Neural Network (CNN). The present study reviews cancer detection using deep learning approaches, elaborating the datasets, results, limitations, and approaches.

Harsimar Kandhari, Sagar Deep, Garima Jaiswal, Arun Sharma
OHF: An Ontology Based Framework for Healthcare

Timely and holistic recommendations in the disease diagnosis process may prove a great assistance to a medical practitioner. However, the amount of structured and unstructured data generated in the medical domain is voluminous and thus poses challenges. In order to make machines understand the data semantically and infer useful insights from a patient’s information, the data needs to be represented semantically in an ontology. Therefore, in this paper we explore various existing ontologies in the medical domain. The paper proposes an Ontology-Based Framework for Healthcare (OHF) built on these existing ontologies, and also proposes a Healthcare Ontology (HO), a semantic representation of the knowledge base of patients’ healthcare information available in the form of Electronic Health Records (EHR). The OHF, consisting of systematically generated and exhaustive ontologies, may be utilized for drawing semantic inferences related to a patient’s medical condition. A case study is used to explain the working of the framework in disease diagnosis.

Shivani Dhiman, Anjali Thukral, Punam Bedi
Enhancing the Deep Learning-Based Breast Tumor Classification Using Multiple Imaging Modalities: A Conceptual Model

A preliminary and unambiguous prognosis of breast tumors is critical for early detection and diagnosis. Prior studies have established automated techniques that use only medical imaging modalities to predict breast tumor development; several, however, have suggested revisiting the breast tumor classifications in the current literature. This study reviewed various imaging modalities for breast tumors and discussed breast tumor segmentation and classification using preprocessing, machine learning, and deep neural network techniques. This research aims to classify malignant and benign breast tumors using appropriate medical image modalities and advanced neural network techniques. It is critical to improve strategic decision analysis on various fronts, including imaging modalities, datasets, preprocessing techniques, deep neural network techniques, and performance metrics for classification. Preprocessing techniques such as augmentation, scaling, and image normalization are used to minimize the irregularities associated with medical imaging modalities. In addition, we discussed various deep neural network architectures. A convolutional neural network, either an existing network or one developed from scratch, is frequently used to classify breast tumors from medical images and to create an efficient classification paradigm. The accuracy, area-under-the-curve, precision, and F-measure metrics of the developed classification paradigm will be used to evaluate its performance.

Namrata Singh, Meenakshi Srivastava, Geetika Srivastava
EEG Based Stress Classification in Response to Stress Stimulus

Stress, either physical or mental, is experienced by almost every person at some point in their lifetime. Stress is one of the leading causes of various diseases, burdens society globally, and badly affects an individual's well-being. Thus, stress-related study is an emerging field, and in the past decade a lot of attention has been given to the detection and classification of stress. Estimating stress in an individual helps in stress management before it invades the human mind and body. In this paper, we propose a system for the detection and classification of stress and compare various machine learning algorithms for stress classification using EEG signal recordings. An Interaxon Muse device with four dry electrodes was used for data collection, and EEG data were collected from 20 subjects. Stress was induced in these volunteers by showing them stressful videos, and the EEG signal was then acquired. Frequency-domain features, namely absolute band powers, were extracted from the EEG signals. The data were then classified into stressed and non-stressed using different machine learning methods: Random Forest, Support Vector Machine, Logistic Regression, Naive Bayes, K-Nearest Neighbors, and Gradient Boosting. We performed 10-fold cross-validation, and an average classification accuracy of 95.65% was obtained using the gradient boosting method.
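The absolute band power features described above can be computed from a periodogram of each EEG channel. A minimal numpy sketch follows; the band boundaries are the conventional EEG band definitions, which the abstract does not spell out, so treat them as assumptions:

```python
import numpy as np

# Conventional EEG frequency bands in Hz (assumed; not listed in the paper).
BANDS = {"delta": (0.5, 4), "theta": (4, 8),
         "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def absolute_band_powers(eeg: np.ndarray, fs: float) -> dict:
    """Absolute power per band from the periodogram of one EEG channel."""
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(eeg)) ** 2 / (fs * len(eeg))  # periodogram
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}
```

A five-value feature vector per channel (delta through gamma) is then fed to the classifiers.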

Nishtha Phutela, Devanjali Relan, Goldie Gabrani, Ponnurangam Kumaraguru
Latest Trends in Gait Analysis Using Deep Learning Techniques: A Systematic Review

Marker-less analysis of human gait has made considerable progress in recent years. However, developing a gait analysis system capable of extracting reliable and precise kinematic data in a standard and unobtrusive manner remains an open challenge. This narrative review considers the transformation of methods for extracting gait information from videos or images, showing how analysis has improved from arduous manual procedures to semi-objective and objective marker-based systems, and then to marker-less systems. The gait analysis systems in wide use restrict the analysis process through the use of markers, constrained environmental conditions, and long processing times; such limitations can impede the use of a gait analysis system in many applications. Advances in marker-less pose estimation and Q-learning-based techniques are opening up productive methods for estimating precise human poses and movement information from video frames. Vision-based gait analysis techniques can provide a cost-effective, unobtrusive solution for estimating stick figures and thus analyzing gait. This work provides a comprehensive review of marker-less computer vision and deep neural network-based gait analysis, its parameters, design specifications, and the latest trends, giving a bird's-eye view of the domain. This review aims to introduce the latest trends in gait analysis using computer vision methods and thus provide a single platform for learning the various marker-less methods of gait analysis that are likely to have a future impact in biomechanics, while considering the challenges of accuracy and robustness that are yet to be addressed.

Dimple Sethi, Chandra Prakash, Sourabh Bharti
Detection of Skin Lesion Disease Using Deep Learning Algorithm

Skin lesions are parts of the skin that have an unusual growth or appearance compared with the skin around them. They may be something you are born with or something you acquire over your lifetime, and they can be classified into two types: benign (non-cancerous) or malignant (cancerous). Some studies have been conducted on the computerised detection of malignancy in images. However, due to various problematic aspects such as reflections of light from the skin’s surface, variations in colour and lighting, and the varying forms and sizes of the lesions, analysing these images is extremely difficult. As a result, evidence-based automatic skin cancer detection can help pathologists improve their accuracy and competency in the early stages of the ailment. Our proposed method detects the early onset of skin lesions, using Python as a tool, classifying lesions as benign (non-cancerous) or malignant (cancerous) with a machine learning approach. The dataset consists of nine different classes of skin lesion disease: Melanoma (MEL), Melanocytic nevus (NV), Basal cell carcinoma (BCC), Actinic keratosis (AK), Benign keratosis (BKL), Dermatofibroma (DF), Vascular lesion (VASC), Squamous cell carcinoma (SCC), and None of the above (UNK). In our proposed work, a DCNN model is created for classifying cancerous and non-cancerous skin lesions. We use techniques such as filtering and feature extraction for better categorization, which enhances the final analysis value. With our proposed model we achieved a training accuracy of 90.7%.

Sumit Bhardwaj, Ayush Somani, Khushi Gupta
Applying XGBoost Machine Learning Model to Succor Astronomers Detect Exoplanets in Distant Galaxies

When the TRAPPIST-1 news became official on 22.02.2017, the detection of planets beyond our Solar System, orbiting their own sun-like stars, became a burning topic in a way it had not been before. There are seven famous exoplanets in the TRAPPIST-1 system, just forty light-years distant, which are available to be explored by our space telescopes. But several thousand other exoplanets are known to astronomers whose habitability remains uncertain, as there is no evidence about the interactions between these bright stars and their suspected exoplanets. Since the majority of exoplanets are found using the transit method, this research paper proposes a new tool using an XGBoost supervised machine learning model to detect their presence. The results show that the prediction accuracy, precision, and F1-score of this model are very high compared to the other methods used in the literature so far. This work is novel in that no prior research implements an XGBoost-based machine learning model with such highly accurate predictive power, and none of the previous work has taken care of all the steps of data pre-processing and of handling imbalanced data.
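Transit-survey labels are severely imbalanced (confirmed planets are rare among candidate light curves). One simple baseline for the imbalance handling the abstract emphasizes is random oversampling of the minority class before training; the paper does not specify its balancing method, so the sketch below is an assumption, not the authors' technique:

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until every
    class matches the majority-class count."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=n_max - n, replace=True)
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    Xb, yb = np.concatenate(parts_X), np.concatenate(parts_y)
    perm = rng.permutation(len(yb))      # shuffle before training
    return Xb[perm], yb[perm]
```

Gradient-boosting libraries also offer class weighting (e.g. a positive-class weight) as an alternative to resampling.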

Nidhi Agarwal, Amita Jain, Ayush Gupta, Devendra Kumar Tayal
PSRE Self-assessment Approach for Predicting the Educators’ Performance Using Classification Techniques

With the growing interest in, and significance of, Educational Data Mining for educators’ performance, there is a vital need to comprehend the full scope of job performance that can substantially impact teaching quality. However, few educational institutions attempt to improve educator effectiveness in order to improve student outcomes, and for reasons of confidentiality most institutions do not share their data. As a result, a self-assessment strategy is required to improve educators’ performance. With four input parameters and five classifiers (Logistic Regression, Naive Bayes, K-Nearest Neighbor, and Support Vector Machine with Linear and with Radial Basis Function kernels), the proposed PSRE (Professional, Social, Research, and Emotional behavior) self-assessment approach is modeled to predict the overall performance of educators working in various Higher Educational Institutions. Overall, K-nearest neighbor achieved a high accuracy of 95.43%, which may help determine educators’ progress and assist them in reaching new professional heights.

Sapna Arora, Manisha Agarwal, Shweta Mongia, Ruchi Kawatra
Comparative Analysis of Traditional and Optimization Algorithms for Feature Selection

Machine learning enables the automation of a system to generate results without direct assistance from the environment once the machine is trained for all possible scenarios. This is achieved by a series of processes such as collecting relevant data in raw format, exploratory data analysis, selection and implementation of required models, evaluation of those models, and so forth. The initial stage of the entire pipeline involves the necessary task of feature selection, which extracts the more informative features from the pool of input attributes to enhance the predictions made by machine learning models. The proposed approach applies both traditional feature selection algorithms and a bio-inspired modified Ant Colony Optimization (ACO) algorithm to remove redundant and irrelevant features, and provides a comparative analysis of their performance. The results show that the modified ACO yielded lower error percentages with the Linear Regression model on the dataset, whereas the traditional methods outperformed the modified ACO with the SVR model.
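The ACO idea can be sketched as a loop in which artificial ants sample feature subsets biased by a pheromone vector, and the pheromone is reinforced on features that appear in well-scoring subsets. The sketch below is a generic, minimal ACO feature selector, not the paper's modified variant; all parameter values (ant count, evaporation rate, subset size) are illustrative assumptions:

```python
import numpy as np

def aco_feature_selection(score, n_features, n_ants=10, n_iter=20,
                          subset_size=5, rho=0.1, seed=0):
    """Minimal ant-colony feature selection.

    `score(mask) -> float` evaluates a boolean feature mask (higher is
    better), e.g. cross-validated accuracy of a downstream model.
    """
    rng = np.random.default_rng(seed)
    tau = np.ones(n_features)                 # pheromone per feature
    best_mask, best_score = None, -np.inf
    for _ in range(n_iter):
        for _ant in range(n_ants):
            p = tau / tau.sum()               # sampling probabilities
            chosen = rng.choice(n_features, size=subset_size,
                                replace=False, p=p)
            mask = np.zeros(n_features, dtype=bool)
            mask[chosen] = True
            s = score(mask)
            if s > best_score:
                best_mask, best_score = mask, s
        tau *= (1.0 - rho)                    # pheromone evaporation
        tau[best_mask] += best_score          # reinforce the best subset
    return best_mask, best_score
```

In practice `score` would wrap the Linear Regression or SVR model mentioned in the abstract.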

Sakshi Singhal, Richa Sharma, Nishita Malhotra, Nisha Rathee
Session Based Recommendations using CNN-LSTM with Fuzzy Time Series

Session-based recommender systems account for changes in preferences by focusing on a user’s short-term interests, which may change over a period of time. This paper proposes FS-CNN-LSTM-SR, a hybrid technique that uses CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) deep learning techniques with fuzzy time series to recommend products to a user based on the activities performed in a session. The advantage of our proposed method is that it combines the benefits of both CNN and LSTM: CNNs are capable of extracting complex local features, and LSTMs learn long-term dependencies from sequential session data. The performance of FS-CNN-LSTM-SR is evaluated on the YOOCHOOSE dataset from the RecSys Challenge 2015 and compared with three variations, viz. LSTM-SR, CNN-LSTM-SR and FS-LSTM-SR. We observed that our proposed approach performed better than the other three variations. The proposed technique is applicable to any e-commerce dataset where user purchasing choices need to be predicted.

Punam Bedi, Purnima Khurana, Ravish Sharma
Feature Extraction and Classification for Emotion Recognition Using Discrete Cosine Transform

In recent years, the rigorous development of tools and techniques for biomedical signal acquisition and processing has drawn the interest of researchers towards EEG signal processing. Human emotion recognition using Electroencephalography (EEG) signals has proved to be a viable alternative, as these signals cannot be easily imitated the way facial expressions or speech signals can. In this research, the authors explore EEG signals for behavior analysis using the Discrete Cosine Transform and classify the signals using K-Nearest Neighbors. The algorithm is evaluated on the publicly available DEAP dataset. Experimental results are expressed in terms of F1 score, accuracy, precision and recall. The performance metrics for the classification of the emotional labels of the DEAP dataset further confirm the effectiveness of the research, and comparison with recent state-of-the-art methods confirms the efficacy of the proposed work.
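The DCT feature extraction step can be illustrated directly from the transform's definition. The sketch below computes the orthonormal DCT-II (what `scipy.fft.dct(x, norm='ortho')` returns) and keeps the low-frequency coefficients as features; the choice of how many coefficients to keep is an assumption, as the abstract does not state it:

```python
import numpy as np

def dct2(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II of a 1-D signal."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    coeffs = 2.0 * C @ x
    coeffs[0] *= np.sqrt(1.0 / (4 * N))     # orthonormal scaling, k = 0
    coeffs[1:] *= np.sqrt(1.0 / (2 * N))    # orthonormal scaling, k > 0
    return coeffs

def dct_features(signal: np.ndarray, n_keep: int) -> np.ndarray:
    """Keep the first n_keep low-frequency DCT coefficients as features."""
    return dct2(signal)[:n_keep]
```

Because the transform is orthonormal, it preserves signal energy, and most of that energy concentrates in the leading coefficients, which is why truncation works as feature compression.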

Garima, Nidhi Goel, Neeru Rathee
Sarcasm Detection in Social Media Using Hybrid Deep Learning and Machine Learning Approaches

Sarcasm refers to the use of ironic language to convey a message. It is mainly used on social sites like Reddit, Twitter, etc. Identifying sarcasm improves the efficiency of sentiment analysis, which refers to analyzing people's attitudes towards a particular topic or scenario. Our proposed methodology uses a hybrid supervised learning approach to detect sarcastic patterns for classification. The supervised machine learning approaches include Logistic Regression, Naïve Bayes, and Random Forest, along with hybrid deep learning models like CNN and RNN. Before implementing the models, the dataset was preprocessed: raw data is usually not fit for feature extraction as it contains usernames, empty spaces, special characters, stop words, emoticons, abbreviations, hashtags, time stamps, and URLs. Hence null values, stop words, punctuation marks, etc. are removed, and lemmatization is performed. After preprocessing, the proposed methodology was implemented using various supervised machine learning models, hybrid neural network models, ensemble hybrid models, and models using word embeddings, on two datasets. The outcome revealed that the hybrid neural network model based on RNN worked best for both datasets and achieved the highest accuracy compared to the other models.

Tanya Sharma, Neeraj Rani, Aakriti Mittal, Nisha Rathee
An Effective Machine Learning Approach for Clustering Categorical Data with High Dimensions

Many modern real-world databases include large quantities of categorical data that, with advances in database technology, contribute to data processing and efficient decision-making. However, most clustering algorithms are devised only for numerical data, since numerical attributes behave like measurements. A large amount of work has been performed on clustering categorical data using specifically defined similarity measures over categorical attributes. The core difficulty is that real-world categorical domains do not take a clearly predictive form, and similarity must be inferred from both observed and latent structure. This paper describes CAIS, a stratified, immune-based approach with a new similarity metric that serves as the distance function for clustering categorical data. For successful exploration of clusters over categorical data, CAIS adopts an immunology-inspired approach: it selects representative entities and organizes records into clusters by quantifying affinity. To minimize database overhead, CAIS partitions the data over several attributes. The analytical findings show that the proposed solution yields greater mining performance on different categorical datasets and outperforms EM on categorical datasets.
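The baseline similarity measure for categorical records is the overlap (simple-matching) measure: the fraction of attributes on which two records agree. The paper's CAIS metric is a refinement of this idea whose exact definition the abstract does not give, so the sketch below shows only the generic baseline and a nearest-representative assignment step:

```python
import numpy as np

def overlap_similarity(a, b) -> float:
    """Fraction of attributes on which two categorical records agree."""
    a = np.asarray(a, dtype=object)
    b = np.asarray(b, dtype=object)
    return float(np.mean(a == b))

def assign_to_clusters(records, representatives):
    """Assign each record to the representative with the highest overlap
    (ties broken by the lowest representative index)."""
    return [max(range(len(representatives)),
                key=lambda j: overlap_similarity(r, representatives[j]))
            for r in records]
```

A full clustering algorithm would alternate this assignment step with re-selection of representatives, analogous to k-modes.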

Syed Umar, Tadele Debisa Deressa, Tariku Birhanu Yadesa, Gemechu Boche Beshan, Endal Kachew Mosisa, Nilesh T. Gole
Identification of Disease Resistant Plant Genes Using Artificial Neural Network

Much like animals have defenses against disease-causing pathogens, plants have their own mechanisms to identify and defend against pathogenic microorganisms. Much of this mechanism depends upon disease-resistance genes, also known as ‘R’ genes. Early identification of these R genes is essential in any crop improvement program, especially at a time when plant diseases are one of the biggest causes of crop failure worldwide. Existing methods rely on domain dependence, which has several drawbacks and can cause new or low-similarity sequences to go unrecognized. In this paper, a machine learning method employing a domain-independent approach was developed and evaluated, which improves upon or eliminates the drawbacks of existing methods. Data sets were obtained from publicly accessible repositories, and feature extraction generated 10,049 features. Batch Normalization was used in the models, and we achieved 97% accuracy on the test dataset, which is greater than anything else in the literature using the same approach.

Tanmay Thareja, Kashish Goel, Sunita Singhal
A Comparative Study on a Disease Prediction System Using Machine Learning Algorithms

One of the most essential and fundamental factors that motivates people to seek assistance is their physical well-being. A vast range of ailments affects people nowadays, making them extremely vulnerable; thus, disease prediction at an early stage has become increasingly relevant. Machine learning is a relatively new technique that can aid in the prediction and diagnosis of diseases. To diagnose respiratory difficulties, heart attacks, and liver disorders, this study employs machine learning in conjunction with symptoms. These diseases are the focus of our investigation because they are extremely prevalent, incredibly expensive to treat, and affect a significant number of people at the same time. A variety of supervised machine learning methods, including Naive Bayes, Decision Trees, Logistic Regression, and Random Forests, are used to predict the disease from the provided dataset, and the classifiers are compared on the basis of their accuracy. Flask is also used to construct a platform that allows visitors to forecast whether they will contract a specific illness and to take appropriate precautions if they do. Among the most important aspects of healthcare informatics is the prediction of chronic diseases, and it is critical to diagnose such conditions at the earliest possible opportunity. Using feature extraction and classification methods for the classification and prognosis of chronic diseases, this study gives a summary of the current state of the art. The selection of features appropriate for a classification system is critical to improving its accuracy, and dimensionality reduction aids the overall performance of the machine learning system. The use of classification algorithms on disease datasets gives promising results in the development of adaptive, automated, and smart medical diagnostics for chronic diseases.
Using parallel classification systems, it is possible to speed up the process while also increasing the computational efficiency of the final findings. This paper provides a complete analysis of several feature selection strategies, as well as the advantages and disadvantages of each method.

S. Rama Sree, A. Vanathi, Ravi Kishore Veluri, S. N. S. V. S. C. Ramesh
Leaf Disease Identification Using DenseNet

To maintain a promising status of global food security, it is imperative to strike a congruous balance between the estimated alarming growth in the global population and the agricultural yield expected to cater to its needs. An agreeable balance has not yet been achieved in this respect, which could be the origin of food crises across the world; therefore it is crucial to counter any direct or indirect factors causing this. Proper growth of plants and protection against diseases is an instrumental factor in meeting the quality and quantity of food requirements globally. Deep learning methods have achieved successful results in the spheres of image processing and pattern recognition, and we have made an effort to apply them to analyzing plant leaves for the prediction and detection of diseases. Here, we considered two major crops grown in Himachal Pradesh, i.e. tomato and potato, for performing various experiments. In our result analysis, we achieved an accuracy of 96.24% in identifying the diseases in the leaves.

Ruchi Verma, Varun Singh
A Pilot Study on FoG Prediction Using Machine Learning for Rehabilitation

Walking has a significant impact on one’s quality of life. Freezing of Gait (FoG) is a typical symptom of Parkinson’s disease (PD). FoG is characterised by quick and abrupt transient falls, as a result of which the patient’s mobility is limited and their independence is lost. Thus, early detection of FoG in PD patients is necessary for diagnosis and rehabilitation. The present strategies for early detection of FoG are ineffective and have a low success rate. This study illustrates a comparative analysis of ML techniques (K-Nearest Neighbors (KNN), Decision Trees, Random Forest, Support Vector Classifier (SVC), and AdaBoost Classifier), using time and statistical features to perform detection and prediction tasks on the publicly available DaphNet database. FoG prediction is highly patient-dependent and achieved a peak F1-score of 80% for one of the patients. The paper also presents a combined analysis of all the patients, which may aid in designing wearable sensors for detection. This system detects FoG with a precision of about 81%.
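The time and statistical features mentioned above are typically computed over short windows of the accelerometer signal. The sketch below uses non-overlapping windows and a handful of time-domain statistics; the 4-second window length and the particular statistics are assumptions, not details taken from the paper:

```python
import numpy as np

def window_features(accel: np.ndarray, fs: int, win_s: float = 4.0) -> np.ndarray:
    """Split one accelerometer axis into non-overlapping windows and
    compute simple time-domain statistics per window.

    Returns an (n_windows, 5) array of
    [mean, std, min, max, mean absolute first difference].
    """
    win = int(win_s * fs)
    n = len(accel) // win
    feats = []
    for i in range(n):
        w = accel[i * win:(i + 1) * win]
        feats.append([w.mean(), w.std(), w.min(), w.max(),
                      np.mean(np.abs(np.diff(w)))])
    return np.array(feats)
```

Each window then receives a FoG / no-FoG label from the annotations, and the feature rows are fed to the classifiers under comparison.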

Kartik Kharbanda, Chandra Prakash
Comparing the Accuracy and the Efficiency in Detection of Coronavirus in CT Scans and X Ray Images

The Coronavirus pandemic, also known as the COVID-19 pandemic, is a global disease pandemic caused by SARS-CoV-2, the virus responsible for the severe respiratory illness COVID-19. Symptoms differ greatly in severity, ranging from subtle to life-threatening, and people who are old or have underlying medical conditions are more prone to developing serious illness. The coronavirus is spread through the air when droplets and small airborne particles contaminate it. In this project we analyze dataset images of chest CT scans and chest X-rays for the detection of coronavirus using different kinds of deep learning algorithms, and check the efficiency of both to determine which is more accurate and more beneficial for detecting the coronavirus, so that this study can be used for future detection of COVID-19 in patients.

C. V. Sagar, Sumit Bhardwaj
An Analysis of Image Compression Using Neural Network

Image compression belongs to the area of data compression because an image is itself made up of data, and the task of compressing images has become vital in our current life. Images are required to build more attractive content, and in today’s world smartphones account for a large fraction of internet traffic while having low data bandwidth on average. Due to these factors and the restrictions on bandwidth and other computing capabilities, it has become necessary for developers of websites and applications to reduce the size, the resolution, or both of an image to improve the responsiveness of their websites and apps. For this purpose, image compression is divided into two categories: lossless image compression and lossy image compression. The requirement for lossless image compression is that during the decompression process the image data must be recovered with no (or negligible) loss in image quality, while in lossy image compression a certain amount or level of error is allowed in the image data to achieve better compression ratios and performance. Because of their good performance, neural networks have been used to implement image compression, and multiple modified neural networks have been proposed for image compression tasks; however, the resulting models are big in size, require high computational power, and are best suited for a fixed compression rate. Some of them are covered in this survey report.

Mohit, Pooja Dehraj
Deep Learning on Small Tabular Dataset: Using Transfer Learning and Image Classification

Deep Learning is a subset of machine learning inspired by the human brain. It uses multiple layers of representation to extract specific knowledge from raw input and is best suited to large image or sound datasets. Deep learning methods are generally avoided for small datasets because they tend to overfit; transfer learning is one approach to this problem. However, in the case of tabular datasets, their heterogeneous nature makes transfer learning algorithms inapplicable. This paper discusses, via a literature review, a few approaches to converting tabular data into images to overcome such limitations. The paper provides a two-part study: first, a brief overview of transfer learning, which enhances the efficiency of deep learning algorithms and drastically reduces training time for small datasets; second, a detailed study of different techniques available to convert tabular data into images for image classification, such as the SuperTML, IGTD, and REFINED approaches. Furthermore, we propose a novel approach inspired by IGTD to create a blocked image representation of the tabular data, on which we apply transfer learning to demonstrate the application of deep learning methods on small tabular datasets (with fewer than 1000 data points).
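The idea of a blocked image representation can be illustrated simply: map each feature of a row to a constant-intensity square block laid out on a near-square grid, producing a grayscale image a CNN can consume. This is a simplified stand-in for the IGTD-inspired method in the paper (IGTD additionally optimizes feature placement); the block size and min-max scaling are assumptions:

```python
import numpy as np

def row_to_block_image(row, block: int = 4) -> np.ndarray:
    """Render one tabular row as a grayscale image: each feature becomes
    a `block x block` square of constant intensity."""
    row = np.asarray(row, dtype=float)
    lo, hi = row.min(), row.max()
    vals = (row - lo) / (hi - lo) if hi > lo else np.zeros_like(row)
    side = int(np.ceil(np.sqrt(len(row))))   # cells per grid side
    grid = np.zeros(side * side)
    grid[:len(row)] = vals                   # unused cells stay black
    grid = grid.reshape(side, side)
    return np.kron(grid, np.ones((block, block)))  # expand cells to blocks
```

The resulting images can then be stacked into a dataset and fed to a pretrained CNN for transfer learning.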

Vanshika Jain, Meghansh Goel, Kshitiz Shah
Implementation of a Method Using Image Sequentialization, Patch Embedding and ViT Encoder to Detect the Breast Cancer on RGBA Images and Binary Masks

This paper uses an approach in which we train the system on unlabelled and labelled datasets comprising RGBA images and binary masks. This benefits both the quality and the accuracy of the result, as pre-training always helps in obtaining excellent output. We use a Transformer model with 51 million parameters and apply an image sequentialization technique, which helps us attain smoother images without noise. At times this approach can be time-consuming due to the datasets used and their sizes; however, it shows more accurate and efficient results. Further, we derive the pixel-wise label map using a patch embedding technique. Once these techniques are applied, a CNN-Transformer hybrid comes into play, which encodes the images into high-level feature extractions and decodes them back to full spatial resolution; training proceeds through the usual forward pass and back-propagation. The design also involves a cascaded upsampler, and we use self-attention processes in the design of the encoder using transformers. The entire mechanism is evaluated with several of the best metrics: pixel accuracy, IoU, mean IoU, and recall/precision/F1 score, giving effective results.
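Image sequentialization followed by patch embedding is the standard ViT input pipeline: the image is cut into fixed-size patches, each patch is flattened, and a shared linear projection maps it to the model dimension. The numpy sketch below shows those two steps; the patch size and embedding dimension are illustrative assumptions, not values from the paper:

```python
import numpy as np

def image_to_patches(img: np.ndarray, p: int) -> np.ndarray:
    """Sequentialize an H x W x C image into flattened p x p patches
    (the ViT input format). H and W must be divisible by p."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)   # group by patch position
    return patches.reshape(-1, p * p * C)        # (num_patches, p*p*C)

def patch_embed(patches: np.ndarray, W_proj: np.ndarray) -> np.ndarray:
    """Linear patch embedding: project each flattened patch to d_model."""
    return patches @ W_proj
```

In the full model, positional encodings are added to the embedded patch sequence before it enters the Transformer encoder.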

Tanishka Dixit, Namrata Singh, Geetika Srivastava, Meenakshi Srivastava
An Ensemble Model for Face Mask Detection Using Faster RCNN with ResNet50

Computer vision is an important aspect of Artificial Intelligence, and object detection is its most actively researched area with deep learning algorithms. In the current COVID-19 pandemic, social distancing is a mandatory factor in preventing the transmission of this deadly virus, and governments are struggling to handle people who do not wear masks in public places. Our work concentrates on the detection of face masks using state-of-the-art methodologies like YOLO, SSD, RCNN, Fast RCNN and Faster RCNN with different backbone architectures like ResNet, MobileNet, etc. This paper brings out various ensemble methods by combining the state-of-the-art methodologies and compares these combinations to identify the best performance for the chosen dataset and application. We obtained the highest performance benchmark with Faster RCNN with ResNet50 among the ensemble methods. All the performance evaluation metrics are compared with one another on the same face mask detection image dataset. In this paper, we present a balanced collation of ensemble methods for object detection algorithms.

M. DhivyaShree, K. R. Sarumathi, R. S. Vishnu Durai
A Novel Approach to Detect Face Mask in Real Time

COVID-19, a deadly virus outbreak across the entire world, has infected countless people and led to the death of millions. The economies of countries halted, people were stuck at home, and the situation became worse day by day in a way no one expected. COVID-19 can be spread through airborne droplets, aerosols and other carriers, so guidelines were released promoting three key points for its prevention: (a) maintaining social distancing, (b) sanitization, and (c) most important, wearing a mask in public places. Unfortunately, people avoid these measures, leading to the spread of the disease. So there is a need for security guards on the ground to ensure that people strictly follow the guidelines, but this is too risky for the guards' lives. Fortunately, we live in the 21st century, with powerful technologies like Machine Learning, Artificial Intelligence, Deep Learning and many more, so it is possible for machines to help us ensure the guidelines are followed, with no need for a physical person to watch over the crowd. This research paper proposes a machine-learning algorithm to verify whether people are wearing a proper face mask. It can be used in public places like airports and railway stations, or at the main gate of residential societies and stores, to ensure that no one enters without a mask.

Sumita Gupta, Rana Majumdar, Shivam Deswal
A Novel Approach for Detecting Facial Key Points Using Convolution Neural Networks

Face recognition has many real-time applications, in which facial keypoint detection is an intermediate and crucial step. The number of keypoints used for face recognition determines the computational requirements of the algorithm. In this paper, an effort has been made to detect 15 useful facial key points using convolutional neural networks and to compare the result with a state-of-the-art system that uses 30 facial key points. We identify the 15 facial key points (6 points from the eyes, 4 from the eyebrows, 4 from the lips and 1 from the nose) by choosing proper hyperparameters for the convolutional neural network. The performance of the proposed system is found to be quite similar to that of the system with 30 facial key points.
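A CNN for keypoint detection is usually framed as regression: the network outputs an (x, y) pair per keypoint, so halving the keypoint count halves the output layer. The sketch below is a hypothetical minimal PyTorch model (the layer sizes and the 96×96 grayscale input are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Regresses n_keypoints (x, y) pairs from a 96x96 grayscale face crop."""
    def __init__(self, n_keypoints=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 96 -> 48
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 48 -> 24
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 24 * 24, 256), nn.ReLU(),
            nn.Linear(256, n_keypoints * 2),  # one (x, y) pair per keypoint
        )

    def forward(self, x):
        return self.head(self.features(x))

model = KeypointNet(n_keypoints=15)
out = model(torch.zeros(1, 1, 96, 96))
print(out.shape)  # torch.Size([1, 30])
```

Training such a model against mean-squared error on the annotated coordinates is the standard recipe; the comparison in the paper amounts to training the same kind of network with a 30-output versus a 60-output head.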

Rishi Kakkar, Y. V. Srinivasa Murthy
Effective Hyperspectral Image Classification Using Learning Models

Recently, machine learning has produced appreciable results on various visual computing tasks, including the classification of hyperspectral images. This study compares different machine learning models for the classification of a hyperspectral image dataset. The hyperspectral data, captured by the AVIRIS sensor over the Indian Pines test site in north-western Indiana, consists of 224 spectral reflectance bands. The ground truth has sixteen classes, including vegetation crops, built structures, etc. Accuracy assessments and confusion matrices were used to evaluate classification performance. The study covers three learning models: SVM classification on dimensionally reduced data via PCA, CNN and k-NN. The overall accuracy was 72.38% for PCA-SVM, 85% for CNN and 66.21% for k-NN, indicating that CNN classification is the most effective for this hyperspectral dataset.
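The PCA-SVM versus k-NN comparison can be sketched with scikit-learn. The data here is a synthetic stand-in (224 "bands", 16 classes, mimicking the Indian Pines shape) rather than the actual AVIRIS scene, and the PCA dimension is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic per-pixel spectra: 224 "bands", 16 ground-truth classes
X, y = make_classification(n_samples=2000, n_features=224, n_informative=40,
                           n_classes=16, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

pca_svm = make_pipeline(PCA(n_components=30), SVC())  # reduce bands, then classify
knn = KNeighborsClassifier(n_neighbors=5)

results = {}
for name, clf in [("PCA-SVM", pca_svm), ("k-NN", knn)]:
    results[name] = clf.fit(Xtr, ytr).score(Xte, yte)  # overall accuracy
    print(f"{name}: {results[name]:.2f}")
```

The CNN branch of the study would replace the classifier with a small convolutional network over spatial–spectral patches; on the real dataset it outperformed both pipelines above.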

Sushmita Gautam, Kailash Chandra Tiwari
Reinforcing Digital Forensics Through Intelligent Behavioural Evidence Analysis: Social Media Hate Speech Profiling

The cumulative boom in the use of social media in crime incidents creates a need to improve the overall efficiency of investigation by developing new tools and techniques. This manuscript addresses the issue by proposing a model that offers a feasible way to accelerate criminal investigation through digital criminal profiling based on social media hate speech content. The proposed expert system considers hate speech content analysis along with other digital footprints of the suspects. The resulting analysis, together with manually collected evidence, is fed into the knowledge base of the expert system, which automatically processes the dataset to narrow down the list of suspects using intelligent machine learning algorithms. Hate speech content analysis, performed with the latest intelligent mechanisms within the boundaries of the Behavioural Evidence Analysis-Standardised (BEA-S) model, yields criminal profiles that in turn serve as a base for future investigations. The concept and model discussed in this paper aim to elevate the current investigation process to an innovative stature of digital forensics.

Barkhashree, Parneeta Dhaliwal
Collaborative Travel Recommender System Based on Malayalam Travel Reviews

A Recommender System is an unsupervised machine learning technique that helps users make decisions based on their own choices and interests, and travel and tourism is one of the important domains in this area. The rapid growth of redundant, heterogeneous data on the web has made recommender systems increasingly important; a travel recommender system helps users or travelers decide on their choices and preferences for travel. This work proposes a machine learning approach to a personalized travel recommender system in Malayalam, one of the prominent languages of the southern part of India. The system was developed using collaborative filtering and Malayalam text processing with cosine similarity and TF-IDF. Data was taken from the largest Malayalam travel group on Facebook, named "Sanchari", and a customized scraping algorithm was developed to collect sufficient relevant information from social media. The recommender system could suggest the most suitable destinations to each user with an accuracy of 93%, a fair result compared with other algorithms in the same domain.
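The TF-IDF plus cosine-similarity step can be sketched with scikit-learn. The review snippets below are hypothetical English stand-ins (the paper processes Malayalam text scraped from the "Sanchari" group):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical review snippets, one per destination
reviews = [
    "quiet hill station with tea gardens and trekking",
    "beach resort, houseboat ride and seafood",
    "trekking trails, waterfalls and tea plantations",
]
tfidf = TfidfVectorizer().fit_transform(reviews)  # sparse doc-term matrix
sim = cosine_similarity(tfidf)                    # pairwise review similarity

# Destination whose review is most similar to review 0 (excluding itself)
best = max((j for j in range(len(reviews)) if j != 0), key=lambda j: sim[0, j])
print(best)  # 2 -- shares "trekking" and "tea" with review 0
```

In the collaborative setting, the same similarity scores are computed between users' aggregated review texts, and destinations liked by the most similar users are recommended.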

V. K. Muneer, K. P. Mohamed Basheer
Hybrid Framework Model for Group Recommendation

Group-based recommender systems are web-based applications intended to satisfy the preferences of every user in a group equally. Such a recommender system aims to identify user-preferred items, such as movies, music, books and restaurants, that satisfy each individual and the collective needs of the group. These systems tend to solve the problem of information overload by probing through large volumes of dynamically generated information and providing users with significant content and services. Generating better and more pertinent recommendations for a group of users is a non-trivial task, as there is huge diversity in tastes and preferences among the members of a group residing at different locations. This paper presents results computed from traditional filtering approaches and a hybrid filtering model. The approaches are evaluated using various offline metrics on the MovieLens dataset, and the proposed model is assessed on working efficiency and the quality of the recommendations generated. The proposed hybrid filtering model, Hyflix, is explored for generating recommendations for a group of users and compared with the existing traditional filtering approaches in a social network.
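The core difficulty of group recommendation is aggregating individual preferences into one ranking. Two classic strategies, average satisfaction and least misery, can be sketched on a hypothetical rating matrix (the numbers and aggregation choices below are illustrative, not the paper's Hyflix model):

```python
import numpy as np

# Hypothetical user-item rating matrix (3 users x 4 movies), 0 = unrated
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
])
mask = ratings > 0

# Average strategy: mean over users who rated the item
avg = np.where(mask.any(0), ratings.sum(0) / np.maximum(mask.sum(0), 1), 0)
# Least-misery strategy: minimum rating, only for items every user rated
least_misery = np.where(mask.all(0), ratings.min(0), 0)

print(avg)           # item 0 averages highest
print(least_misery)  # no item makes everyone happy here
```

Averaging favors items with high overall appeal, while least misery avoids items any single member dislikes; a hybrid model can blend such group scores with content- and collaborative-based predictions.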

Inderdeep Kaur, Sahil Drubra, Nidhi Goel, Laxmi Ahuja
Backmatter
Metadata
Title
Artificial Intelligence and Speech Technology
Editors
Amita Dev
Prof. Dr. S. S. Agrawal
Arun Sharma
Copyright Year
2022
Electronic ISBN
978-3-030-95711-7
Print ISBN
978-3-030-95710-0
DOI
https://doi.org/10.1007/978-3-030-95711-7
