Intelligent Human Computer Interaction
14th International Conference, IHCI 2022, Tashkent, Uzbekistan, October 20–22, 2022, Revised Selected Papers
- 2023
- Book
- Edited by
- Hakimjon Zaynidinov
- Madhusudan Singh
- Uma Shanker Tiwary
- Dhananjay Singh
- Book series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Switzerland
About this book
This book constitutes the refereed proceedings of the 14th International Conference on Intelligent Human Computer Interaction, IHCI 2022, held in Tashkent, Uzbekistan, during October 20–22, 2022.
The 47 full papers and 13 short papers included in this book were carefully reviewed and selected from 148 submissions. They were organized in topical sections as follows: Bio-inspired Computing; Cognitive Computing; Human Centered AI; Intelligent Technology for Post-Covid and Web Frameworks.
Table of contents
- Frontmatter
Deepfake Video Detection Using the Frequency Characteristic of Remote Photoplethysmography
Su Min Jeon, Hyeon Ah Seong, Eui Chul Lee
Abstract: Photoplethysmography is a technique for measuring the blood flow per unit time in an artery. Remote photoplethysmography obtains photoplethysmography signals in a non-contact manner through a sensor such as a camera and has recently been applied in various fields. In this study, we propose a method for detecting Deepfake-manipulated color video based on the remote photoplethysmography concept. As experimental data, 50 real videos and their 50 Deepfake counterparts, generated with Face Swapping Generative Adversarial Networks, were used. Photoplethysmography signals were extracted from the face and neck regions, then preprocessed by detrending and Butterworth band-pass filtering. The 80 power values in the frequency domain were defined as feature vectors. Analyzing the L2 norm between the two vectors extracted from the face and neck regions, the L2 norms of the real and fake videos were 0.0000307 and 0.0001332, respectively, confirming that the distributions are clearly separated and that there is a significant difference between real and fake videos. Calculating the degree of separation of the distributions with d-prime yielded 2.32.
-
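The pipeline the abstract describes (detrend, Butterworth band-pass, frequency-domain power features, L2 norm between face- and neck-region vectors) can be sketched as follows. This is a minimal illustration assuming SciPy; the frame rate, filter order, and cutoff band are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend

FS = 30.0  # assumed camera frame rate (Hz), not the paper's value

def ppg_power_features(signal, fs=FS, n_features=80):
    """Detrend, band-pass to the heart-rate band, and return FFT power values."""
    x = detrend(signal)
    # Butterworth band-pass roughly covering 0.7-4 Hz (42-240 bpm);
    # the 4th order is an illustrative assumption
    b, a = butter(4, [0.7 / (fs / 2), 4.0 / (fs / 2)], btype="band")
    x = filtfilt(b, a, x)
    power = np.abs(np.fft.rfft(x)) ** 2
    return power[:n_features]

def region_distance(face_signal, neck_signal):
    """L2 norm between face- and neck-region feature vectors."""
    f = ppg_power_features(face_signal)
    n = ppg_power_features(neck_signal)
    return float(np.linalg.norm(f - n))
```

The intuition: a real face and neck share the same cardiac pulse, so their spectra agree; a Deepfake disturbs the facial signal, increasing the distance.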
A Multi-layered Deep Learning Approach for Human Stress Detection
Jayesh Soni, Nagarajan Prabakar, Himanshu Upadhyay
Abstract: In today's fast-paced world, stress is common on various occasions in everyday life. However, long-term stress disrupts normal life, and detecting such mental stress at an early stage can prevent many associated health problems. There are significant changes in multiple bio-signals, such as electrical, thermal, and optical signals, when an individual is under stress, and these bio-signals can be used to identify stress. In this paper, we propose a multi-layered deep learning approach for detecting human stress using a multimodal dataset. We use an open-source dataset, Wearable Stress and Affect Detection (WESAD), which contains data from wearable physiological and motion sensors. The sensor modalities include three-axis acceleration, body temperature, electrocardiogram, and electrodermal activity under three conditions: baseline, amusement, and stress. In the first layer of our multi-layered approach, we train and compare an AutoEncoder and a Variational AutoEncoder to learn the subject's normal emotional state. In the second layer, we train and compare LSTM and Transformer models to classify the subjects as either amused or stressed. This multi-layered approach achieves a stress detection rate of 98%.
-
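The first-layer idea above, learning the baseline state and flagging deviations by reconstruction error, can be illustrated with a PCA-based linear autoencoder. This is a deliberate simplification of the paper's AutoEncoder; the toy data and threshold rule are assumptions for illustration only.

```python
import numpy as np

class LinearAutoencoder:
    """PCA-based stand-in for the first-layer autoencoder: encodes samples
    into a low-dimensional space and scores them by reconstruction error."""
    def __init__(self, n_components=2):
        self.k = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # top-k right singular vectors act as tied encoder/decoder weights
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.W_ = vt[: self.k]
        return self

    def reconstruction_error(self, X):
        Z = (X - self.mean_) @ self.W_.T   # encode
        Xr = Z @ self.W_ + self.mean_      # decode
        return np.linalg.norm(X - Xr, axis=1)

# Toy "baseline" signals lying near a 2-D subspace of a 6-D feature space;
# a sample far from that subspace stands in for a stressed state.
rng = np.random.default_rng(1)
baseline = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
ae = LinearAutoencoder(2).fit(baseline)
threshold = ae.reconstruction_error(baseline).max()
```

Samples whose reconstruction error exceeds the baseline threshold are passed to the second-layer classifier in the paper's scheme.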
Digital Processing Algorithms of Biomedical Signals Using Cubic Base Splines
Mukhriddin Abduganiev, Rakhimjon Azimov, Lazizbek Muydinov
Abstract: This article considers the digital processing and restoration of electroencephalogram (EEG) signals among biomedical signals: the placement of the 21 sensors of the EEG apparatus over the head, the naming of the sensors, their connection types, the use of bipolar coupling in detecting disease symptoms, interpolation of the received signals, and the separation of problematic segments into scales. The B-spline function was selected as the most convenient mathematical model for digital processing of EEG signals, and its construction is presented. Based on this mathematical model, an algorithm for restoring electroencephalogram signals by dividing the problematic parts into scales was developed, and the absolute error of the EEG signal restoration was estimated.
-
Methods for Creating a Morphological Analyzer
Elov Botir Boltayevich, Hamroyeva Shahlo Mirdjonovna, Axmedova Xolisxon Ilxomovna
Abstract: The morphological analysis process is an important component of natural language processing systems such as spelling correction tools, parsers, machine translation systems, and electronic dictionaries. This article describes the stages of a text analyzer and methods for creating a morphological analyzer and a morphological generator. Ways to use the NLTK package in Python when creating a morphological analyzer are described, and code examples are given. The structure and architecture of the morphological analyzer are also presented on the basis of the morphological analysis process (inflection, derivation, affix detection, compound forms).
-
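A morphological analyzer of the kind described can be sketched as greedy suffix stripping for an agglutinative language. The suffix table below is a toy illustration, not the paper's Uzbek grammar or its NLTK-based implementation.

```python
# Toy suffix inventory: surface form -> morphological tag.
# These four Uzbek-style suffixes are illustrative assumptions only.
SUFFIXES = {
    "lar": "PL",    # plural
    "ning": "GEN",  # genitive case
    "da": "LOC",    # locative case
    "ni": "ACC",    # accusative case
}

def analyze(word, min_stem=2):
    """Greedily peel known suffixes off the end of a word, longest first,
    returning the remaining stem and the tags in stem-to-end order."""
    tags = []
    changed = True
    while changed:
        changed = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                tags.append(SUFFIXES[suffix])
                word = word[: -len(suffix)]
                changed = True
                break
    return word, list(reversed(tags))
```

For example, `analyze("kitoblarni")` strips the accusative and plural suffixes in turn, leaving the stem `kitob`. A real analyzer would also validate stems against a lexicon and handle allomorphy.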
Uzbek Speech Synthesis Using Deep Learning Algorithms
M. I. Abdullaeva, D. B. Juraev, M. M. Ochilov, M. F. Rakhimov
Abstract: This paper presents modern architectures for effective speech synthesis. Since each language has its own subtleties, applying established methods to the Uzbek language was a relevant task, given the lack of research in this direction. The paper presents a method combining the Tacotron acoustic model and the Parallel WaveGAN neural vocoder. The speech corpus that was built, comprising 31 hours of Uzbek speech, is described. The quality of the synthesized speech was evaluated on the MOS scale, according to which the intelligibility and accuracy of the synthesized speech was 4.36 points out of five.
-
Speech Recognition Technologies Based on Artificial Intelligence Algorithms
Muhammadjon Musaev, Ilyos Khujayarov, Mannon Ochilov
Abstract: This article presents research on the development of automatic Uzbek speech recognition technology based on integral models. Methods for continuous speech recognition in Uzbek were studied at every stage, and suitable ones were selected. For acoustic modeling, a 200-hour speech corpus was trained on the DNN-CTC architecture. The developed speech recognition system achieved WER = 17.3% and CER = 7.5% on the test data set.
-
Multimodal Human Computer Interaction Using Hand Gestures and Speech
Mohammed Ridhun, Rayan Smith Lewis, Shane Christopher Misquith, Sushanth Poojary, Kavitha Mahesh Karimbi
Abstract: The paper presents multimodal human-computer interaction using speech and gesture recognition to develop a system for mouse movement and operation. The approach allows users to perform mouse navigation and various mouse operations without physical contact with the system. Splitting the tasks of mouse navigation and mouse operations between gesture and speech recognition, respectively, led to a user-friendly and seamless experience. Since no physical contact is required between the user and the system, it could be used by doctors while performing surgery, by mechanics handling their instruments from a distance, and by casual users as circumstances arise. Unlike a unimodal gesture recognition system, the proposed multimodal system allows mouse pointer control using speech and employs gestures to perform mouse operations.
-
Emotion Recognition in VAD Space During Emotional Events Using CNN-GRU Hybrid Model on EEG Signals
Mohammad Asif, Majithia Tejas Vinodbhai, Sudhakar Mishra, Aditya Gupta, Uma Shanker Tiwary
Abstract: Emotion recognition from brain signals is an emerging area of interest in the scientific community. We used EEG signals to classify emotional events on different combinations of the valence (V), arousal (A), and dominance (D) dimensions and compared the results. The DENS dataset, recorded primarily on an Indian population, is used for this purpose. STFT is used for feature extraction, and the extracted features feed a classification model consisting of hybrid CNN-GRU layers. Two classification models were evaluated, classifying emotional feelings in valence-arousal-dominance space (eight classes) and in valence-arousal space (four classes). The results show an accuracy of 97.50% in VAD space and 96.93% in VA space. We conclude that having precise information about emotional feelings improves classification accuracy compared to long-duration EEG signals, which might be contaminated by mind-wandering. In addition, our results suggest the importance of considering the dominance dimension in emotion classification.
-
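The STFT feature-extraction step can be sketched in plain NumPy: slide a window over the signal, apply a taper, and take the magnitude spectrum of each frame. The window and hop sizes below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def stft_features(x, win=64, hop=32):
    """Short-time Fourier transform magnitudes for a 1-D signal.
    Returns an array of shape (n_frames, win // 2 + 1); the window and
    hop lengths here are illustrative, not the paper's settings."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start : start + win] * window  # taper reduces leakage
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)
```

The resulting time-frequency map is what a CNN front end would consume, with recurrent (GRU) layers then modeling the frame sequence.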
Multiclass Classification of Online Reviews Using NLP & Machine Learning for Non-english Language
Priyanka Sharma, Pritee Parwekar
Abstract: The classification of reviews or comments provided by customers after shopping has wide scope in terms of the categories into which they can be classified. Big companies like Walmart, Tesco, and Amazon have customers from all over the world, a wide product range, and reviews written in any language. Customers sometimes provide reviews not only on the shopping platform itself but also on other platforms such as Facebook and Twitter, so getting an overall picture of a product requires checking the reviews from all these platforms in a single place. This paper classifies comments/reviews written in Spanish, with category names in English, for 30 product categories. The purpose is to categorize products from comments/reviews on different platforms in a non-English language, to gather insights about those products, and to reduce both the dependency on a manual classification process and the barrier of needing command of the language. The approach reduces the chance of manual errors when assigning new reviews/comments to a category. A multiclass classification model is trained using traditional machine learning algorithms and NLP, achieving an accuracy of 90%. The proposed methodology is envisioned to be scalable to other non-English languages as well.
-
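The abstract does not name its algorithms; a common baseline for this kind of multiclass review classification is TF-IDF features with a linear classifier, sketched here with scikit-learn on toy Spanish reviews. The data, category set, and model choice are assumptions for illustration, not the paper's exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy Spanish reviews labelled with English category names, mirroring the
# setup described above (Spanish text, English categories).
reviews = [
    "excelente telefono, buena bateria",
    "el movil llego roto",
    "la camiseta es muy comoda",
    "talla pequena, tela de mala calidad",
]
categories = ["Electronics", "Electronics", "Clothing", "Clothing"]

# TF-IDF turns each review into a sparse term-weight vector; a logistic
# regression then learns one-vs-rest decision boundaries per category.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, categories)
```

Because the features are just token weights, the same pipeline retrains unchanged on any other non-English language, which is the scalability the paper envisions.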
A Higher Performing DARTS Model for CIFAR-10
Jie Yong Shin, Dae-Ki Kang
Abstract: Machine learning experts spend much time on fine-tuning. Methodologies that automatically search for neural architectures have been proposed to solve this problem. Differentiable Architecture Search (DARTS) is an algorithm that solves the Neural Architecture Search problem using a gradient-based approach. Using the DARTS algorithm on the CIFAR-10 dataset, we found an architecture that shows higher test accuracy than the existing DARTS architecture. We ran the DARTS algorithm several times, and the best architecture recorded a test accuracy of 97.62%. This result exceeds the test accuracy of 97.24 ± 0.09 reported in the original DARTS paper. These results are expected to raise the baseline for making a practical difference in the study of Neural Architecture Search.
-
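The core of DARTS is a continuous relaxation: each edge of the searched cell computes a softmax-weighted mixture of candidate operations, so the architecture weights can be optimized by gradient descent; after search, only the strongest operation is kept. A NumPy sketch with toy operations (the real search space uses convolutions, pooling, and skip connections):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy candidate operations for one edge of the cell; placeholders for the
# real search space's convolution, pooling, and skip-connection ops.
ops = [
    lambda x: x,                  # identity / skip connection
    lambda x: np.zeros_like(x),   # the "zero" (no-connection) op
    lambda x: np.maximum(x, 0),   # ReLU as a stand-in for a conv op
]

def mixed_op(x, alpha):
    """DARTS continuous relaxation: softmax(alpha)-weighted sum of all ops."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops))

def discretize(alpha):
    """After search, keep only the op with the largest architecture weight."""
    return int(np.argmax(alpha))
```

Because `mixed_op` is differentiable in `alpha`, architecture and network weights can be trained jointly, which is what makes the search gradient-based.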
Automatic Speech Recognition on the Neural Network Based on Attention Mechanism
N. S. Mamatov, N. A. Niyozmatova, Yu. Sh. Yuldoshev, Sh. Sh. Abdullaev, A. N. Samijonov
Abstract: This article focuses on the problems that arise in speech recognition through machine learning and the deep learning methods used to overcome them, outlining approaches for moving to an encoder-decoder architecture based on the attention mechanism. It also describes the hybrid CTC/Attention architecture, which is now widely used in speech recognition. Neural network architectures, including an attention-based neural network model of the kind widely used in automatic speech recognition in recent years, are proposed, trained on Uzbek and Russian speech corpora, and the results obtained are comparatively analyzed.
-
Equal Temperament and Just Intonation Feature Based Analysis of Indian Music
D. V. K. Vasudevan, Nagamani Molakatala, Nikil Priyatham, Ravikant Gautam, M. Rajender
Abstract: It is well known that most Western music is based on equal temperament, while Indian music is based on just intonation. However, the West has always led in creating electronic music processing tools, such as keyboards, and software like Ableton Live. These instruments and programs, made in the West and intended mainly for Western music (equal temperament), are increasingly used in Indian film music and in a lot of fusion music. A quantitative assessment is required to determine how severe the compromise is when using such software or instruments for Indian music production. We found that, despite numerous papers stating the distinction between the two types of scales (just intonation and equal temperament), there is no single thorough study that carries out experiments and documents the results; we conduct such a study for the first time. In this work, we experiment with numerous Indian instrumental and vocal compositions to compare equal-tempered and just-intoned scales. We found that the compromise is minimal when the music is mostly instrumental but significant when it is vocal. By examining the frequency values, we also identified intriguing patterns in musical notes (of Indian music) that were "closer" to the corresponding equal temperament notes than to the just-intoned notes. For this study, we evaluate plain notes exclusively; we do not examine musical notes containing "gamakam" or "meend". Since only plain notes are discussed, both Carnatic and Hindustani music can benefit. Finally, we make the dataset publicly available to encourage further research in this area.
-
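The gap between the two tuning systems can be quantified directly: each 12-tone equal temperament (12-TET) interval deviates from its just-intonation counterpart by a fixed number of cents. A small sketch using the standard interval ratios (textbook values, not the paper's measured data):

```python
import math

# Just-intonation frequency ratios relative to the tonic, paired with the
# 12-TET semitone count for the same interval.
JUST_INTERVALS = {
    "major third": (5 / 4, 4),
    "perfect fourth": (4 / 3, 5),
    "perfect fifth": (3 / 2, 7),
    "major sixth": (5 / 3, 9),
}

def cents_deviation(just_ratio, semitones):
    """Deviation (in cents) of the 12-TET interval from the just interval;
    positive means equal temperament is sharper."""
    et_ratio = 2 ** (semitones / 12)
    return 1200 * math.log2(et_ratio / just_ratio)

for name, (ratio, steps) in JUST_INTERVALS.items():
    print(f"{name}: {cents_deviation(ratio, steps):+.2f} cents")
```

The equal-tempered fifth is only about 2 cents flat of the just fifth, but the equal-tempered major third is nearly 14 cents sharp, which is why the compromise is most audible in sustained vocal lines.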
Emotion Classification Through Facial Expressions Using SVM and Convolutional Neural Classifier
Varsha Singh, Ravi Kumar Singh, Uma Shanker Tiwary
Abstract: Emotions, being an influential personal state of feeling, have phenomenal importance, and facial expressions are one's instinctive reflections and imprints of emotions. These facial expressions, said to account for 55% of total human communication, cannot remain unnoticed, especially in today's expanding world of human-computer interaction, where the need of the hour is to train computers to recognize human emotions from facial expressions in images. Four models are developed in this work for emotion classification. The work uses a HOG descriptor and SVM for the first model, and CNN models with varying input strategies, with and without down-sampling, for the remaining three, to classify the given FER dataset images into one of the seven universal facial expressions. The first model extracts histogram of oriented gradients (HOG) features from the images and classifies them with a support vector machine (SVM). The second model trains on raw pixel data. The third model uses a novel hybrid feature strategy that combines HOG features and the pixel data of images. The last model uses the same architecture as the previous two CNN models but with a balanced dataset (all classes having the same number of images). Batch normalization, dropout, and L2 regularization reduced the overfitting of the models, and a GPU improved the training speed. The hybrid technique (Model 3) performed better than Models 1, 2, and 4 in terms of accuracy and F1 score. The performance evaluation highlights the drop arising from down-sampling in Model 4.
-
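The first model's HOG-plus-SVM pipeline can be sketched with a simplified orientation-histogram feature. Real HOG additionally divides the image into cells and applies block normalization, and the toy edge images below stand in for FER faces; this is an illustrative sketch, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def orientation_histogram(img, bins=9):
    """Simplified HOG-style feature: a global histogram of gradient
    orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (hist.sum() + 1e-8)

# Toy stand-ins for two "expression" classes: images dominated by
# vertical vs. horizontal edges.
rng = np.random.default_rng(0)
def make(kind):
    img = rng.normal(0, 0.05, (16, 16))
    if kind == 0:
        img[:, 8:] += 1.0  # vertical edge -> horizontal gradients
    else:
        img[8:, :] += 1.0  # horizontal edge -> vertical gradients
    return img

X = np.array([orientation_histogram(make(k % 2)) for k in range(40)])
y = np.array([k % 2 for k in range(40)])
clf = LinearSVC().fit(X, y)  # linear SVM on the orientation features
```

The hybrid strategy (Model 3) would concatenate such gradient features with raw pixel values before feeding the CNN.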
On the Evaluation of Generated Stylised Lyrics Using Deep Generative Models: A Preliminary Study
Hye-Jin Hong, So-Hyeon Kim, Jee-Hang Lee
Abstract: Deep generative models such as the GPT family have exhibited super-human performance in natural language generation. However, evaluation of the generated content lacks automated solutions and mostly requires manual, human-involved experiments. This paper explores a computational means of evaluating generated content in an automated way. In particular, we conducted experiments with stylised lyrics, which require careful consideration in evaluation, since lyrics generation takes into account the individual characteristics of artists. To this end, we first carried out lyrics generation by fine-tuning KoGPT-2 on K-Pop songs in three different genres, to effectively transfer the individual artists' persona and style. Afterwards, we evaluated the stylised lyrics with another deep language model, BERT, measuring the similarity between the generated lyrics and those in the training data, both within and between artists. The results showed the highest similarity between the generated and original lyrics of the same artist and lower similarity between different artists, a phenomenon not captured by a typical evaluation metric such as BLEU. Although this is a preliminary approach, it shows the possibility of automatically evaluating generated content infused with individual characteristics, without human effort.
-
GWD: Graded Word Drop Model for When Type Questions for Hindi QA
Vani, Sumit Singh, Puja Burman, Anmol Jain, Uma Shanker Tiwary
Abstract: This paper proposes a preprocessing methodology, Graded Word Drop (GWD), and its algorithm as a solution to the problem of long contexts in extractive question answering for the Hindi language, focusing mainly on "When"-type questions. The paper discusses in detail the problems associated with long contexts, such as increased prediction times and misleading text within the larger context. It then discusses three methodologies: a Boolean model and two newly proposed Word Drop methodologies. We used the cross-linguality of the transformer models mBERT, XLM-RoBERTa, and MuRIL, fine-tuning them on the SQuAD dataset. For evaluation, we used 84 Hindi "When"-type questions from the chaii (Challenge in AI for India) dataset. The GWD-preprocessed text improved over non-preprocessed results in both accuracy and F1 score, achieving 53.57%, 38.09%, and 55.95% accuracy and 63.21, 68.37, and 67.09 F1 scores with mBERT, XLM-RoBERTa, and MuRIL, respectively, and improved prediction times fivefold in all three models.
-
Masked Face Recognition Model with Explainable AI
Hyeon Ah Sung, Seunghyun Kim, Eui Chul Lee
Abstract: Due to the recent COVID-19 pandemic, people tend to wear masks indoors and outdoors, and face recognition systems such as FaceID have shown a decline in accuracy. Consequently, many studies were conducted to improve the accuracy of recognition between masked faces. Most of them aimed to enhance the dataset and retrain the models to reach reasonable accuracies, but little research has explained the reasons for the improvement. We therefore focused on finding an explainable reason for the improvement in a model's accuracy. First, we observed that accuracy actually increased by 12.86% after training with a masked dataset. We then applied Explainable AI (XAI) to see whether the model really focuses on the regions of interest. The generated heatmaps show that differences in the training data lead to differences in the regions the model focuses on.
-
UX Design Workshop for Building Relationships Between Humans and Intelligent Objects Using ‘T + e = B’ Toolkit
Eui-chul Jung, Younhee Cho, Hyewon Kim
Abstract: 'T + e = B' (the Trigger plus emotion equals Behavior toolkit) was developed for designers to create and evaluate design concepts using nine triggers as inspirational tools. With T + e = B, we claim that emotion is the key to finding the missing link between a single decision-making process and lasting behavior in trigger design. To demonstrate this, we will present participants with the Frame of AB, OT, and DB, Trigger Cards, and Hint Cards to help them derive a behavioral strategy for their target audience. This workshop aims to think about the symbiotic relationship between intelligent objects and humans from a UX design perspective and to consider the direction of HCI design. Participants from various cultural and social backgrounds will contemplate and discuss the direction of artificial intelligence (AI) chatbots or robots that can communicate emotionally with humans based on human emotions. At the workshop, participants will (1) use metaphors to select desired relationships between humans and intelligent objects to define existing behavioral patterns and obstacles, and (2) ideate triggers using Hint and Trigger Cards, in particular exploring various ways of building trust and intimate relationships between intelligent objects and humans, or creating positive user experiences.
- Title
- Intelligent Human Computer Interaction
- Edited by
-
Hakimjon Zaynidinov
Madhusudan Singh
Uma Shanker Tiwary
Dhananjay Singh
- Copyright year
- 2023
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-031-27199-1
- Print ISBN
- 978-3-031-27198-4
- DOI
- https://doi.org/10.1007/978-3-031-27199-1