International Journal of Speech Technology OnlineFirst articles

23.04.2024

Improving low-complexity and real-time DeepFilterNet2 for personalized speech enhancement

DeepFilterNet2, a recently proposed real-time and low-complexity speech enhancement (SE) technique, has shown state-of-the-art SE performance in many deep noise suppression tasks. This paper proposes a new approach, termed pDeepFilterNet2, to …

Authors:
Shilin Wang, Haixin Guan, Shuang Wei, Yanhua Long

15.04.2024

Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method

The field of speech recognition makes it simpler for humans and machines to engage with speech. Number-oriented communication, such as using a registration code, mobile number, score, or account number, can benefit from speech recognition for …

Authors:
Hossam Boulal, Mohamed Hamidi, Mustapha Abarkan, Jamal Barkani

10.04.2024

Survey on Arabic speech emotion recognition

Emotions represent a fundamental aspect when evaluating user satisfaction or collecting customer feedback in human interactions, as well as in the realm of human–computer interface (HCI) technologies. Moreover, as human beings, we possess a …

Authors:
Latifa Iben Nasr, Abir Masmoudi, Lamia Hadrich Belguith

07.04.2024

Automatic Speech Emotion Recognition: a Systematic Literature Review

Automatic Speech Emotion Recognition (ASER) has recently garnered attention across various fields including artificial intelligence, pattern recognition, and human–computer interaction. However, ASER encounters numerous challenges such as a …

Authors:
Haidy H. Mustafa, Nagy R. Darwish, Hesham A. Hefny

03.04.2024

Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation

The implementation of a defect speech recognition (DSR) system has the potential to significantly improve the lifestyle of people with speech disorders. In this paper, we developed a novel ConvGRUSpeechNet model for recognizing and understanding …

Authors:
Antor Mahamudul Hashan, Chaganov Roman Dmitrievich, Melnikov Alexander Valerievich, Dorokh Danila Vasilyevich, Khlebnikov Nikolai Alexandrovich, Boris Andreevich Bredikhin

30.03.2024

A computationally efficient speech emotion recognition system employing machine learning classifiers and ensemble learning

Speech Emotion Recognition (SER) is the process of recognizing and classifying emotions expressed through speech. SER greatly facilitates personalized and empathetic interactions, enhances user experiences, enables sentiment analysis, and finds …

Authors:
N. Aishwarya, Kanwaljeet Kaur, Karthik Seemakurthy

29.03.2024

Speech recognition based on the transformer's multi-head attention in Arabic

The Transformer model is frequently employed for speech command recognition (SCR) since it supports parallelization and has internal attention. The high learning speed of this design and the absence of sequential operation, like with recurrent …

Authors:
Omayma Mahmoudi, Mouncef Filali-Bouami, Mohamed Benchat

29.03.2024

Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection

With the increasing adoption of voice-based authentication systems, the threat of audio spoofing attacks has become a significant concern. These attacks aim to deceive voice authentication systems by manipulating or impersonating audio signals. To …

Authors:
Nidhi Chakravarty, Mohit Dua

26.03.2024 | Retraction Note

Retraction Note: Computer vision for facial analysis using human–computer interaction models

Authors:
Zitian Liao, R. Dinesh Jackson Samuel, Sujatha Krishnamoorthy

26.03.2024

Conditional Denoising Diffusion Implicit Model for Speech Enhancement

Recently, denoising diffusion probabilistic models (DDPMs) have been effective in speech enhancement. However, existing models largely follow the original diffusion training method, ignoring the trade-off between optimization goals in different …

Authors:
Chengyong Yang, Xiukang Yu, Sheng Huang

18.03.2024

Continuous feature learning representation to XGBoost classifier on the aggregation of discriminative Features using DenseNet-121 architecture and ResNet 18 architectures towards Apraxia Recognition in the Child Speech Therapy

Due to the peculiar cell growth, apraxia is one of the common types of speech stuttering seen in youngsters. Machine learning-based architecture has been used to automatically categorize the disfluency based on the feature and its properties …

Authors:
P. Ashwini, S.H. Bharathi

14.03.2024

Speech recognition model design for Sundanese language using WAV2VEC 2.0

Indonesia has a variety of languages, one of which is Sundanese. Sundanese is a regional language from Indonesia that has the potential to become extinct. One way to prevent Sundanese from potential extinction is with speech recognition. Speech …

Authors:
Albert Cryssiover, Amalia Zahra

14.03.2024

Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs)

Machine learning provides researchers in speech processing and bioacoustics numerous advanced and non-invasive techniques to investigate animal vocalizations. Hidden Markov Models (HMMs) are machine learning techniques that were developed and …

Authors:
Marek B. Trawicki

13.03.2024

Modern Standard Arabic speech disorders corpus for digital speech processing applications

Digital speech processing applications including automatic speech recognition (ASR), speaker recognition, speech translation, and others, essentially require large volumes of speech data for training and testing purposes. Although there are …

Authors:
Assal A. M. Alqudah, Mohammad A. M. Alshraideh, Mohammad A. M. Abushariah, Ahmad A. S. Sharieh

12.03.2024

A review on Gujarati language based automatic speech recognition (ASR) systems

Automatic speech recognition (ASR) plays a crucial role in facilitating natural and efficient human–computer interaction. This paper offers a comprehensive review of ASR systems tailored specifically for the Gujarati language. Existing literature …

Authors:
Mohit Dua, Bhavesh Bhagat, Shelza Dua, Nidhi Chakravarty

04.03.2024

Multi-coder vector quantizer for transparent coding of wideband speech ISF parameters

Modern low bit-rate speech coders require efficient coding of the linear predictive coding (LPC) coefficients. Immittance Spectral Frequencies (ISF) and Line Spectral Frequencies (LSF) are currently the most efficient transmission parameters for …

Authors:
Merouane Bouzid, Nacèra Meziane, Salah-Eddine Cheraitia

29.02.2024

Stockwell-Transform based feature representation for detection and assessment of voice disorders

In the literature, various time-frequency representation methods have been investigated for automatic detection of voice disorders. Stockwell-Transform (S-Transform) provides good time-frequency localization; hence, it may efficiently capture the voice …

Authors:
Purva Barche, Krishna Gurugubelli, Anil Kumar Vuppala

13.02.2024

An amalgamation of integrated features with DeepSpeech2 architecture and improved spell corrector for improving Gujarati language ASR system

Automatic Speech Recognition systems that convert language into written text have greatly transformed human–machine interaction. Although these systems have achieved results, in languages building accurate and reliable ASR models for low resource …

Authors:
Mohit Dua, Bhavesh Bhagat, Shelza Dua

29.01.2024 | Correction

Correction to: Automated detection system for texture feature based classification on different image datasets using S-transform

Authors:
O. Homa Kesav, G. K. Rajini

24.01.2024

A review on speech emotion recognition for late deafened educators in online education

In an online class, students respond to queries initiated by the educator through voice, video, or chat. The educator receives these responses but cannot ascertain the emotion carried in these responses with confidence. This emotional limitation …

Authors:
Aparna Vyakaranam, Tomas Maul, Bavani Ramayah