
Journal of Intelligent Information Systems

Published: 11-02-2019

Effect of speech segment samples selection in stutter block detection and remediation

Authors: Pierre Arbajian, Ayman Hajja, Zbigniew W. Raś, Alicja A. Wieczorkowska

Published in: Journal of Intelligent Information Systems | Issue 2/2019


Abstract

Speech remediation can be performed by identifying segments that detract from the substance of the speech and that can be deleted without diminishing speech quality, and in fact while improving it. Remediation is particularly important when speech is disfluent, as in the case of stuttered speech. We describe two approaches to stuttered speech remediation, both based on identifying segments whose removal would enhance understandability in terms of both speech content and speech flow. In the first approach, we identify and extract speech segments that have weak semantic significance due to their low relative intensity; we then train several classifiers on a large set of inherent and derived features, which provides a second-layer filtering stage. This approach was effective but required a two-step process. To streamline detection and remediation, we introduce an enhancement that broadens disfluency detection to a wider range of speech anomalies by eliminating the need for a domain-dependent pre-qualification stage. The new approach offers improved accuracy together with greater simplicity, flexibility and extensibility.
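The abstract does not specify how relative intensity is measured or thresholded, so the following is only a minimal sketch of the general idea behind the first stage: flagging low-intensity stretches as removal candidates. Every function name, the 25 ms / 10 ms framing, the 25 dB drop and the 0.1 s minimum duration are hypothetical choices made for illustration, not values taken from the paper, and the classifier-based second filtering stage is not shown.

```python
import numpy as np

def frame_intensity_db(signal, frame_len, hop):
    """RMS intensity in dB for each full frame of the signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        return np.array([])
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-12)  # small offset avoids log(0)

def low_intensity_segments(signal, sr, frame_s=0.025, hop_s=0.010,
                           drop_db=25.0, min_s=0.10):
    """Return (start_s, end_s) spans whose intensity is at least drop_db
    below the loudest frame (assumed thresholds, for illustration only)."""
    frame_len, hop = round(frame_s * sr), round(hop_s * sr)
    db = frame_intensity_db(signal, frame_len, hop)
    if db.size == 0:
        return []
    low = db < (db.max() - drop_db)
    segments, start = [], None
    # A trailing False closes any run that reaches the last frame.
    for i, flag in enumerate(np.append(low, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            t0 = start * hop / sr
            t1 = ((i - 1) * hop + frame_len) / sr
            if t1 - t0 >= min_s:
                segments.append((t0, t1))
            start = None
    return segments

def remove_segments(signal, sr, segments):
    """Delete the given time spans from the signal."""
    keep = np.ones(len(signal), dtype=bool)
    for t0, t1 in segments:
        keep[round(t0 * sr):round(t1 * sr)] = False
    return signal[keep]

# Synthetic example: a tone, a near-silent gap, then the tone again.
sr = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(8000) / sr)        # 0.5 s
gap = 0.001 * np.sin(2 * np.pi * 220 * np.arange(4800) / sr)  # 0.3 s
signal = np.concatenate([tone, gap, tone])

segments = low_intensity_segments(signal, sr)
cleaned = remove_segments(signal, sr, segments)
```

In this synthetic case the only candidate span should cover the quiet gap. In a two-stage pipeline like the one the abstract describes, such candidates would then be passed to trained classifiers before any deletion is applied.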



Title: Effect of speech segment samples selection in stutter block detection and remediation
Authors: Pierre Arbajian, Ayman Hajja, Zbigniew W. Raś, Alicja A. Wieczorkowska
Publication date: 11-02-2019
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-019-00546-z
