2022 | Original Paper | Book chapter

Spectrogram Analysis and Text Conversion of Sound Signal for Query Generation to Give Input to Audio Input Device

Written by: Kavita Sharma, S. R. N. Reddy

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

Natural Language Processing is reshaping the world, and modern electronic devices increasingly accept audio input. Users from different backgrounds supply input in their native language; the system accepts the speech, processes it, and responds accordingly. Cooking is a major difficulty for many people, including the elderly, people confined to bed, and people with disabilities, such as those who cannot use their hands and need assistance at all times. To support these users, this paper proposes an audio input device for giving cooking instructions to a cooking system. The device takes the user's spoken English as input, converts it to text using deep learning algorithms, and generates instructions from context-aware words extracted from the recorded audio, which are then sent to the cooking device. Analyzing the audio signal for user authentication is challenging because of gaps and pauses between spoken characters and because of noise in the environment. The audio input device developed for kitchen systems must therefore analyze the input signal to create a more secure environment for authenticated users. Accordingly, the objective of this paper is to analyze the audio input signal captured in real time and to convert the accepted signal into text in order to generate instructions for a larger system. The sound signals captured in the real environment are analyzed with Mel spectrograms, MFCC spectrograms, and the PRAAT software. The sound signal, once converted to text, is processed with a natural language toolkit to generate instructions.
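
The last step described in the abstract, turning transcribed speech into an instruction for the cooking device, can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in for a natural language toolkit and is not the authors' implementation. The word lists (`STOP_WORDS`, `ACTIONS`, `UNITS`), the function names, and the output fields are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical vocabularies: the abstract does not list the actual command words.
STOP_WORDS = {"please", "the", "a", "an", "for", "to", "on", "in", "of", "and", "my", "me", "some"}
ACTIONS = {"boil", "fry", "heat", "stir", "bake", "stop"}
UNITS = {"second": 1, "seconds": 1, "minute": 60, "minutes": 60, "hour": 3600, "hours": 3600}


def tokenize(text):
    """Lowercase the transcript and split it into word and number tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def generate_instruction(transcript):
    """Map a transcribed utterance to a small instruction record.

    Returns None when no known action word is present.
    """
    # Drop filler words so that only context-bearing words remain.
    tokens = [t for t in tokenize(transcript) if t not in STOP_WORDS]
    action = next((t for t in tokens if t in ACTIONS), None)
    if action is None:
        return None

    # A number immediately followed by a time unit is read as a duration.
    duration = None
    for number, unit in zip(tokens, tokens[1:]):
        if number.isdigit() and unit in UNITS:
            duration = int(number) * UNITS[unit]
            break

    # Everything that is not the action, a number, or a unit is treated as the target.
    target = [t for t in tokens if t != action and not t.isdigit() and t not in UNITS]
    return {"action": action, "target": " ".join(target), "duration_s": duration}


if __name__ == "__main__":
    print(generate_instruction("Please boil the rice for 10 minutes"))
```

With these assumed word lists, the utterance "Please boil the rice for 10 minutes" yields `{'action': 'boil', 'target': 'rice', 'duration_s': 600}`, and an utterance without a known action word yields `None`. A real pipeline would presumably replace the hand-written stop-word filter and tokenizer with toolkit components and add the speaker-authentication check that the abstract mentions.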

Metadata
Title
Spectrogram Analysis and Text Conversion of Sound Signal for Query Generation to Give Input to Audio Input Device
Written by
Kavita Sharma
S. R. N. Reddy
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_15
