
2025 | Original Paper | Book Chapter

Classification of Khasi Dialects Using Spectrogram Augmentation and Pre-trained Models

Authors: Khiakupar Jyndiang, Joyprakash Singh Lairenlakpam

Published in: Advances in Communication, Devices and Networking

Publisher: Springer Nature Singapore


This paper discusses the classification of four Khasi dialects (Sohra, Nongkrem, Mairang, and Maram) using pre-trained models. Mel-spectrogram images were extracted from speech audio of the four dialects and augmented with time masking. The spectrogram images were then classified with pre-trained AlexNet and ResNet18 models. In our experiment, AlexNet reached validation accuracies of 93.58%, 93.25%, and 93.20% at 8, 15, and 25 epochs, respectively. With the same epoch counts, ResNet18 achieved accuracy rates of 88.23%, 93.55%, and 93.57%.
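
As an illustration of the time masking augmentation mentioned in the abstract, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not state the mask width, the number of masks, or the fill value, so the function name `time_mask` and the parameters `max_width`, `num_masks`, and `fill_value` are assumptions chosen for illustration. The idea is to overwrite a randomly placed span of consecutive time frames in a mel-spectrogram so that a classifier cannot rely on any single short segment.

```python
import numpy as np


def time_mask(spec, max_width=20, num_masks=1, fill_value=None, rng=None):
    """Return a copy of `spec` (n_mels x n_frames) with random time spans masked.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_frames = out.shape[1]
    # Default fill: the mean of the unmasked spectrogram (an assumption).
    fill = out.mean() if fill_value is None else fill_value
    for _ in range(num_masks):
        # Draw a mask width in [0, max_width], capped at the number of frames.
        width = int(rng.integers(0, min(max_width, n_frames) + 1))
        # Draw a start frame so the mask fits inside the spectrogram.
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start:start + width] = fill
    return out


# Usage with a random stand-in for a real mel-spectrogram (64 mel bands, 200 frames).
rng = np.random.default_rng(0)
mel = rng.random((64, 200))
augmented = time_mask(mel, max_width=30, rng=rng)
```

The input array is left unchanged, so the original and augmented spectrograms can both be saved as training images.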
