Published in: Journal of Intelligent Information Systems 3/2014

01-06-2014

Singer identification based on computational auditory scene analysis and missing feature methods

Authors: Ying Hu, Guizhong Liu


Abstract

A major challenge in identifying singers from monaural popular music recordings is to remove or alleviate the influence of the accompaniment. Our system works in two stages. In the first stage, we use computational auditory scene analysis (CASA) to segregate the singing voice units from the mixture signal. The pitch of the singing voice is first estimated and used to extract pitch-based features for each unit of an acoustic vector. These features are then used to estimate a binary time-frequency (T-F) mask, in which 1 indicates that the corresponding T-F unit is dominated by the singing voice and 0 indicates otherwise. Units dominated by the singing voice are considered reliable; all other units are treated as unreliable or missing, so the acoustic vector is incomplete. In the second stage, two missing feature methods, reconstruction of the acoustic vector and marginalization, are used to identify the singer from the incomplete acoustic vectors. In the reconstruction method, the complete acoustic vector is first reconstructed and then converted to Gammatone frequency cepstral coefficients (GFCCs), which are used to identify the singer. In the marginalization method, the probability that the voice belongs to a given singer is computed from the reliable components only. We find that the reconstruction method outperforms the marginalization method, and that both methods perform well compared with another system, especially at signal-to-accompaniment ratios (SARs) of 0 dB and −3 dB.
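
The marginalization idea can be illustrated with a minimal sketch. It assumes each singer is modelled by a diagonal-covariance Gaussian mixture model, which the abstract does not state. With diagonal covariances, marginalizing over the unreliable components amounts to dropping them from the likelihood. The function names, the toy models, and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def marginal_log_likelihood(x, mask, weights, means, variances):
    """Log-likelihood of one acoustic vector under a diagonal-covariance GMM,
    evaluated only on components flagged reliable (mask == 1)."""
    reliable = np.asarray(mask).astype(bool)
    if not reliable.any():
        return 0.0  # every component marginalized out: probability 1, log 0
    diff = x[reliable] - means[:, reliable]      # shape (K, R)
    var = variances[:, reliable]                 # shape (K, R)
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)
    a = np.log(weights) + log_comp               # per-mixture log terms
    peak = a.max()
    return peak + np.log(np.sum(np.exp(a - peak)))  # stable log-sum-exp

def identify_singer(frames, masks, models):
    """Return the model name with the highest summed marginal log-likelihood."""
    scores = {
        name: sum(marginal_log_likelihood(x, m, *params)
                  for x, m in zip(frames, masks))
        for name, params in models.items()
    }
    return max(scores, key=scores.get)

# Toy models (assumed): one Gaussian per singer over a 4-dimensional vector.
models = {
    "singer_A": (np.array([1.0]), np.zeros((1, 4)), np.ones((1, 4))),
    "singer_B": (np.array([1.0]), np.full((1, 4), 3.0), np.ones((1, 4))),
}
# The last two components stand in for accompaniment-dominated T-F units.
frames = [np.array([0.1, -0.2, 9.0, 9.0])]
masks = [np.array([1, 1, 0, 0])]

print(identify_singer(frames, masks, models))
```

In this toy case the two reliable components lie close to the first model's mean, so masking out the corrupted components changes which model scores highest compared with using the full vector.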


Metadata
Title
Singer identification based on computational auditory scene analysis and missing feature methods
Authors
Ying Hu
Guizhong Liu
Publication date
01-06-2014
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2014
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-013-0271-6
