2021 | OriginalPaper | Chapter

Smart Services Using Voice and Images

Authors: Alexander I. Iliev, Peter L. Stanchev

Published in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII

Publisher: Springer Berlin Heidelberg

Abstract

In this chapter, we highlight some of the most prominent advances in the smart technologies that make up the smart city ecosystem. We also highlight the automation of numerous developments based on the extraction and analysis of digital media, using speech and images. At present, a multitude of practical systems are used for personalization and recommendation of different media, and assorted types of services in different areas benefit directly from these advancements. Most of them were created with human-machine interaction methodology in mind, where people had to interact with the machines in various ways. In the past, this type of interaction was carried out through conventional interfaces such as a mouse and a keyboard: the user had to type a response manually, which the machine then recorded for subsequent analysis. Therefore, to simplify these interactions and improve services such as recommendation and personalization, new methodologies must be studied, discovered, and developed.

Metadata

Copyright Year: 2021

DOI: https://doi.org/10.1007/978-3-662-62919-2_6