2020 | OriginalPaper | Chapter

3. Smart Data

Authors: Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera

Published in: Big Data Preprocessing

Publisher: Springer International Publishing

Abstract

The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. Big Data is characterized by volume, velocity, variety, veracity, and value. The idea of Smart Data is to separate the physical properties of the data (volume, velocity, and variety) from its value and veracity. This transformation is the key to moving from Big to Smart Data. Without value and veracity, Big Data becomes an accumulation of raw data from which knowledge cannot be extracted. Smart Data discovery is therefore tasked with extracting useful information from data, in the form of a subset (big or not) that possesses enough quality for a successful data mining process. The impact of Smart Data discovery in industry and academia is two-fold: higher-quality data mining and reduced data storage costs. In this chapter we give an insight into the state of Smart Data. Next, we discuss how to move from Big to Smart Data. We finish with an introduction to Smart Data and its relation to the Internet of Things.

Metadata
Title
Smart Data
Authors
Julián Luengo
Diego García-Gil
Sergio Ramírez-Gallego
Salvador García
Francisco Herrera
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-39105-8_3
