
2020 | OriginalPaper | Chapter

Big Data Thinning: Knowledge Discovery from Relevant Data

Authors: Naji Shehab, Christos Anagnostopoulos

Published in: Convergence of Artificial Intelligence and the Internet of Things

Publisher: Springer International Publishing


Abstract

Drawing on statistical learning theory and machine learning techniques built around Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach to Big Data Thinning, i.e., analysing only the potentially relevant data sub-spaces rather than the entire data space. Data scientists, data analysts, IoT applications and Edge-centric services are in need of predictive modelling and analytics. The approach serves this need by learning from previously issued analytics queries and exploiting the query access patterns over large distributed data-sets, which reveal the most interesting and important sub-spaces for further exploratory analysis. By analysing user queries and mapping them to relatively small-scale local regression models, we can yield higher predictive accuracy. This is done by thinning the data space, freeing it of irrelevant and unpopular data sub-spaces, and thus using fewer training data instances. Experimental results and statistical analysis support the research idea proposed in this work.
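
The abstract names RPCL and query-driven thinning but does not give the chapter's algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It applies the textbook RPCL update (the winning prototype moves toward each input, the runner-up "rival" is pushed slightly away) to the centres of past queries, then keeps only the data points near prototypes that win often. The names `rpcl`, `thin`, `lr_win`, `lr_rival`, `radius` and `min_share` are hypothetical and not taken from the chapter, and the popularity rule in `thin` is an illustrative assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(queries, k, epochs=20, lr_win=0.05, lr_rival=0.002, seed=0):
    """Generic RPCL over query centres; returns (prototypes, win_counts).

    Assumption: each past query is summarised by a numeric centre vector.
    """
    if not 2 <= k <= len(queries):
        raise ValueError("need 2 <= k <= len(queries)")
    rng = random.Random(seed)
    protos = [list(q) for q in rng.sample(queries, k)]
    wins = [1] * k  # conscience counters: frequent winners are handicapped
    for _ in range(epochs):
        order = list(queries)
        rng.shuffle(order)
        for q in order:
            total = sum(wins)
            # Rank prototypes by frequency-weighted squared distance to q.
            ranked = sorted(
                range(k), key=lambda j: wins[j] / total * sq_dist(q, protos[j])
            )
            winner, rival = ranked[0], ranked[1]
            # Winner moves toward the query; the rival is pushed away.
            protos[winner] = [w + lr_win * (x - w) for w, x in zip(protos[winner], q)]
            protos[rival] = [w - lr_rival * (x - w) for w, x in zip(protos[rival], q)]
            wins[winner] += 1
    return protos, wins


def thin(data, protos, wins, radius, min_share=0.05):
    """Keep data points within `radius` of a prototype that wins often enough.

    Assumption: a prototype counts as popular if its share of wins is at
    least `min_share`; this rule is illustrative, not the chapter's.
    """
    total = sum(wins)
    popular = [p for p, n in zip(protos, wins) if n / total >= min_share]
    return [x for x in data if any(sq_dist(x, p) <= radius ** 2 for p in popular)]
```

In this sketch, a local regression model would then be fitted only on the points returned by `thin` for each popular prototype, which is where the reduction in training instances would come from.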


Metadata

Title: Big Data Thinning: Knowledge Discovery from Relevant Data
Authors: Naji Shehab, Christos Anagnostopoulos
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-44907-0_11
