Skip to main content

2020 | OriginalPaper | Buchkapitel

Big Data Thinning: Knowledge Discovery from Relevant Data

verfasst von : Naji Shehab, Christos Anagnostopoulos

Erschienen in: Convergence of Artificial Intelligence and the Internet of Things

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Using statistical learning theory and machine learning techniques surrounding the principles of Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach aiming to aid Big Data Thinning, i.e., analysing only the potential data sub-spaces and not the entire extensive data space. Data scientists, data analysts, IoT applications and Edge-centric services are in need for predictive modelling and analytics. This is achieved by learning from past issued analytics queries and exploiting the analytics query access patterns over the large distributed data-sets revealing the most interested and important sub-spaces for further exploratory analysis. By analysing user queries and respectively mapping them into relatively small-scale predictive local regression models, we can yield higher predictive accuracy. This is done by thinning the data space and freeing it of irrelevant and non-popular data sub-spaces; thus, making use of less training data instances. Experimental results and statistical analysis support the research idea proposed in this work.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Anagnostopoulos, C., Kolomvatsos, K.: Predictive intelligence to the edge through approximate collaborative context reasoning. Appl. Intell. 48(4), 966–991 (2018) Anagnostopoulos, C., Kolomvatsos, K.: Predictive intelligence to the edge through approximate collaborative context reasoning. Appl. Intell. 48(4), 966–991 (2018)
3.
Zurück zum Zitat Anagnostopoulos, C., Triantafillou, P.: Efficient scalable accurate regression queries in In-DBMS analytics. In: IEEE International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 (2017) Anagnostopoulos, C., Triantafillou, P.: Efficient scalable accurate regression queries in In-DBMS analytics. In: IEEE International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 (2017)
4.
Zurück zum Zitat Anagnostopoulos, C., Triantafillou, P.: Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics (2018) Anagnostopoulos, C., Triantafillou, P.: Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics (2018)
5.
Zurück zum Zitat Anagnostopoulos, C., Triantafillou, P.: Query-driven learning for predictive analytics of data subspace cardinality. ACM Trans Knowl Discov. Data 11(4), 47 (2017)CrossRef Anagnostopoulos, C., Triantafillou, P.: Query-driven learning for predictive analytics of data subspace cardinality. ACM Trans Knowl Discov. Data 11(4), 47 (2017)CrossRef
6.
Zurück zum Zitat Anagnostopoulos, C., Savva, F., Triantafillou, P.: Scalable aggregation predictive analytics: a query-driven machine learning approach. Appl. Intell. 48(9), 2546–2567 (2018)CrossRef Anagnostopoulos, C., Savva, F., Triantafillou, P.: Scalable aggregation predictive analytics: a query-driven machine learning approach. Appl. Intell. 48(9), 2546–2567 (2018)CrossRef
7.
Zurück zum Zitat Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07. Society for Industrial and Applied Mathematics, pp. 1027–1035. Philadelphia, PA, USA (2007). ISBN 978-0-898716-24-5. http://dl.acm.org/citation.cfm?id=1283383.1283494 Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07. Society for Industrial and Applied Mathematics, pp. 1027–1035. Philadelphia, PA, USA (2007). ISBN 978-0-898716-24-5. http://​dl.​acm.​org/​citation.​cfm?​id=​1283383.​1283494
13.
Zurück zum Zitat Constandinos X.M. et al.: Socially-oriented edge computing for energy-awareness in IoT architectures. IEEE Commun. (2019) Constandinos X.M. et al.: Socially-oriented edge computing for energy-awareness in IoT architectures. IEEE Commun. (2019)
16.
Zurück zum Zitat Georgios, S. et al.: Elasticity debt analytics exploitation for green mobile cloud computing: an equilibrium model. IEEE Trans. Green Commun. Netw. (2019) Georgios, S. et al.: Elasticity debt analytics exploitation for green mobile cloud computing: an equilibrium model. IEEE Trans. Green Commun. Netw. (2019)
17.
Zurück zum Zitat Grossberg, S.: Adaptive pattern classification and universal recoding: 1. Parallel development and coding of neural feature detectors. Biol. Cybern. 23, 121–134 (1976) Grossberg, S.: Adaptive pattern classification and universal recoding: 1. Parallel development and coding of neural feature detectors. Biol. Cybern. 23, 121–134 (1976)
19.
Zurück zum Zitat Jun, L. et al.: D2D communication mode selection and resource optimization algorithm with optimal throughput in 5G network. IEEE Access, pp. 25263–25273 (2019) Jun, L. et al.: D2D communication mode selection and resource optimization algorithm with optimal throughput in 5G network. IEEE Access, pp. 25263–25273 (2019)
20.
Zurück zum Zitat Kolomvatsos, K., Anagnostopoulos, C.: Reinforcement machine learning for predictive analytics in smart cities. Informatics 4(3), 16 (2017)CrossRef Kolomvatsos, K., Anagnostopoulos, C.: Reinforcement machine learning for predictive analytics in smart cities. Informatics 4(3), 16 (2017)CrossRef
21.
Zurück zum Zitat Lloyd, S.P.: Least squares quantization in PCM. Information Theory, IEEE Trans. 28(2), 129–137 (1982) Lloyd, S.P.: Least squares quantization in PCM. Information Theory, IEEE Trans. 28(2), 129–137 (1982)
23.
Zurück zum Zitat Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice-Hall Inc, Upper Saddle River, NJ, USA (1989). ISBN 0-13-485558-2 Narendra, K.S., Thathachar, M.A.L.: Learning Automata: An Introduction. Prentice-Hall Inc, Upper Saddle River, NJ, USA (1989). ISBN 0-13-485558-2
25.
Zurück zum Zitat Rumelhart, D., McClelland, J.: University of California. Parallel Distributed Processing: Foundations. A Bradford book. MIT Press (1986). ISBN 9780262680530 Rumelhart, D., McClelland, J.: University of California. Parallel Distributed Processing: Foundations. A Bradford book. MIT Press (1986). ISBN 9780262680530
26.
Zurück zum Zitat Stelios, P., Evangelos, S., George, M., Constandinos, X.M.: A hyper-box approach using relational databases for large scale machine learning. International conference on telecommunications and multimedia TEMU 2014. IEEE Communications Society proceedings, pp. 69–73, 28–30 July, Crete, Greece Stelios, P., Evangelos, S., George, M., Constandinos, X.M.: A hyper-box approach using relational databases for large scale machine learning. International conference on telecommunications and multimedia TEMU 2014. IEEE Communications Society proceedings, pp. 69–73, 28–30 July, Crete, Greece
28.
Zurück zum Zitat Yannis, N. et al.: Vulnerability assessment as a Service for Fog-Centric Healthcare ICT ecosystems. J. Peer-to-Peer Netw. Appl. Springer (2019) Yannis, N. et al.: Vulnerability assessment as a Service for Fog-Centric Healthcare ICT ecosystems. J. Peer-to-Peer Netw. Appl. Springer (2019)
Metadaten
Titel
Big Data Thinning: Knowledge Discovery from Relevant Data
verfasst von
Naji Shehab
Christos Anagnostopoulos
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-44907-0_11