
2020 | OriginalPaper | Chapter

Big Data Thinning: Knowledge Discovery from Relevant Data

Authors: Naji Shehab, Christos Anagnostopoulos

Published in: Convergence of Artificial Intelligence and the Internet of Things

Publisher: Springer International Publishing


Abstract

Using statistical learning theory and machine learning techniques built on the principles of Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach to Big Data Thinning, i.e., analysing only the potentially relevant data sub-spaces rather than the entire data space. Data scientists, data analysts, IoT applications and Edge-centric services all need predictive modelling and analytics. The approach learns from previously issued analytics queries and exploits their access patterns over large distributed data-sets to reveal the most interesting and important sub-spaces for further exploratory analysis. By analysing user queries and mapping them to relatively small-scale local regression models, we obtain higher predictive accuracy. This is achieved by thinning the data space, i.e., freeing it of irrelevant and unpopular sub-spaces, so that fewer training instances are needed. Experimental results and statistical analysis support the proposed approach.
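The abstract only outlines the method, so the following is a minimal sketch and not the authors' implementation. It assumes that each past analytics query is summarised by the centre of its range selection in the data space, that RPCL in its original form (Xu, Krzyzak and Oja, 1993: frequency-sensitive winner selection, winner attracted, rival repelled) finds prototypes of the popular sub-spaces, and that thinning keeps only the data points within a hypothetical `radius` of some prototype before fitting one local linear least-squares model per prototype. The function names, learning rates, radius and synthetic data are illustrative assumptions.

```python
import numpy as np

def rpcl(X, k, alpha_c=0.05, alpha_r=0.002, epochs=20, seed=0):
    """Rival Penalised Competitive Learning over query vectors X (n, d).

    k may be over-estimated (k >= 2); surplus prototypes tend to be pushed away.
    Returns prototypes W (k, d) and per-prototype win counts.
    """
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    wins = np.ones(k)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            gamma = wins / wins.sum()                   # frequency-sensitive weighting
            d = gamma * np.sum((W - x) ** 2, axis=1)    # weighted squared distances
            c, r = np.argsort(d)[:2]                    # winner c and rival r
            W[c] += alpha_c * (x - W[c])                # attract the winner
            W[r] -= alpha_r * (x - W[r])                # repel the rival (de-learning)
            wins[c] += 1
    return W, wins

def thin_and_fit(W, data_X, data_y, radius):
    """Hypothetical thinning step: keep points within `radius` of their nearest
    prototype, then fit one local linear model (with intercept) per prototype."""
    dist = np.linalg.norm(data_X[:, None, :] - W[None, :, :], axis=2)  # (n, k)
    nearest = dist.argmin(axis=1)
    keep = dist.min(axis=1) <= radius
    models = {}
    for j in range(len(W)):
        mask = keep & (nearest == j)
        if mask.sum() < data_X.shape[1] + 1:
            continue  # too few points for a well-posed local fit
        A = np.hstack([data_X[mask], np.ones((mask.sum(), 1))])
        coef, *_ = np.linalg.lstsq(A, data_y[mask], rcond=None)
        models[j] = coef
    return models, keep

def predict(models, W, x):
    """Answer with the local model whose prototype is closest to x."""
    j = min(models, key=lambda m: np.linalg.norm(x - W[m]))
    return float(np.append(x, 1.0) @ models[j])

# Synthetic demo: query centres concentrate on two "popular" regions.
rng = np.random.default_rng(1)
Q = np.vstack([rng.normal([0.0, 0.0], 0.3, (200, 2)),
               rng.normal([5.0, 5.0], 0.3, (200, 2))])
W, wins = rpcl(Q, k=4)
X = rng.uniform(-2.0, 7.0, (5000, 2))
y = np.where(X[:, 0] < 2.5, 1.0 + 2.0 * X[:, 0], 10.0 - X[:, 1])
models, keep = thin_and_fit(W, X, y, radius=1.0)
print("prototypes:\n", W)
print(f"retained fraction: {keep.mean():.3f}, local models: {len(models)}")
```

The rival's de-learning rate (`alpha_r`) is kept much smaller than the winner's learning rate (`alpha_c`); this is what lets RPCL drive surplus prototypes away from the data, so `k` can be over-estimated. Only the retained fraction of the data is used for training, which mirrors the thinning idea; whether the chapter uses this exact RPCL variant or linear local models is not stated in the abstract.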



Title: Big Data Thinning: Knowledge Discovery from Relevant Data
Authors: Naji Shehab, Christos Anagnostopoulos
Publisher: Springer International Publishing
Book: Convergence of Artificial Intelligence and the Internet of Things
Print ISBN: 978-3-030-44906-3

Electronic ISBN: 978-3-030-44907-0

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-44907-0_11
