Published in: GeoInformatica 4/2019

21.06.2019

Continuous decaying of telco big data with data postdiction

Authors: Constantinos Costa, Andreas Konstantinidis, Andreas Charalampous, Demetrios Zeinalipour-Yazti, Mohamed F. Mokbel

Abstract

In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, as we formulate it, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has two conceptual phases: (i) in an offline phase, it utilizes an LSTM-based hierarchical ML algorithm to learn a tree of models (coined the TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. Additionally, we provide three decaying focus methods that can be plugged into the operators we propose, namely: (i) FIFO-amnesia, which is based on the time at which the tuple was created; (ii) SPATIAL-amnesia, which is based on the location of the cellular tower associated with the tuple; and (iii) UNIFORM-amnesia, which picks the tuples to be decayed at random. Similarly, CTBD-DP enables the decaying of streaming data, utilizing the TBD-DP tree to extend and update the stored models. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10 GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data. Our experiments also show that CTBD-DP improves the accuracy over streaming data.
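The three decaying focus methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `TelcoTuple` record, its field names, the function names, the budget `k`, and the `focus_xy` parameter are all hypothetical. In particular, the abstract only says that SPATIAL-amnesia depends on the cell tower's location, so ranking tuples by distance to a focus point is an assumption made here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TelcoTuple:
    # Hypothetical record layout; the abstract does not define one.
    created_at: int    # creation timestamp of the tuple
    tower_xy: tuple    # (x, y) location of the associated cell tower
    value: float       # measurement carried by the tuple


def fifo_amnesia(tuples, k):
    """Select the k oldest tuples (by creation time) for decaying."""
    return sorted(tuples, key=lambda t: t.created_at)[:k]


def spatial_amnesia(tuples, k, focus_xy):
    """Select the k tuples whose tower lies closest to focus_xy.

    Assumption: "closest to a focus point" is one possible reading of a
    location-based policy; the abstract does not specify the rule.
    """
    def squared_distance(t):
        dx = t.tower_xy[0] - focus_xy[0]
        dy = t.tower_xy[1] - focus_xy[1]
        return dx * dx + dy * dy

    return sorted(tuples, key=squared_distance)[:k]


def uniform_amnesia(tuples, k, seed=None):
    """Select k tuples uniformly at random, without replacement."""
    pool = list(tuples)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Under the scheme the abstract describes, the tuples returned by any of these selectors would be the ones abstracted into a compact model and then deleted. Keeping selection separate from modeling mirrors the "pluggable" wording of the abstract.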

Metadata
Title
Continuous decaying of telco big data with data postdiction
Authors
Constantinos Costa
Andreas Konstantinidis
Andreas Charalampous
Demetrios Zeinalipour-Yazti
Mohamed F. Mokbel
Publication date
21.06.2019
Publisher
Springer US
Published in
GeoInformatica / Issue 4/2019
Print ISSN: 1384-6175
Electronic ISSN: 1573-7624
DOI
https://doi.org/10.1007/s10707-019-00364-z
