2016 | OriginalPaper | Book chapter

AScale: Big/Small Data ETL and Real-Time Data Freshness


Abstract

In this paper we investigate the problem of providing timely results for the Extraction, Transformation and Load (ETL) process and automatic scalability for the entire pipeline, including the data warehouse. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, during specific offline time windows. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. However, none of them allows the user to specify a target ETL time and have the framework scale automatically to meet it.
We propose an approach that enables automatic scalability and data freshness for any data warehouse and ETL process within a specified time, suitable for both small-data and big-data scenarios. A general framework for testing and implementing the system was developed to provide solutions for each part of time-bounded automatic ETL scalability. The results show that the proposed system can scale to provide the desired processing speed for near-real-time ETL processing.

Metadata
Title
AScale: Big/Small Data ETL and Real-Time Data Freshness
Authors
Pedro Martins
Maryam Abbasi
Pedro Furtado
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-34099-9_25
