16.09.2024

Data integration from traditional to big data: main features and comparisons of ETL approaches

Authors: Afef Walha, Faiza Ghozzi, Faiez Gargouri

Published in: The Journal of Supercomputing | Issue 19/2024

Abstract

Data integration combines information from different sources to provide a comprehensive view for making informed business decisions. The ETL (Extract, Transform, and Load) process is essential in data integration. In the past two decades, modeling the ETL process has become a priority for effectively managing information. This paper aims to explore ETL approaches to help researchers and organizational stakeholders overcome challenges, especially in Big Data integration. It offers a comprehensive overview of ETL methods, from traditional to Big Data, and discusses their advantages, limitations, and the primary trends in Big Data integration. The study emphasizes that many technologies have been integrated into ETL steps for data collection, storage, processing, querying, and analysis without proper modeling. Therefore, more generic and customized design modeling of the ETL steps should be carried out to ensure reusability and flexibility. The paper summarizes the exploration of ETL modeling, focusing on Big Data scalability and processing trends. It also identifies critical dilemmas, such as ensuring compatibility across multiple sources and dealing with large volumes of Big Data. Furthermore, it suggests future directions in Big Data integration by leveraging advanced artificial intelligence processing and storage systems to ensure consistency, efficiency, and data integrity.

Metadata
Title
Data integration from traditional to big data: main features and comparisons of ETL approaches
Authors
Afef Walha
Faiza Ghozzi
Faiez Gargouri
Publication date
16.09.2024
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 19/2024
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-024-06413-1