01-12-2023 | Original Article

Conceptual modeling of big data SPJ operations with Twitter social medium

Authors: Hana Mallek, Faiza Ghozzi, Faiez Gargouri

Published in: Social Network Analysis and Mining | Issue 1/2023

Abstract

The article explores the conceptual modeling of big data SPJ operations on data from the Twitter social medium, focusing on how the MapReduce paradigm can be used to adapt ETL processes to massive data. It introduces a formal model for ETL processes and for the MapReduce paradigm, and details the components of the proposed solution. The authors present a case study on Twitter data that shows how ETL processes can be adapted to use MapReduce for parallel and distributed data processing. The work is validated through experiments with Talend for Big Data, which the authors use to show the efficiency and reliability of the proposed SPJ operations. The article concludes by outlining future research and comparisons with other big data technologies.
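To make the idea of SPJ operations under MapReduce concrete, the following is a minimal sketch in plain Python, not the authors' formal model or their Talend implementation. It reads SPJ as select, project, join (the usual database meaning, assumed here) and simulates the map, shuffle, and reduce phases in memory with a generic reduce-side join. The record fields (`tweet_id`, `user_id`, `lang`, `text`, `name`, `followers`) and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; the field names are illustrative, not the paper's schema.
tweets = [
    {"tweet_id": 1, "user_id": 10, "lang": "en", "text": "big data"},
    {"tweet_id": 2, "user_id": 11, "lang": "fr", "text": "bonjour"},
    {"tweet_id": 3, "user_id": 10, "lang": "en", "text": "etl"},
]
users = [
    {"user_id": 10, "name": "alice", "followers": 5},
    {"user_id": 11, "name": "bob", "followers": 7},
]


def map_tweet(tweet):
    """Map phase for tweets: select English tweets, project two fields."""
    if tweet["lang"] == "en":  # select
        projected = {"tweet_id": tweet["tweet_id"], "text": tweet["text"]}  # project
        yield tweet["user_id"], ("T", projected)  # key = join attribute


def map_user(user):
    """Map phase for users: project the name, keyed by the join attribute."""
    yield user["user_id"], ("U", {"name": user["name"]})


def shuffle(pairs):
    """Group all mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce phase: pair every user record with every tweet record for one key."""
    user_rows = [row for tag, row in values if tag == "U"]
    tweet_rows = [row for tag, row in values if tag == "T"]
    for user_row in user_rows:
        for tweet_row in tweet_rows:
            yield {"user_id": key, **user_row, **tweet_row}


def run_spj(tweets, users):
    """Run the three phases sequentially; a real cluster would run them in parallel."""
    pairs = [kv for t in tweets for kv in map_tweet(t)]
    pairs += [kv for u in users for kv in map_user(u)]
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined


result = run_spj(tweets, users)
for row in result:
    print(row)
```

In this sketch the select and project steps run in the map phase, so only the filtered and trimmed records are shuffled, and the join is resolved in the reduce phase for each key. Whether the article distributes the operations across phases in the same way is not stated in the abstract.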

Metadata
Title: Conceptual modeling of big data SPJ operations with Twitter social medium
Authors: Hana Mallek, Faiza Ghozzi, Faiez Gargouri
Publication date: 01-12-2023
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining | Issue 1/2023
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-023-01112-w
