Published in: Computing 10/2020

8 July 2020 | Regular Paper

IoT streaming data integration from multiple sources

Authors: Doan Quang Tu, A. S. M. Kayes, Wenny Rahayu, Kinh Nguyen


Abstract

The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today's interconnected world. With the rapid advancement of Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of a volume, variety and velocity that traditional databases and systems cannot manage effectively. Many organizations must handle these massive datasets, which combine different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semi-structured) coming from multiple sources. Several data integration mechanisms have been designed, but mostly for static data; unfortunately, these techniques cannot handle and integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one important issue in particular: timing conflicts and alignment among streaming data coming from multiple sources. We propose a generic window-based ISDI approach to handle IoT data in different formats, and we develop algorithms to integrate IoT streaming data from multiple sources. Specifically, we extend the basic windowing algorithm to support real-time data integration and to resolve the timing alignment issue. We also introduce a de-duplication algorithm that removes data redundancy and presents the useful fragments of the integrated data. We conduct several sets of experiments to quantify the performance of the proposed window-based approach, and we compare our local experimental results with a real streaming setup using Apache Spark. The experiments, performed on several IoT datasets, show the efficiency of the proposed solution in terms of processing time. The results are also used to provide users with an integrated view of the data.
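The abstract describes windowing, timing alignment and de-duplication only at a high level. As a rough illustration of what these steps involve, the Python sketch below assigns timestamped records from several sources to fixed-width (tumbling) event-time windows, drops exact duplicate deliveries, and merges readings that share a window and key into one integrated row. It is not the authors' ISDI algorithm: the `Record` fields, the tumbling-window policy, the "latest reading per source wins" rule and the exact-match de-duplication are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record layout (an assumption, not taken from the paper)."""
    source: str       # which IoT source produced the reading
    timestamp: float  # event time in seconds
    key: str          # entity the reading describes, e.g. a room or device id
    value: float      # measured value


def window_index(ts: float, width: float) -> int:
    """Index of the tumbling window of size `width` that contains event time `ts`."""
    return int(ts // width)


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Drop exact repeats (same source, timestamp, key and value), keeping the first copy."""
    seen = set()
    unique: List[Record] = []
    for r in records:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique


def integrate(records: Iterable[Record], width: float) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Align de-duplicated records from all sources into shared tumbling windows.

    Returns {(window, key): {source: value}}, keeping the latest reading per
    source inside each window (an assumed conflict-resolution rule).
    """
    merged: Dict[Tuple[int, str], Dict[str, Tuple[float, float]]] = defaultdict(dict)
    for r in deduplicate(records):
        slot = merged[(window_index(r.timestamp, width), r.key)]
        prev = slot.get(r.source)
        if prev is None or r.timestamp >= prev[0]:
            slot[r.source] = (r.timestamp, r.value)
    # Strip the timestamps used for conflict resolution from the integrated view.
    return {k: {src: v for src, (_, v) in slots.items()} for k, slots in merged.items()}


# Usage: two hypothetical sensors reporting on the same room, with one duplicate delivery.
stream = [
    Record("sensor_a", 0.4, "room1", 21.0),
    Record("sensor_b", 1.7, "room1", 45.0),
    Record("sensor_a", 0.4, "room1", 21.0),  # duplicate delivery of the first record
    Record("sensor_a", 5.2, "room1", 21.5),
]
print(integrate(stream, width=5.0))
```

Readings from different sources that fall into the same window are treated as time-aligned, which is one simple way to reconcile sources that sample at different moments; a real deployment would more likely rely on a stream engine such as Apache Spark and handle late or out-of-order records explicitly.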

Metadata
Title: IoT streaming data integration from multiple sources
Authors: Doan Quang Tu, A. S. M. Kayes, Wenny Rahayu, Kinh Nguyen
Publication date: 8 July 2020
Publisher: Springer Vienna
Published in: Computing, Issue 10/2020
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI: https://doi.org/10.1007/s00607-020-00830-9