Skip to main content
Top

2019 | OriginalPaper | Chapter

A Dynamic Model + BFR Algorithm for Streaming Data Sorting

Authors : Yongwei Tan, Ling Huang, Chang-Dong Wang

Published in: Intelligence Science and Big Data Engineering. Big Data and Machine Learning

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Streaming data is widely generated in our lives. This has promoted a lot of research on streaming data mining, such as streaming data clustering and filtering. In our work, we present a problem about data stream processing, namely, streaming data sorting. There are some important characteristics of streaming data. Firstly, streaming data comes in the form of streams. It is usually assumed that streaming data is infinite, so it cannot be stored completely in memory. Secondly, we must process the streaming data in real time, otherwise we may lose the opportunity to deal with it forever. Based on these characteristics, we propose a dynamic algorithm that can make full use of memory and minimize error to solve the problem of streaming data sorting, which is further combined with the BFR algorithm to sort a particular type of streaming data. Some experiments are conducted to confirm the effectiveness of the proposed algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Wang, C.D., Lai, J.H., Huang, D.: Incremental support vector clustering. In: ICDM Workshop, pp. 839–846 (2011) Wang, C.D., Lai, J.H., Huang, D.: Incremental support vector clustering. In: ICDM Workshop, pp. 839–846 (2011)
2.
go back to reference Huang, D., Lai, J.H., Wang, C.D.: Incremental support vector clustering with outlier detection. In: ICPR, pp. 2339–2342 (2012) Huang, D., Lai, J.H., Wang, C.D.: Incremental support vector clustering with outlier detection. In: ICPR, pp. 2339–2342 (2012)
3.
go back to reference Wang, C.D., Lai, J.H., Yu, P.S.: Dynamic community detection in weighted graph streams. In: SDM, pp. 151–161 (2013) Wang, C.D., Lai, J.H., Yu, P.S.: Dynamic community detection in weighted graph streams. In: SDM, pp. 151–161 (2013)
4.
go back to reference Ding, Y., Huang, L., Wang, C.D., Huang, D.: Community detection in graph streams by pruning zombie nodes. In: PAKDD, pp. 574–585 (2017) Ding, Y., Huang, L., Wang, C.D., Huang, D.: Community detection in graph streams by pruning zombie nodes. In: PAKDD, pp. 574–585 (2017)
5.
go back to reference Liang, W.B., Wang, C.D., Lai, J.H.: Weighted numerical and categorical attribute clustering in data streams. In: IJCNN, pp. 3066–3072 (2017) Liang, W.B., Wang, C.D., Lai, J.H.: Weighted numerical and categorical attribute clustering in data streams. In: IJCNN, pp. 3066–3072 (2017)
6.
go back to reference Chen, J., Lin, X., Xuan, Q., Xiang, Y.: FGCH: a fast and grid based clustering algorithm for hybrid data stream. Appl. Intell. 49(4), 1228–1244 (2019)CrossRef Chen, J., Lin, X., Xuan, Q., Xiang, Y.: FGCH: a fast and grid based clustering algorithm for hybrid data stream. Appl. Intell. 49(4), 1228–1244 (2019)CrossRef
7.
go back to reference Wang, C.D., Lai, J.H., Huang, D., Zheng, W.S.: SVStream: a support vector-based algorithm for clustering data streams. IEEE Trans. Knowl. Data Eng. 25(6), 1410–1424 (2013)CrossRef Wang, C.D., Lai, J.H., Huang, D., Zheng, W.S.: SVStream: a support vector-based algorithm for clustering data streams. IEEE Trans. Knowl. Data Eng. 25(6), 1410–1424 (2013)CrossRef
8.
go back to reference Laga, A., Boukhobza, J., Singhoff, F., Koskas, M.: MONTRES: merge on-the-run external sorting algorithm for large data volumes on SSD based storage systems. IEEE Trans. Comput. 66(10), 1689–1702 (2017)MathSciNetCrossRef Laga, A., Boukhobza, J., Singhoff, F., Koskas, M.: MONTRES: merge on-the-run external sorting algorithm for large data volumes on SSD based storage systems. IEEE Trans. Comput. 66(10), 1689–1702 (2017)MathSciNetCrossRef
9.
go back to reference Shabaz, M., Kumar, A.: SA sorting: a novel sorting technique for large-scale data. J. Comput. Netw. Commun. 2019, 3027578:1–3027578:7 (2019) Shabaz, M., Kumar, A.: SA sorting: a novel sorting technique for large-scale data. J. Comput. Netw. Commun. 2019, 3027578:1–3027578:7 (2019)
10.
go back to reference Sukhwani, B., et al.: Large payload streaming database sort and projection on FPGAs. In: SBAC-PAD, pp. 25–32 (2013) Sukhwani, B., et al.: Large payload streaming database sort and projection on FPGAs. In: SBAC-PAD, pp. 25–32 (2013)
11.
go back to reference Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: VLDB, pp. 81–92 (2003)CrossRef Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: VLDB, pp. 81–92 (2003)CrossRef
12.
go back to reference Al-Shammari, A., Zhou, R., Naseriparsa, M., Liu, C.: An effective density-based clustering and dynamic maintenance framework for evolving medical data streams. Int. J. Med. Inform. 126, 176–186 (2019)CrossRef Al-Shammari, A., Zhou, R., Naseriparsa, M., Liu, C.: An effective density-based clustering and dynamic maintenance framework for evolving medical data streams. Int. J. Med. Inform. 126, 176–186 (2019)CrossRef
13.
go back to reference Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T., Johnson, T.: Quadratic time algorithms appear to be optimal for sorting evolving data. In: Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments, ALENEX 2018, New Orleans, LA, USA, 7–8 January 2018, pp. 87–96 (2018) Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T., Johnson, T.: Quadratic time algorithms appear to be optimal for sorting evolving data. In: Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments, ALENEX 2018, New Orleans, LA, USA, 7–8 January 2018, pp. 87–96 (2018)
14.
go back to reference Domino, K., Gawron, P.: An algorithm for arbitrary-order cumulant tensor calculation in a sliding window of data streams. Appl. Math. Comput. Sci. 29(1), 195–206 (2019)MathSciNetMATH Domino, K., Gawron, P.: An algorithm for arbitrary-order cumulant tensor calculation in a sliding window of data streams. Appl. Math. Comput. Sci. 29(1), 195–206 (2019)MathSciNetMATH
15.
go back to reference Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets, 2nd edn. Cambridge University Press, Cambridge (2014)CrossRef Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets, 2nd edn. Cambridge University Press, Cambridge (2014)CrossRef
16.
go back to reference Fernandez-Basso, C., Francisco-Agra, A.J., Martín-Bautista, M.J., Ruiz, M.D.: Finding tendencies in streaming data using big data frequent itemset mining. Knowl.-Based Syst. 163, 666–674 (2019)CrossRef Fernandez-Basso, C., Francisco-Agra, A.J., Martín-Bautista, M.J., Ruiz, M.D.: Finding tendencies in streaming data using big data frequent itemset mining. Knowl.-Based Syst. 163, 666–674 (2019)CrossRef
17.
go back to reference Liang, C., Li, M., Liu, B.: Online computing quantile summaries over uncertain data streams. IEEE Access 7, 10916–10926 (2019)CrossRef Liang, C., Li, M., Liu, B.: Online computing quantile summaries over uncertain data streams. IEEE Access 7, 10916–10926 (2019)CrossRef
18.
go back to reference Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T., Johnson, T.: Optimally sorting evolving data. In: 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, 9–13 July 2018, Prague, Czech Republic, pp. 81:1–81:13 (2018) Vial, J.J.B., Devanny, W.E., Eppstein, D., Goodrich, M.T., Johnson, T.: Optimally sorting evolving data. In: 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, 9–13 July 2018, Prague, Czech Republic, pp. 81:1–81:13 (2018)
19.
go back to reference Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press and McGraw-Hill Book Company, Cambridge and New York (2001)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press and McGraw-Hill Book Company, Cambridge and New York (2001)MATH
20.
go back to reference Goodrich, M.T., Tamassia, R.: Algorithm Design and Applications, 1st edn. Wiley, Hoboken (2014)MATH Goodrich, M.T., Tamassia, R.: Algorithm Design and Applications, 1st edn. Wiley, Hoboken (2014)MATH
21.
go back to reference Bradley, P.S., Fayyad, U.M., Reina, C.: Scaling clustering algorithms to large databases. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA, 27–31 August 1998, pp. 9–15 (1998) Bradley, P.S., Fayyad, U.M., Reina, C.: Scaling clustering algorithms to large databases. In: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA, 27–31 August 1998, pp. 9–15 (1998)
Metadata
Title
A Dynamic Model + BFR Algorithm for Streaming Data Sorting
Authors
Yongwei Tan
Ling Huang
Chang-Dong Wang
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36204-1_34

Premium Partner