2017 | OriginalPaper | Chapter

Tweet Cluster Analyzer: Partition and Join-based Micro-clustering for Twitter Data Stream

Authors: M. Arun Manicka Raja, S. Swamynathan

Published in: Computational Intelligence in Data Mining

Publisher: Springer Singapore

Abstract

Data stream mining is the process of extracting knowledge from continuously generated data. Because processing a data stream is not a trivial task, streams have to be analyzed with suitable stream mining techniques. When processing large volumes of streaming data, stream clustering helps to uncover valuable hidden information. Many works have clustered data streams using various methods, but most of these approaches neglect core tasks needed to improve cluster accuracy and to process the streams quickly. To improve cluster quality and reduce the processing time of cluster generation, a partition-based DBStream clustering method is proposed. The results have been compared with those of various data stream clustering methods, and the experiments show that cluster purity improves by 5% and that clustering time is reduced by 10% relative to the average time taken by the other methods.
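The abstract reports its quality gain in terms of cluster purity. As a minimal sketch of how purity is commonly computed (the standard definition, not necessarily the exact evaluation procedure used in the chapter), each cluster is credited with the count of its most frequent ground-truth label, and the credits are summed and divided by the number of points. The function name `cluster_purity` and the sample labels are illustrative assumptions.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Return purity in [0, 1] using the standard majority-label definition.

    cluster_ids and true_labels are parallel sequences: the cluster assigned
    to each point and that point's ground-truth label (hypothetical inputs).
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        return 0.0

    # Count ground-truth labels inside each cluster.
    label_counts = {}
    for cluster_id, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster_id, Counter())[label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    # Illustrative data only: two clusters over six labelled points.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["sports", "sports", "politics", "politics", "politics", "sports"]
    print(cluster_purity(clusters, labels))
```

Purity rewards clusters dominated by a single label, so it rises trivially as the number of clusters grows; comparisons between methods are most meaningful at similar cluster counts.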

Metadata

Title: Tweet Cluster Analyzer: Partition and Join-based Micro-clustering for Twitter Data Stream
Authors: M. Arun Manicka Raja, S. Swamynathan
Copyright Year: 2017
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-3874-7_64
