
Requirements for clustering data streams

Author:
Daniel Barbará

George Mason University, Fairfax, VA

ACM SIGKDD Explorations Newsletter, Volume 3, Issue 2, January 2002, pp. 23–27. https://doi.org/10.1145/507515.507519

Published: 01 January 2002

Abstract

Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock-market applications, e-commerce and other fields. A challenge imposed by continuously arriving data streams is to analyze them and to modify the models that explain them as new data arrives. In this paper, we analyze the requirements for clustering data streams. We review some of the latest algorithms in the literature and assess whether they meet these requirements.
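The abstract's central difficulty, revising a clustering model as each new point arrives instead of re-running over stored data, can be illustrated with a minimal one-pass sketch. It is not the method proposed or reviewed in this article. It assumes a simple distance-threshold ("leader"-style) assignment rule and keeps, for each cluster, only a compact summary of point count, linear sum and sum of squares, in the style of BIRCH clustering features. The names `ClusterSummary`, `OnePassClusterer` and the `threshold` parameter are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusterSummary:
    """Compact per-cluster state: point count, linear sum and sum of squares.

    Memory is O(d) per cluster regardless of how many points it has absorbed,
    so the model stays small while the stream keeps growing.
    """
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusterSummary":
        return cls(1, list(x), sum(v * v for v in x))

    def add(self, x: Sequence[float]) -> None:
        # Incremental update: earlier points are never revisited or stored.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of members to the centroid,
        # computed from (n, ls, ss) alone.
        c = self.centroid()
        return math.sqrt(max(0.0, self.ss / self.n - sum(v * v for v in c)))


class OnePassClusterer:
    """Hypothetical threshold-based clusterer that examines each point once."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed maximum distance for joining a cluster
        self.clusters: List[ClusterSummary] = []

    def process(self, x: Sequence[float]) -> int:
        """Assign x to the nearest cluster or open a new one; return its index."""
        best: Optional[int] = None
        best_dist = math.inf
        for i, cluster in enumerate(self.clusters):
            d = math.dist(x, cluster.centroid())
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= self.threshold:
            self.clusters[best].add(x)
            return best
        # Too far from every existing cluster: start a new one (possibly an outlier).
        self.clusters.append(ClusterSummary.from_point(x))
        return len(self.clusters) - 1


stream = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.0, 0.5), (10.5, 10.0)]
clusterer = OnePassClusterer(threshold=2.0)
labels = [clusterer.process(p) for p in stream]
print(labels)  # [0, 0, 1, 0, 1]
```

Each point is looked at exactly once and only the per-cluster summaries are retained, so memory depends on the number of clusters rather than the length of the stream. The sketch leaves several decisions open: results depend on arrival order and on the fixed threshold, and whether a new singleton cluster represents an emerging cluster or an outlier is a policy choice it does not make.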

Index Terms

- Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis
- Information systems > Information retrieval > Retrieval tasks and goals > Clustering and classification
- Information systems > Information systems applications > Data mining > Clustering

Index terms have been assigned to the content through auto-classification.

Published in

ACM SIGKDD Explorations Newsletter Volume 3, Issue 2
January 2002
81 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/507515

Copyright © 2002 Author
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Data streams
clustering
outliers
tracking changing models

Article Metrics
- 103 Total Citations
- 1,667 Total Downloads
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 0