Approaches for scaling DBSCAN algorithm to large spatial databases

Zhou, Aoying; Zhou, Shuigeng; Cao, Jing; Fan, Ye; Hu, Yunfa

doi:10.1007/BF02948834

Approaches for scaling DBSCAN algorithm to large spatial databases

Published: November 2000

Volume 15, pages 509–526, (2000)
Cite this article

Journal of Computer Science and Technology Aims and scope Submit manuscript

Zhou Aoying¹,
Zhou Shuigeng¹,
Cao Jing¹,
Fan Ye¹ &
…
Hu Yunfa¹

701 Accesses
66 Citations
Explore all metrics

Abstract

The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred a tremendous interest in the area of knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and a lot of algorithms for clustering have been developed, there is still no efficient algorithm for clustering very large databases and high dimensional data. As an outstanding representative of clustering algorithms, DBSCAN algorithm shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires large volume of memory support and could incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced consecutively. Following that, based on the above-proposed algorithms, a synthetic algorithm is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Trends and Future Perspective Challenges in Big Data

Density-Based Clustering Based on Hierarchical Density Estimates

Big data preprocessing: methods and prospects

Article Open access 01 November 2016

Salvador García, Sergio Ramírez-Gallego, … Francisco Herrera

References

Chen M S, Han J H, Yu P S. Data mining: An overview from a database perspective.IEEE Trans. KDE, 1996, 8(6): 866–883.
Google Scholar
Ng R T, Han J. Efficient and effective clustering methods for spatial data mining. InProceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 144–155.
Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. InProceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, 1996, pp.103–114.
Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. InProceedings of 2nd International Conference on Knowledge Discovering in Databases and Data Mining (KDD-96), Portland, Oregon, August 1996.
Guha S, Rastogi R, Shim K. CURE: An efficient clustering algorithm for large databases. InProceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, ACM Press, 1998, pp.428–439.
Google Scholar
Zhang W, Yang J, Muntz R. STING: A statistical information grid approach to spatial data mining. InProceedings of the 23rd VLDB Conference, Athens, Greece, 1997, pp.186–195.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. InProceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, ACM Press, 1998, pp.73–84.
Google Scholar
Sheikholeslami G, Chatterjee S, Zhang A. Wave Cluster: A multi-resolution clustering approach for very large spatial databases. InProceedings of the 24th VLDB Conference, New York City, August 1998, pp.428–439.
Bechmann N, Kriegel H P, Schneider R, Seeger B. The R^*-tree: An efficient and robust access method for points and rectangles. InProc. ACM SIGMOD Int. Conf. Management of Data, Altantic City, NJ, 1990, pp.322–331.
Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
Ester M, Kriegel H P, Xu X. Knowledge discovery in large spatial database: Focusing techniques for efficient class identification. InProceedings of 4th International Symposium on Large Spatial Databases, Portland, ME, 1995, InLecture Notes in Computer Science, Vol.951, Springer, 1995, pp.67–82.
Vitter J. Random sampling with reservoir.ACM Transactions on Mathematical Software, 1985, 11(1): 37–57.
Article MATH MathSciNet Google Scholar
Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press, 1995.

Download references

Author information

Authors and Affiliations

Department of Computer Science, Fudan University, 200433, Shanghai, P.R. China
Zhou Aoying, Zhou Shuigeng, Cao Jing, Fan Ye & Hu Yunfa

Authors

Zhou Aoying
View author publications
You can also search for this author in PubMed Google Scholar
Zhou Shuigeng
View author publications
You can also search for this author in PubMed Google Scholar
Cao Jing
View author publications
You can also search for this author in PubMed Google Scholar
Fan Ye
View author publications
You can also search for this author in PubMed Google Scholar
Hu Yunfa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhou Aoying.

Additional information

This work was supported by the National Natural Science Foundation of China (No.69743001) and the National Doctoral Subject Foundation of China.

For the biographies ofZHOU Aoying andZHOU Shuigeng, please refer to p.248 of No.3, Vol. 15 of this journal.

CAO Jing is a graduate student at Department of Computer Science, Fudan University. Her current research areas are databases and data mining.

FAN Ye is a graduate student at Department of Computer Science, Fudan University. His research interests are databases and data mining.

HU Yunfa graduated from Fudan University in 1964. He is currently a professor, Permanent Member of Council of Chinese Artificial Intelligence Society and Vice-director of Shanghai (International) Database Research Center. The areas of his research cover database, knowledge base and AI.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, A., Zhou, S., Cao, J. et al. Approaches for scaling DBSCAN algorithm to large spatial databases. J. Comput. Sci. & Technol. 15, 509–526 (2000). https://doi.org/10.1007/BF02948834

Download citation

Received: 19 March 1999
Revised: 05 March 2000
Issue Date: November 2000
DOI: https://doi.org/10.1007/BF02948834

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Approaches for scaling DBSCAN algorithm to large spatial databases

Abstract

Access this article

Similar content being viewed by others

Trends and Future Perspective Challenges in Big Data

Density-Based Clustering Based on Hierarchical Density Estimates

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Approaches for scaling DBSCAN algorithm to large spatial databases

Abstract

Access this article

Similar content being viewed by others

Trends and Future Perspective Challenges in Big Data

Density-Based Clustering Based on Hierarchical Density Estimates

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation