Skip to main content
Log in

Approaches for scaling DBSCAN algorithm to large spatial databases

  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred a tremendous interest in the area of knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and a lot of algorithms for clustering have been developed, there is still no efficient algorithm for clustering very large databases and high dimensional data. As an outstanding representative of clustering algorithms, DBSCAN algorithm shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires large volume of memory support and could incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced consecutively. Following that, based on the above-proposed algorithms, a synthetic algorithm is also given. Finally, some experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Chen M S, Han J H, Yu P S. Data mining: An overview from a database perspective.IEEE Trans. KDE, 1996, 8(6): 866–883.

    Google Scholar 

  2. Ng R T, Han J. Efficient and effective clustering methods for spatial data mining. InProceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 144–155.

  3. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. InProceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada, 1996, pp.103–114.

  4. Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. InProceedings of 2nd International Conference on Knowledge Discovering in Databases and Data Mining (KDD-96), Portland, Oregon, August 1996.

  5. Guha S, Rastogi R, Shim K. CURE: An efficient clustering algorithm for large databases. InProceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, ACM Press, 1998, pp.428–439.

    Google Scholar 

  6. Zhang W, Yang J, Muntz R. STING: A statistical information grid approach to spatial data mining. InProceedings of the 23rd VLDB Conference, Athens, Greece, 1997, pp.186–195.

  7. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. InProceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, ACM Press, 1998, pp.73–84.

    Google Scholar 

  8. Sheikholeslami G, Chatterjee S, Zhang A. Wave Cluster: A multi-resolution clustering approach for very large spatial databases. InProceedings of the 24th VLDB Conference, New York City, August 1998, pp.428–439.

  9. Bechmann N, Kriegel H P, Schneider R, Seeger B. The R*-tree: An efficient and robust access method for points and rectangles. InProc. ACM SIGMOD Int. Conf. Management of Data, Altantic City, NJ, 1990, pp.322–331.

  10. Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.

  11. Ester M, Kriegel H P, Xu X. Knowledge discovery in large spatial database: Focusing techniques for efficient class identification. InProceedings of 4th International Symposium on Large Spatial Databases, Portland, ME, 1995, InLecture Notes in Computer Science, Vol.951, Springer, 1995, pp.67–82.

  12. Vitter J. Random sampling with reservoir.ACM Transactions on Mathematical Software, 1985, 11(1): 37–57.

    Article  MATH  MathSciNet  Google Scholar 

  13. Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press, 1995.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhou Aoying.

Additional information

This work was supported by the National Natural Science Foundation of China (No.69743001) and the National Doctoral Subject Foundation of China.

For the biographies ofZHOU Aoying andZHOU Shuigeng, please refer to p.248 of No.3, Vol. 15 of this journal.

CAO Jing is a graduate student at Department of Computer Science, Fudan University. Her current research areas are databases and data mining.

FAN Ye is a graduate student at Department of Computer Science, Fudan University. His research interests are databases and data mining.

HU Yunfa graduated from Fudan University in 1964. He is currently a professor, Permanent Member of Council of Chinese Artificial Intelligence Society and Vice-director of Shanghai (International) Database Research Center. The areas of his research cover database, knowledge base and AI.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, A., Zhou, S., Cao, J. et al. Approaches for scaling DBSCAN algorithm to large spatial databases. J. Comput. Sci. & Technol. 15, 509–526 (2000). https://doi.org/10.1007/BF02948834

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02948834

Keywords

Navigation