skip to main content
10.1145/304182.304187acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

OPTICS: ordering points to identify the clustering structure

Authors Info & Claims
Published:01 June 1999Publication History

ABSTRACT

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.

References

  1. AGG+ 98.Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 94- 105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AKK 96.Ankerst M., Keim D. A., Kriegel H.-P.: "'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets", Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.Google ScholarGoogle Scholar
  3. BKK 96.Berchthold S., Keim D., Kriegel H.-P.: "The X-Tree: An Index Structure for High-Dimensional Data", 22nd Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28-39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BKSS 90.Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: "The R*-tree: An ti',fficient and Robust Access Method for Points and Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, ACM Press, New York, 1990, pp. 322-331. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CPZ 97.Ciaccia P., Patella M., Zezula P.: "M-tree: An Efficient Access Method for Similarity Search in Metric Spaces ", Proc. 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 426-435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. EKSX 96.Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.Google ScholarGoogle Scholar
  7. EKS+ 98.Ester M., Kriegel H.-P., Sander J., Wimmer M., Xu X.: "Incremental Clustering for Mining in a Data Warehousing Environment", Proc. 24th Int. Conf. on Very I.,arge Data Bases, New York, NY, 1998, pp. 323-333. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. EKX 95.Ester M., Kriegel H.-P., Xu X.: "Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification", Proc. 4th Int. Symp. on Laxge Spatial Databases, Portland, ME, 1995, in: Lecture Notes in Computer Science, Vol. 951, Springer, 1995, pp. 6'7-82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. GM 85.Grossman A., Morlet J.: "Decomposition oj'functions into wavelets of constant shapes and related tr~msforms". Mathematics and Physics: Lectures on Recent Restdts, World Scientific, Singapore, 1985.Google ScholarGoogle Scholar
  10. GRS 98.Guha S., Rastogi R., Shim K.: "CURE: A~ Efficient Clustering Algorithms for Large Databases", P-oc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. HK 98.Hinneburg A., Keim D.: "An Efficient Approa~:h to Clustering in Large Multimedia Databases with Noise"~, Proc. 4th Int. Conf. on Knowledge Discovery & Data Milling, New York City, NY, 1998.Google ScholarGoogle Scholar
  12. HT 93.Hattori K., Torii Y.: "Effective algorithms for Jhe nearest neighbor method in the clustering problem", Patt~.,rn Recognition, 1993, Vol. 26, No. 5, pp. 741-746.Google ScholarGoogle ScholarCross RefCross Ref
  13. Hua 97.Huang Z.: "A Fast Clustering Algorithm to C,'uster Very Large Categorical Data Sets in Data Mining", 1)roc. SIG- OD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.Google ScholarGoogle Scholar
  14. JD 88.Jain A. K., Dubes R. C.: "Algorithms for Clustering Data," Prentice-Hall, Inc., 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kei 96a.Keim D. A.: "Pixel-oriented Database Visualizations", in: SIGMOD RECORD, Special Issue on Inform~tion Visualization, Dezember 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kei 96b.Keim D. A.: "Databases and Visualization", t'roc. Tutorial ACM SIGMOD Int. Conf. on Managemen: of Data, Montreal, Canada, 1996, p. 543. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. KN 96.Knorr E. M., Ng R.T.: "Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining," IEEE Trans. on Knowledge and Data Engineeri}lg, Vol. 8, No. 6, December 1996, pp. 884-897. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. KR 90.Kaufman L., Rousseeuw E J.: "Finding GrouFs in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.Google ScholarGoogle Scholar
  19. Mac 67.MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Synap. Math. Statist. Prob., Vol. 1, pp. 281-297.Google ScholarGoogle Scholar
  20. NH 94.Ng R. T., Han J.: "Efficient and Effective Clustering Methods for Spatial Data Mining", Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. PTVF 92.Press W. H.,Teukolsky S. A., Vetterling W. T., Flannery B. E: "Numerical Recipes in C", 2nd ed., Cambridl,ye University Press, 1992.Google ScholarGoogle Scholar
  22. Ric 83.Richards A. J.: "Remote Sensing Digital hnag~: Analysis. An Introduction", 1983, Berlin, Springer Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Sch 96.Schikuta E.: "'Grid clustering: An efficient hierarchical clustering method for very large data sets". Proc. 13th Int. Conf. on Pattern Recognition, Vol 2, 1996, pp. 101-105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. SE 97.Schikuta E., Erhart M.: "The bang-clustering system: Grid-based data analysis". Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1 c,'97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. SCZ 98.Sheikholeslami G., Chatterjee S., Zhang A.: "'lCaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases", Proc. 24th Int. Conf. on Very I,arge Data Bases, New York, NY, 1998, pp. 428 - 439. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Sib 73.Sibson R.: "SLINK: an optimally efficient alggrithm for the single-link cluster method".The Comp. Journal, Vol. }'~ 6, No. 1, 1973, pp. 30-34.Google ScholarGoogle Scholar
  27. ZRL 96.Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data Clustering Method for Very Large Dtttabase,s". Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp. 103- i 14. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. OPTICS: ordering points to identify the clustering structure

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
          June 1999
          604 pages
          ISBN:1581130848
          DOI:10.1145/304182

          Copyright © 1999 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1999

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader