DOI: 10.1145/502512.502538
Article

Tri-plots: scalable tools for multidimensional data mining

Published: 26 August 2001

ABSTRACT

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pair-wise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from?

We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules on how to interpret a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification, when traditional methods (nearest neighbor, classification trees) may fail.
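
The abstract does not spell out how a tri-plot is constructed. As a rough illustration only, the sketch below assumes the three curves are cumulative pair counts (the number of pairs within distance r) for the cross pairs between the two datasets and for the self pairs of each dataset, which would normally be drawn on log-log axes. The function names, the toy points, and the brute-force quadratic computation are all assumptions made for this example, not the authors' scalable algorithm. The same cross distances also give the smallest and largest pair-wise distance across the two datasets.

```python
import math
from itertools import combinations, product


def cross_distances(a, b):
    """Euclidean distances between every point of `a` and every point of `b`."""
    return [math.dist(p, q) for p, q in product(a, b)]


def self_distances(a):
    """Euclidean distances between every unordered pair of points in `a`."""
    return [math.dist(p, q) for p, q in combinations(a, 2)]


def pair_count_curve(distances, radii):
    """For each radius r, count the pairs whose distance is at most r."""
    return [sum(1 for d in distances if d <= r) for r in radii]


def tri_plot_curves(a, b, radii):
    """Hypothetical helper: the three pair-count curves assumed to form a tri-plot."""
    return {
        "cross": pair_count_curve(cross_distances(a, b), radii),
        "self_a": pair_count_curve(self_distances(a), radii),
        "self_b": pair_count_curve(self_distances(b), radii),
    }


# Toy 2-D feature vectors (invented for illustration).
healthy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
non_healthy = [(3.0, 4.0), (4.0, 4.0)]
radii = [1.2, 2.0, 5.5, 6.0]

curves = tri_plot_curves(healthy, non_healthy, radii)

# Smallest and largest pair-wise distance across the two datasets.
cross = cross_distances(healthy, non_healthy)
min_cross, max_cross = min(cross), max(cross)

for name, counts in curves.items():
    print(name, counts)
print("min cross distance:", min_cross, "max cross distance:", max_cross)
```

In this toy case the cross curve stays at zero until the radius reaches the smallest cross distance, which is one simple sign that the two clouds are well separated at small scales.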

            • Published in

              KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
              August 2001
              493 pages
              ISBN:158113391X
              DOI:10.1145/502512

              Copyright © 2001 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              KDD '01 Paper Acceptance Rate: 31 of 237 submissions, 13%
              Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
