
Tri-plots: scalable tools for multidimensional data mining

Authors:
Agma Traina, University of S. Paulo at S. Carlos, Brazil
Caetano Traina, University of S. Paulo at S. Carlos, Brazil
Spiros Papadimitriou, Carnegie Mellon University
Christos Faloutsos, Carnegie Mellon University

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, August 2001, pages 184–193. https://doi.org/10.1145/502512.502538

Published: 26 August 2001


ABSTRACT

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer the following questions: Are the two clouds of points separable? What is the smallest/largest pairwise distance across the two datasets? Which of the two clouds does a new point (feature vector) come from? We propose a new tool, the tri-plot, and its generalization, the pq-plot, which help us answer the above questions. We provide a set of rules on how to interpret a tri-plot, and we apply these rules to synthetic and real datasets. We also show how to use our tool for classification when traditional methods (nearest neighbor, classification trees) may fail.
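The abstract does not spell out how a tri-plot is built. As we understand the paper, it draws three log-log curves: for each radius r, the number of point pairs within distance r inside cloud A, inside cloud B, and across A and B. The sketch below assumes that construction, uses Euclidean distance, and counts pairs by brute force (quadratic time, so it does not reflect the scalable method the title refers to). All function names are hypothetical, and the pq-plot generalization is not covered.

```python
import math
from itertools import combinations, product

# ASSUMPTION: tri-plot curves = pair counts within radius r for A-A, B-B, A-B.
# Brute-force O(N^2) counting; the paper's scalable technique is not shown here.

def dist(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pair_counts(dists, radii):
    """For each radius r, count how many of the given distances are <= r."""
    return [sum(1 for d in dists if d <= r) for r in radii]

def tri_plot_counts(A, B, radii):
    """Pair counts for the three curves, plus the extreme cross-cloud distances.

    Self-pairs are unordered (i < j); cross-pairs take one point from each set.
    """
    d_aa = [dist(p, q) for p, q in combinations(A, 2)]
    d_bb = [dist(p, q) for p, q in combinations(B, 2)]
    d_ab = [dist(p, q) for p, q in product(A, B)]
    return {
        "AA": pair_counts(d_aa, radii),
        "BB": pair_counts(d_bb, radii),
        "AB": pair_counts(d_ab, radii),
        "min_AB": min(d_ab),  # smallest distance across the two clouds
        "max_AB": max(d_ab),  # largest distance across the two clouds
    }

def log_log(radii, counts):
    """(log r, log count) points for plotting; radii with zero pairs are skipped."""
    return [(math.log(r), math.log(c)) for r, c in zip(radii, counts) if c > 0]

# Usage: two small, well-separated 2-D clouds.
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(5.0, 5.0), (6.0, 5.0)]
radii = [0.5, 1.0, 2.0, 8.0]
res = tri_plot_counts(A, B, radii)
print(res["AB"])  # [0, 0, 0, 6]
```

By construction, the A-B curve stays at zero until r reaches the smallest cross-cloud distance and saturates at |A|·|B| once r reaches the largest one, which is one way such a plot can expose separability and the smallest/largest pairwise distance asked about above.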



Recommendations

Tri-partition neighborhood covering reduction for robust classification

Neighborhood Covering Reduction extracts rules for classification through formulating the covering of data space with neighborhoods. The covering of neighborhoods is constructed based on distance measure and strictly constrained to be homogeneous. ...
Multi-Classification by Using Tri-Class SVM

The standard form for dealing with multi-class classification problems when bi-classifiers are used is to consider a two-phase (decomposition, reconstruction) training scheme. The most popular decomposition procedures are pairwise coupling (one versus ...
A Novel Semi-Supervised SVM Based on Tri-Training
IITA '08: Proceedings of the 2008 Second International Symposium on Intelligent Information Technology Application - Volume 03

One of the main difficulties in machine learning is how to solve large-scale problems effectively, and the labeled data are limited and fairly expensive to obtain. In this paper a new semi-supervised SVM algorithm is proposed. It applies tri-training to ...


Published in
KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Conference Chair: Doheon Lee, Chonnam National University, Korea
General Chair: Mario Schkolnick, SGI
Program Chairs: Foster Provost, New York University; Ramakrishnan Srikant, IBM Almaden Research Center
Copyright © 2001 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates
KDD '01 paper acceptance rate: 31 of 237 submissions, 13%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total citations: 16
- Total downloads: 624
- Downloads (last 12 months): 1
- Downloads (last 6 weeks): 0
