Towards effective and interpretable data mining by visual interaction

Author:
Charu C. Aggarwal

IBM T. J. Watson Research Center, Yorktown Heights, NY

ACM SIGKDD Explorations Newsletter, Volume 3, Issue 2, January 2002, pp. 11–22. https://doi.org/10.1145/507515.507518

Published: 01 January 2002

Abstract

The primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data. However, many current formalizations of data mining algorithms have not quite reached this goal. One reason is that the focus on purely automated techniques has imposed several constraints on data mining algorithms. For example, any data mining problem, such as clustering or association rule mining, requires the specification of a particular problem formulation, objective function, and parameters. Systems built this way do not take the user's needs into account very effectively. This makes it necessary to keep the user in the loop in a way that is both efficient and interpretable. One way of achieving this is to leverage human visual perception of intermediate data mining results. Such a system combines the computational power of a computer with the intuitive abilities of a human to provide solutions that neither could achieve alone. This paper discusses a number of recent approaches to several data mining algorithms along these lines.

Index Terms

Towards effective and interpretable data mining by visual interaction
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information systems applications
    1. Data mining

Index terms have been assigned to the content through auto-classification.

Published in

ACM SIGKDD Explorations Newsletter Volume 3, Issue 2
January 2002
81 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/507515

Copyright © 2002 Author
Publisher
Association for Computing Machinery
New York, NY, United States