Abstract
The primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data. However, many current formalizations of data mining algorithms have not quite reached this goal. One reason is that the focus on purely automated techniques has imposed several constraints on these algorithms. For example, a data mining problem such as clustering or association rule mining requires the specification of a particular problem formulation, objective function, and parameters. Systems built this way do not take the user's needs into account very effectively. It is therefore necessary to keep the user in the loop in a way that is both efficient and interpretable. One way of achieving this is to leverage human visual perception of intermediate data mining results. Such a system combines the computational power of a computer with the intuitive abilities of a human to provide solutions that neither can achieve alone. This paper discusses a number of recent approaches to several data mining algorithms along these lines.
Index Terms
- Towards effective and interpretable data mining by visual interaction