Towards effective and interpretable data mining by visual interaction

Published: 01 January 2002

Abstract

The primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data. However, many current formalizations of data mining algorithms have not quite reached this goal. One reason is that the focus on purely automated techniques has imposed several constraints on data mining algorithms. For example, a data mining problem such as clustering or association rule mining requires the specification of particular problem formulations, objective functions, and parameters. Such systems do not take the user's needs into account very effectively. This makes it necessary to keep the user in the loop in a way that is both efficient and interpretable. One way of achieving this is by applying human visual perception to intermediate data mining results. Such a system combines the computational power of a computer and the intuitive abilities of a human to provide solutions that neither could achieve alone. This paper discusses a number of recent approaches to several data mining algorithms along these lines.

Published in

ACM SIGKDD Explorations Newsletter, Volume 3, Issue 2
January 2002
81 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/507515

Copyright © 2002 Author

Publisher: Association for Computing Machinery, New York, NY, United States
