research-article

On exploiting the power of time in data mining

Published: 20 December 2008

Abstract

We introduce the new paradigm of Change Mining: data mining over a volatile, evolving world with the objective of understanding change. Much existing work on incremental mining and stream mining focuses on adapting patterns to a changing data distribution; Change Mining instead concentrates on understanding the changes themselves. This includes detecting when change occurs in the population under observation, describing the change, predicting change, and acting proactively in response to it. We identify the main tasks of Change Mining and discuss to what extent they are already present in related research areas. We elaborate on research results that can contribute to these tasks, give a brief overview of the current state of the art, and identify open problems and challenges for this new research area.
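As a rough illustration of the first two tasks named in the abstract, detecting when change occurs in a population and describing that change, the Python sketch below compares the category distribution of a population in two time windows. It is not the authors' method. The choice of total variation distance as the change measure, the 0.1 threshold, and the function names `distribution` and `detect_and_describe` are assumptions made purely for illustration.

```python
from collections import Counter


def distribution(records):
    """Return the relative frequency of each category in one time window."""
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a time window must contain at least one record")
    return {category: n / total for category, n in counts.items()}


def detect_and_describe(window_old, window_new, threshold=0.1):
    """Compare two snapshots of the same population (illustrative sketch).

    Assumption: change is measured as the total variation distance between
    the two category distributions; 0.1 is an arbitrary example threshold.
    Returns (changed, distance, shifts), where shifts lists each category's
    change in share, largest absolute shift first.
    """
    p = distribution(window_old)
    q = distribution(window_new)
    categories = set(p) | set(q)
    # Per-category change in share; a category absent from a window has share 0.0 there.
    shifts = {c: q.get(c, 0.0) - p.get(c, 0.0) for c in categories}
    # Detection: total variation distance between the two distributions.
    distance = 0.5 * sum(abs(d) for d in shifts.values())
    # Description: rank categories by how far their share moved.
    ranked = sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return distance > threshold, distance, ranked


if __name__ == "__main__":
    # Hypothetical customer segments observed in two consecutive periods.
    old = ["basic"] * 60 + ["premium"] * 40
    new = ["basic"] * 30 + ["premium"] * 50 + ["business"] * 20
    changed, distance, shifts = detect_and_describe(old, new)
    print(changed, round(distance, 3))
    for category, delta in shifts:
        print(f"{category}: {delta:+.2f}")
```

Returning the per-category shifts alongside the yes/no decision keeps detection and description separate, so a different distance measure or threshold could be swapped in without changing how the change is described.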

Published in

ACM SIGKDD Explorations Newsletter, Volume 10, Issue 2
December 2008
98 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1540276

Copyright © 2008 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
