Abstract
We introduce Change Mining, a new paradigm of data mining over a volatile, evolving world whose objective is to understand change. Much work exists on incremental mining and stream mining, both of which focus on adapting patterns to a changing data distribution; Change Mining instead concentrates on understanding the changes themselves. This includes detecting when change occurs in the population under observation, describing the change, predicting change, and acting on it proactively. We identify the main tasks of Change Mining and discuss to what extent they are already present in related research areas. We elaborate on research results that can contribute to these tasks, giving a brief overview of the current state of the art and identifying open areas and challenges for the new research area.