2014 | OriginalPaper | Book chapter

1. Introduction to Proactive Data Mining

Authors: Haim Dahan, Shahar Cohen, Lior Rokach, Oded Maimon

Published in: Proactive Data Mining with Decision Trees

Publisher: Springer New York

Abstract

In this chapter, we introduce the aspects of the exciting field of data mining that are relevant to this book. In particular, we focus on classification tasks and on decision trees as an algorithmic approach to solving them.

Metadata
Title
Introduction to Proactive Data Mining
Authors
Haim Dahan
Shahar Cohen
Lior Rokach
Oded Maimon
Copyright year
2014
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4939-0539-3_1