Published in: Neural Computing and Applications 6/2017

7 January 2016 | Original Article

A framework for increasing the value of predictive data-driven models by enriching problem domain characterization with novel features

Authors: Sérgio Moro, Paulo Cortez, Paulo Rita

Abstract

The need to leverage knowledge through data mining has driven enterprises to demand more data. However, there is a gap between the availability of data and the application of extracted knowledge to improve decision support. In fact, more data do not necessarily imply better predictive data-driven marketing models, since the problem domain often requires a deeper characterization. Aiming at such a characterization, we propose a framework built on three feature selection strategies, where the goal is to unveil novel features that can effectively increase the value of data by providing a richer characterization of the problem domain. These strategies involve encompassing context (e.g., social and economic variables), evaluating past history, and disaggregating the main problem into smaller but interesting subproblems. The framework is evaluated through an empirical analysis of a real bank telemarketing application. The results show the benefits of such an approach: the area under the receiver operating characteristic curve increased with each stage, improving on the previous model in terms of predictive performance.
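The evaluation criterion named in the abstract, the area under the receiver operating characteristic curve (AUC), can be illustrated with a minimal sketch. The function name `roc_auc` and the toy labels and scores below are hypothetical and chosen only for illustration; they are not the authors' data, code, or results. The sketch computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, then compares a baseline score vector with one from a hypothetical model trained on an enriched feature set.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = contact succeeded, 0 = contact failed.
labels = [1, 0, 1, 0, 0, 1]
# Scores from a hypothetical baseline model.
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8]
# Scores from a hypothetical model with added context features.
enriched_scores = [0.7, 0.5, 0.45, 0.4, 0.2, 0.9]

auc_baseline = roc_auc(labels, baseline_scores)
auc_enriched = roc_auc(labels, enriched_scores)

# A stage is worth keeping, in this sketch, only if it raises the AUC.
keep_stage = auc_enriched > auc_baseline
print(f"baseline AUC = {auc_baseline:.3f}, enriched AUC = {auc_enriched:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based computation would be preferable on a dataset of realistic size.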

Metadata
Title
A framework for increasing the value of predictive data-driven models by enriching problem domain characterization with novel features
Authors
Sérgio Moro
Paulo Cortez
Paulo Rita
Publication date
7 January 2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 6/2017
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-015-2157-8