Published in: Information Systems and e-Business Management 3/2013

01-09-2013 | Original Article

An empirical study of cost-sensitive learning in cultural modeling

Authors: Peng Su, Wenji Mao, Daniel Zeng



Abstract

Cultural modeling aims at developing behavioral models of groups and analyzing the impact of cultural factors on group behavior using computational methods. Machine learning methods, and classification in particular, play a central role in such applications. In modeling cultural data, standard classifiers are expected to perform well under the assumption that all classification errors carry the same cost. This assumption is often violated in practice, and the performance of standard classifiers is severely hindered as a result. To address this problem, this paper empirically studies cost-sensitive learning in cultural modeling. We take the cost factor into account when building the classifiers, with the aim of minimizing total misclassification cost. We conduct experiments on four typical cost-sensitive learning methods, combine them with six standard classifiers, and evaluate their performance under various conditions. Our empirical study verifies the effectiveness of cost-sensitive learning in cultural modeling. Based on the experimental results, we gain insight into the problem of non-uniform misclassification costs, as well as into the selection of cost-sensitive methods, base classifiers, and method-classifier pairs for this domain. Furthermore, we propose an improved algorithm that outperforms the best method-classifier pair on the benchmark cultural datasets.
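
To make the idea of minimizing total misclassification cost concrete, here is a minimal sketch in Python. It is not one of the four methods or the improved algorithm studied in the paper; it only illustrates the generic minimum-expected-cost decision rule. The cost matrix, the probability estimates, and the function names are all hypothetical values chosen for illustration.

```python
def total_cost(y_true, y_pred, cost):
    """Sum cost[true][predicted] over all examples."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def min_expected_cost_predict(proba, cost):
    """For each example, pick the class with the lowest expected cost.

    proba[i][j]: estimated probability that example i belongs to class j.
    cost[j][k]:  cost of predicting class k when the true class is j.
    """
    n_classes = len(cost)
    preds = []
    for row in proba:
        expected = [
            sum(row[j] * cost[j][k] for j in range(n_classes))
            for k in range(n_classes)
        ]
        preds.append(min(range(n_classes), key=expected.__getitem__))
    return preds


# Hypothetical cost matrix: missing a class-1 case costs five times
# as much as a false alarm on a class-0 case.
COST = [[0.0, 1.0],
        [5.0, 0.0]]

# Hypothetical probability estimates from some base classifier.
PROBA = [[0.9, 0.1],
         [0.7, 0.3],
         [0.4, 0.6],
         [0.8, 0.2]]
Y_TRUE = [0, 1, 1, 0]

# Uniform-cost decision: simply take the most probable class.
uniform_pred = [max(range(2), key=row.__getitem__) for row in PROBA]
# Cost-sensitive decision: take the class with the lowest expected cost.
cost_pred = min_expected_cost_predict(PROBA, COST)

print(total_cost(Y_TRUE, uniform_pred, COST))  # 5.0
print(total_cost(Y_TRUE, cost_pred, COST))     # 1.0
```

On this toy data the cost-sensitive rule accepts one extra false alarm in exchange for avoiding the expensive missed case, so its total cost is lower even though its accuracy is no better.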


Metadata
Title
An empirical study of cost-sensitive learning in cultural modeling
Authors
Peng Su
Wenji Mao
Daniel Zeng
Publication date
01-09-2013
Publisher
Springer Berlin Heidelberg
Published in
Information Systems and e-Business Management / Issue 3/2013
Print ISSN: 1617-9846
Electronic ISSN: 1617-9854
DOI
https://doi.org/10.1007/s10257-012-0198-4
