Published in: Soft Computing 5/2019

11 October 2017 | Methodologies and Application

Tri-partition cost-sensitive active learning through kNN

Authors: Fan Min, Fu-Lun Liu, Liu-Ying Wen, Zhi-Heng Zhang

Abstract

Active learning differs from the training–testing scenario in that class labels can be obtained upon request. It is widely employed in applications where the labeling of instances incurs a heavy manual cost. In this paper, we propose a new algorithm called tri-partition active learning through k-nearest neighbors (TALK). The optimization objective is to minimize the total teacher and misclassification costs. First, a k-nearest neighbors classifier is employed to divide unlabeled instances into three disjoint regions. Region I contains instances for which the expected misclassification cost is lower than the teacher cost, Region II contains instances to be labeled by human experts, and Region III contains the remaining instances. Various strategies are designed to determine which instances are in Region II. Second, instances in Regions I and II are labeled and added to the training set, and the tri-partition process is repeated until all instances have been labeled. Experiments are undertaken on eight University of California, Irvine (UCI) datasets using different cost settings. Compared with state-of-the-art cost-sensitive classification and active learning algorithms, our new algorithm generally exhibits a lower total cost.
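
The tri-partition loop described in the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the authors' TALK implementation. Several details are assumptions made only for illustration: the error probability is estimated as the fraction of disagreeing neighbors, every misclassification has the same cost `mis_cost`, Region II is chosen as the `batch` instances with the highest expected cost (the abstract mentions "various strategies" without specifying them), and all function and parameter names are hypothetical. The sketch also assumes a non-empty initial labeled set and `batch >= 1`, which guarantees that each round labels at least one instance.

```python
import math
from collections import Counter


def knn_votes(x, labeled, k):
    """Count the labels of the k labeled points closest to x (Euclidean)."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest)


def predict_with_cost(votes, mis_cost):
    """Return the majority label and its estimated misclassification cost.

    Assumption: the error probability is the fraction of neighbors that
    disagree with the majority, and every error costs the same `mis_cost`.
    """
    total = sum(votes.values())
    label, count = votes.most_common(1)[0]
    return label, mis_cost * (total - count) / total


def tri_partition_learn(labeled, pool, oracle, k=3,
                        teacher_cost=1.0, mis_cost=4.0, batch=1):
    """Label every instance in `pool`; return (labels, number_of_queries)."""
    labeled, pool = list(labeled), list(pool)
    labels, queries = {}, 0
    while pool:
        region_1, uncertain = [], []
        for x in pool:
            label, cost = predict_with_cost(knn_votes(x, labeled, k), mis_cost)
            if cost < teacher_cost:
                region_1.append((x, label))  # cheap enough to auto-label
            else:
                uncertain.append((cost, x))
        # Hypothetical Region II strategy: query the `batch` costliest instances.
        uncertain.sort(key=lambda pair: pair[0], reverse=True)
        region_2 = [(x, oracle(x)) for _, x in uncertain[:batch]]
        queries += len(region_2)
        # Region III is whatever remains; it is re-examined in the next round.
        newly_labeled = region_1 + region_2
        labeled.extend(newly_labeled)
        labels.update(newly_labeled)
        done = {x for x, _ in newly_labeled}
        pool = [x for x in pool if x not in done]
    return labels, queries


# Toy usage: 1-D points, two classes split at 5; the oracle plays the teacher.
oracle = lambda x: "a" if x[0] < 5 else "b"
seed = [((0.0,), "a"), ((1.0,), "a"), ((10.0,), "b"), ((9.0,), "b")]
pool = [(2.0,), (8.0,), (0.5,), (9.5,), (4.0,), (6.5,)]
labels, queries = tri_partition_learn(seed, pool, oracle)
# Total cost = teacher cost for queries + cost of wrong automatic labels.
total_cost = queries * 1.0 + 4.0 * sum(labels[x] != oracle(x) for x in pool)
```

In this sketch the total cost is the quantity the abstract names as the optimization objective: the teacher cost paid for each query plus the misclassification cost of each wrong automatic label. Raising `teacher_cost` relative to `mis_cost` moves more instances into Region I, trading queries for a higher risk of errors.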

Metadata
Title
Tri-partition cost-sensitive active learning through kNN
Authors
Fan Min
Fu-Lun Liu
Liu-Ying Wen
Zhi-Heng Zhang
Publication date
11 October 2017
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 5/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2879-x
