Published in: Soft Computing 24/2019

21.02.2019 | Methodologies and Application

Feature–granularity selection with variable costs for hybrid data

Authors: Shujiao Liao, Qingxin Zhu, Yuhua Qian


Abstract

In recent years, cost-sensitive feature selection has drawn much attention, but several issues remain to be investigated. In particular, most existing work deals with single-typed data, and only a few studies handle hybrid data; moreover, both the test cost of a feature and the misclassification cost of an object are usually assumed to be fixed, whereas in reality they vary with the error range of the data, or equivalently with the data granularity. In view of these facts, a feature–granularity selection approach is proposed that selects the optimal feature subset and the optimal data granularity simultaneously, so as to minimize the total cost of processing hybrid data. In this approach, an adaptive neighborhood model is first constructed, in which neighborhood granules are generated adaptively according to the types of the features. Multiple kinds of variable cost settings are then discussed with respect to practical scenarios, and finally an optimal feature–granularity selection algorithm is designed. Experimental results on sixteen UCI datasets show that the proposed algorithm achieves a good trade-off among feature dimension reduction, data granularity selection and total cost minimization. The influences of different cost settings on feature–granularity selection are also discussed thoroughly, which provides feasible schemes for decision making.
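The adaptive neighborhood idea sketched in the abstract can be illustrated with a minimal example. The code below is an illustrative reconstruction, not the authors' implementation: the function names (`neighbors`, `total_cost`), the exact-equality rule for categorical features, the distance threshold `delta` standing in for the data granularity, and the majority-vote estimate of misclassification cost are all assumptions based on common neighborhood rough set practice for hybrid data.

```python
import numpy as np

def neighbors(X, is_numeric, feats, i, delta):
    """Indices of objects in the adaptive neighborhood of object i.

    Neighborhood granules are built per feature type: numeric features
    use a distance threshold delta (the data granularity), while
    categorical features require exact equality.
    """
    mask = np.ones(len(X), dtype=bool)
    for f in feats:
        col = X[:, f]
        if is_numeric[f]:
            mask &= np.abs(col.astype(float) - float(X[i, f])) <= delta
        else:
            mask &= col == X[i, f]
    return np.where(mask)[0]

def total_cost(X, y, is_numeric, feats, delta, test_cost, mis_cost):
    """Total cost of a candidate (feature subset, granularity) pair:
    test cost of the chosen features for every object, plus a
    misclassification penalty whenever the majority label in an
    object's neighborhood disagrees with its true label.
    """
    cost = sum(test_cost[f] for f in feats) * len(X)
    for i in range(len(X)):
        nb = neighbors(X, is_numeric, feats, i, delta)
        labels, counts = np.unique(y[nb], return_counts=True)
        if labels[np.argmax(counts)] != y[i]:
            cost += mis_cost
    return cost

# Hypothetical toy data: one numeric and one categorical feature.
X = np.array([[0.1, 'a'], [0.2, 'a'], [0.9, 'b'], [1.0, 'b']], dtype=object)
y = np.array([0, 0, 1, 1])
print(total_cost(X, y, [True, False], [0, 1], 0.3, [1.0, 0.5], 10.0))  # 6.0
print(total_cost(X, y, [True, False], [1], 0.3, [1.0, 0.5], 10.0))     # 2.0
```

In this toy setting the categorical feature alone already classifies every object correctly, so dropping the more expensive numeric feature lowers the total cost from 6.0 to 2.0; a feature–granularity selection algorithm would search over subsets and values of `delta` for exactly this kind of trade-off.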


Metadata
Title
Feature–granularity selection with variable costs for hybrid data
Authors
Shujiao Liao
Qingxin Zhu
Yuhua Qian
Publication date
21.02.2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 24/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-03854-2
