Published in: International Journal of Machine Learning and Cybernetics 10/2019

08.04.2019 | Original Article

Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions

Authors: Der-Chiang Li, Qi-Shi Shi, Hung-Yu Chen


Abstract

Learning from small data is challenging for most algorithms in terms of building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been shown to be effective in meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where the data have nominal inputs and continuous outputs, a systematic VSG procedure is proposed that generates samples based on fuzzy techniques to further enhance modelling capability. Based on the data preprocessing concept of the M5′ model tree, we present a procedure for extracting the fuzzy relations between nominal inputs and continuous outputs. Further, following the idea of nonparametric operations, we employ trend similarity to describe the fuzzy relations between inputs and outputs. These relations are then represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates that pass an \(\alpha\)-cut filter are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches on five public datasets and two prediction models. Moreover, three parameters used in our approach are discussed; however, determining the most suitable parameter values requires further study.
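The abstract outlines the core mechanism: the fuzzy relation between each nominal input value and the continuous output is encoded as a possibility distribution, candidate outputs are sampled from that distribution, and an \(\alpha\)-cut retains only sufficiently possible candidates as virtual samples. The sketch below illustrates this idea; the triangular possibility shape with its peak at the category mean, the function names, and all parameter values are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def triangular_possibility(y, low, mode, high):
    """Possibility (membership) of output value y under a triangular
    distribution with support [low, high] and peak at mode."""
    if y < low or y > high:
        return 0.0
    if y <= mode:
        return 1.0 if mode == low else (y - low) / (mode - low)
    return 1.0 if high == mode else (high - y) / (high - mode)

def virtual_outputs_for_category(observed_y, n_candidates=200, alpha=0.5, seed=0):
    """Build a possibility distribution from the continuous outputs observed
    for one nominal input value, draw candidate outputs over its support,
    and keep only candidates whose possibility passes the alpha-cut."""
    rng = np.random.default_rng(seed)
    low, high = float(min(observed_y)), float(max(observed_y))
    mode = float(np.mean(observed_y))          # assumed peak location
    candidates = rng.uniform(low, high, size=n_candidates)
    return [float(y) for y in candidates
            if triangular_possibility(y, low, mode, high) >= alpha]

# Hypothetical usage: continuous outputs observed for one nominal category
virtual_y = virtual_outputs_for_category([3.2, 4.1, 3.8, 4.5], alpha=0.4)
```

Pairing each retained output with its nominal category would then yield virtual (nominal input, continuous output) samples with which to augment the small training set.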

Metadata
Title
Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions
Authors
Der-Chiang Li
Qi-Shi Shi
Hung-Yu Chen
Publication date
08.04.2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 10/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-018-00905-2
