Published in: Soft Computing 8/2021

27-02-2021 | Methodologies and Application

Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system

Authors: Zhong-Sheng Chen, Qun-Xiong Zhu, Yuan Xu, Yan-Lin He, Qing-Lin Su, Yiqing C. Liu, Zoltan K. Nagy

Abstract

Small sample size (SSS) problems pose a major challenge in modeling tasks because training samples are insufficient. This is especially true in the process industry, where thousands of uninformative samples overwhelm a very limited number of valuable ones, degrading the ability of trained models to predict key variables. In this study, the prediction ability of forecasting models is enhanced by generating virtual samples. Considering the integrated effects of attributes, a new data augmentation approach called ITNN-VSG, which integrates virtual sample generation (VSG) with an input-training neural network (ITNN), is put forward to enlarge training datasets and thereby improve the performance of forecasting models. First, in the absence of any domain-specific knowledge about the target models, a query-driven interpolation process is developed to explore the overall tendency of the data distribution in both sparse and dense regions. Second, an ITNN with fixed weights is used to calculate the input corresponding to each virtual output generated by the interpolation process. To validate the effectiveness of the proposed approach, several in silico experiments were carried out on a benchmark dataset from the sinc(x) function, followed by a real-world application to a purified terephthalic acid (PTA) solvent system. The experimental results demonstrated that the proposed approach outperformed existing approaches such as mega-trend-diffusion and tree-based-trend-diffusion.
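
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the query-driven interpolation or the network architecture, so midpoint interpolation between neighbouring outputs stands in for the former, and a tiny one-hidden-layer network with hand-picked fixed weights (`W1`, `B1`, `W2`, `B2`, all hypothetical) stands in for a trained model. The weights are chosen positive so that the network is monotone and the input is uniquely recoverable, which keeps the example simple. The function names `train_input` and `generate_virtual_samples` are likewise illustrative.

```python
import numpy as np

# Hypothetical fixed weights standing in for an already-trained network.
# All weights are positive, so the output increases monotonically with x.
W1 = np.array([0.8, 1.5, 0.4])    # input -> hidden weights
B1 = np.array([-0.5, 0.2, 0.1])   # hidden biases
W2 = np.array([1.0, 0.6, 0.9])    # hidden -> output weights
B2 = 0.3                          # output bias


def forward(x):
    """Network output for a scalar input x."""
    return float(W2 @ np.tanh(W1 * x + B1) + B2)


def output_grad(x):
    """Derivative of the network output with respect to the scalar input x."""
    h = np.tanh(W1 * x + B1)
    return float(W2 @ ((1.0 - h ** 2) * W1))


def train_input(y_target, x_init, lr=0.1, steps=3000):
    """Input training: keep the weights fixed and adjust the input by
    gradient descent on 0.5 * (forward(x) - y_target)**2."""
    x = float(x_init)
    for _ in range(steps):
        err = forward(x) - y_target
        x -= lr * err * output_grad(x)
    return x


def generate_virtual_samples(x_real, y_real):
    """Interpolate virtual outputs, then recover matching inputs.

    Assumption: midpoints between neighbouring outputs replace the paper's
    query-driven interpolation, whose details the abstract does not give.
    """
    order = np.argsort(y_real)
    xs = np.asarray(x_real, dtype=float)[order]
    ys = np.asarray(y_real, dtype=float)[order]
    y_virtual = 0.5 * (ys[:-1] + ys[1:])   # virtual outputs
    x_start = 0.5 * (xs[:-1] + xs[1:])     # initial guesses for the inputs
    x_virtual = np.array(
        [train_input(y, x0) for y, x0 in zip(y_virtual, x_start)]
    )
    return x_virtual, y_virtual


# A small "real" dataset produced by the fixed network itself.
x_real = np.array([-2.0, -0.5, 0.3, 1.8])
y_real = np.array([forward(x) for x in x_real])
x_virtual, y_virtual = generate_virtual_samples(x_real, y_real)
```

Each pair `(x_virtual[i], y_virtual[i])` can then be appended to the original training set. With a non-monotone target such as sinc(x), several inputs can map to the same output, so the starting point of the input search would matter more than it does here.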

Metadata
Title
Integrating virtual sample generation with input-training neural network for solving small sample size problems: application to purified terephthalic acid solvent system
Authors
Zhong-Sheng Chen
Qun-Xiong Zhu
Yuan Xu
Yan-Lin He
Qing-Lin Su
Yiqing C. Liu
Zoltan K. Nagy
Publication date
27-02-2021
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 8/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-021-05641-4
