2024 | OriginalPaper | Chapter

IDPP: Imbalanced Datasets Pipelines in Pyrus

Authors: Amandeep Singh, Olga Minguett

Published in: Engineering of Computer-Based Systems

Publisher: Springer Nature Switzerland

Abstract

We present IDPP, a Pyrus-based tool that offers a collection of pipelines for the analysis of imbalanced datasets. Like Pyrus, IDPP is a web-based, low-code/no-code graphical modelling environment for ML and data analytics applications. In a case study from the medical domain, we address the challenge of reusing AI/ML models that do not handle data with imbalanced classes by implementing ML algorithms in Python that perform the re-balancing. We then use these algorithms and the original ML models in the IDPP pipelines. With IDPP, our low-code approach to balancing datasets for AI/ML applications can be used by non-coders. It simplifies the data-preprocessing stage of an AI/ML project pipeline, which can potentially improve the performance of the models. The tool demo will showcase the low-code implementation and the no-code reuse and repurposing of AI-based systems through end-to-end Pyrus pipelines.
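The abstract does not say which re-balancing algorithms IDPP implements. As a rough illustration of what re-balancing a dataset means, the sketch below uses random oversampling, one of the simplest such techniques, chosen here only as an assumption. The function name `random_oversample` and the toy data are hypothetical and are not taken from IDPP or Pyrus.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Hypothetical helper for illustration; not part of IDPP or Pyrus.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Append random duplicates until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made-up values): six negative cases, two positive cases.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.1], [2.4]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain six rows. The original rows are kept unchanged at the front of the result, and only duplicates of minority-class rows are appended. In practice such a step would be applied to the training split only, so that duplicated rows do not leak into the evaluation data.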

Metadata

Copyright Year: 2024

DOI: https://doi.org/10.1007/978-3-031-49252-5_6
