2020 | Original Paper | Book Chapter

Data Provenance Based System for Classification and Linear Regression in Distributed Machine Learning

Authors: Muhammad Jahanzeb Khan, Ruoyu Wang, Daniel Sun, Guoqiang Li

Published in: Structured Object-Oriented Formal Language and Method

Publisher: Springer International Publishing

Abstract

Data provenance is now widely used to increase the accuracy of machine learning models. However, faced with difficulties in information heredity, these models produce data association. Most studies in the field of data provenance focus on specific domains, and only a few address a machine learning (ML) framework with a distinct emphasis on the accurate partition of coherent and physical activities and on the implementation of ML pipelines for provenance. This paper presents a novel approach to the use of data provenance, called the data provenance based system for classification and linear regression in distributed machine learning (DPMLR). To develop a comprehensive approach to data analysis and visualization based on a collective set of functions for various algorithms, and to provide the ability to run large-scale graph analysis, we apply StellarGraph as our primary ML structure. Preliminary results on a complex data stream structure show that the overall overhead is no more than 20%. This opens up opportunities for designing an integrated system that performs dynamic scheduling and network-bounded synchronization based on the ML algorithm.
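
To make the idea of provenance capture and its overhead concrete, the following is a minimal sketch and not the DPMLR implementation. It assumes that a provenance record is a small dictionary holding the stage name, input and output fingerprints, and a timestamp, and that overhead is measured as extra run time relative to an untracked baseline. The names `PROVENANCE_LOG`, `with_provenance`, `relative_overhead`, and `normalise` are hypothetical and introduced only for illustration.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory provenance log (an assumption for this sketch;
# the abstract does not describe how DPMLR stores its records).
PROVENANCE_LOG: List[Dict[str, Any]] = []


def _digest(value: Any) -> str:
    """Return a short, stable fingerprint of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def with_provenance(stage: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a pipeline stage so that each call appends one provenance record."""
    def wrapper(data: Any) -> Any:
        result = stage(data)
        PROVENANCE_LOG.append({
            "stage": stage.__name__,
            "input": _digest(data),
            "output": _digest(result),
            "timestamp": time.time(),
        })
        return result
    return wrapper


def relative_overhead(baseline_seconds: float, tracked_seconds: float) -> float:
    """Extra run time caused by tracking, as a fraction of the untracked run time."""
    return (tracked_seconds - baseline_seconds) / baseline_seconds


def normalise(values: List[float]) -> List[float]:
    """Toy pipeline stage: scale values by their maximum."""
    top = max(values)
    return [v / top for v in values]


# Run the toy stage with provenance capture enabled.
tracked_normalise = with_provenance(normalise)
scaled = tracked_normalise([2.0, 4.0, 8.0])
```

Under this assumed definition, a pipeline that takes 10 s untracked and 12 s with tracking has a relative overhead of 0.2, which is the kind of figure the reported bound of 20% refers to.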

Metadata
Title
Data Provenance Based System for Classification and Linear Regression in Distributed Machine Learning
Authors
Muhammad Jahanzeb Khan
Ruoyu Wang
Daniel Sun
Guoqiang Li
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-41418-4_19