Published in: KI - Künstliche Intelligenz 1/2018

20-12-2017 | Technical Contribution

Big Data Science

Authors: Katharina Morik, Christian Bockermann, Sebastian Buschjäger

Abstract

In ever more disciplines, science is driven by data, making data analytics a primary skill for researchers. This covers the complete process, from data acquisition at the sensors through pre-processing and feature extraction to the application of machine learning. Sensors often produce a plethora of data that must be handled in near real time, which requires a combined effort ranging from implementations at the hardware level to the high-level design of data flows. In this paper, we illustrate this wide span of data analysis for science with two use cases from a real-world example in astroparticle physics. We also outline a high-level design approach that is capable of defining the complete data flow from sensor hardware to final analysis.

KI - Künstliche Intelligenz

The scientific journal "KI – Künstliche Intelligenz" is the official journal of the division for artificial intelligence within the "Gesellschaft für Informatik e.V." (GI), the German Informatics Society, with contributions from throughout the field of artificial intelligence.

Footnotes
1
The term has been used at Silicon Graphics by John Mashey since 1998, but became popular only in 2012, with a peak in 2016 according to Google Trends.
 
2
Project C3 by Wolfgang Rhode, Katharina Morik, Tim Ruhe investigates astrophysical data from the IceCube project and Cherenkov telescopes. Project C5 by Bernhard Spaan and Jens Teubner discusses the data of the LHCb experiment at the Large Hadron Collider (LHC) facility in Geneva.
 
3
TU Dortmund University has offered studies in data science within the statistics faculty since 2002. Within the computer science faculty, students may specialize in data science.
 
Metadata
Title
Big Data Science
Authors
Katharina Morik
Christian Bockermann
Sebastian Buschjäger
Publication date
20-12-2017
Publisher
Springer Berlin Heidelberg
Published in
KI - Künstliche Intelligenz / Issue 1/2018
Print ISSN: 0933-1875
Electronic ISSN: 1610-1987
DOI
https://doi.org/10.1007/s13218-017-0522-8
