Published in: The Journal of Supercomputing 12/2016

1 December 2016

Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework

Authors: Tommaso Colombo, Holger Fröning, Pedro Javier Garcìa, Wainer Vandelli

Abstract

The ATLAS detector at CERN records particle collision “events” delivered by the Large Hadron Collider. Its data-acquisition system identifies, selects, and stores interesting events in near real-time, with an aggregate throughput of several tens of GB/s. It is a distributed software system executed on a farm of roughly 2000 commodity worker nodes communicating via TCP/IP on an Ethernet network. Event data fragments are received from the many detector readout channels and are buffered, collected, analyzed, and either stored permanently or discarded. This system, like data-acquisition systems in general, is sensitive to the latency of the data transfer from the readout buffers to the worker nodes. Challenges affecting this transfer include the many-to-one communication pattern and the inherently bursty nature of the traffic. This paper addresses the main performance issues brought about by this workload, focusing in particular on the so-called TCP incast pathology. Since systematic studies of these issues are often impeded by operational constraints related to the mission-critical nature of these systems, we developed a simulation model of the ATLAS data-acquisition system. The resulting simulation tool is based on the well-established, widely used OMNeT++ framework. The tool was validated by comparing the simulation results with existing measurements of the system’s behavior. Furthermore, the simulation tool enables the study of the theoretical behavior of the system in numerous what-if scenarios and with modifications that are not immediately applicable to the real system. In this paper, we take advantage of this to analyze the behavior of the system under different traffic shaping and scheduling policies, and with network hardware modifications. This analysis leads to conclusions that could be used to devise future system enhancements.
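
The many-to-one burst behind the incast problem mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' OMNeT++ model. It assumes a single switch output port with a fixed-size queue, fragments that all arrive at the same instant, and no draining during the burst (a worst case). All function names and numbers are hypothetical and chosen only for illustration. It also shows one simple form of traffic shaping: capping the number of simultaneous fragment requests so that the replies fit in the queue.

```python
import math


def burst_overflow_bytes(n_senders, fragment_bytes, buffer_bytes):
    """Bytes dropped when every sender delivers one fragment at the same
    instant to one output port whose queue holds buffer_bytes.

    Assumption: nothing drains from the queue during the burst.
    """
    return max(0, n_senders * fragment_bytes - buffer_bytes)


def max_parallel_requests(fragment_bytes, buffer_bytes):
    """Largest number of simultaneous requests whose replies fit in the queue."""
    return buffer_bytes // fragment_bytes


def request_rounds(n_senders, fragment_bytes, buffer_bytes):
    """Number of request rounds needed if each round is capped so that its
    replies never overflow the queue (a simple traffic-shaping policy)."""
    limit = max_parallel_requests(fragment_bytes, buffer_bytes)
    if limit == 0:
        raise ValueError("a single fragment does not fit in the buffer")
    return math.ceil(n_senders / limit)


if __name__ == "__main__":
    # Hypothetical values: 100 readout sources, 10 kB fragments, 500 kB queue.
    print(burst_overflow_bytes(100, 10_000, 500_000))
    print(max_parallel_requests(10_000, 500_000))
    print(request_rounds(100, 10_000, 500_000))
```

In this toy setting, requesting all fragments at once overflows the queue, while capping the number of outstanding requests avoids drops at the cost of extra request rounds. A real system also has to account for link drain rate, TCP retransmission timeouts, and competing flows, which is what a full simulation is for.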

Footnotes
1
Measurements show that both time intervals are smaller than 0.5 ms.
 
2
In this particular sentence, the word “event” refers to simulation events. To avoid confusion, throughout the rest of the paper, “event” will only be used in its high-energy physics meaning, i.e., “collision event”.
 
Metadata
Title
Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework
Authors
Tommaso Colombo
Holger Fröning
Pedro Javier Garcìa
Wainer Vandelli
Publication date
1 December 2016
Publisher
Springer US
Published in
The Journal of Supercomputing, Issue 12/2016
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1764-1
