2019 | OriginalPaper | Chapter

PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems

Authors: Christian Helm, Kenjiro Taura

Published in: High Performance Computing

Publisher: Springer International Publishing

Abstract

In high-performance computing, many performance problems are caused by the memory system. Because such performance bugs are hard to identify, analysis tools play an important role in performance optimization. Today’s processors offer feature-rich performance monitoring units with support for instruction sampling, but existing tools use this data only partially. Previously, performance counters were used to measure memory bandwidth, but attributing high bandwidth usage to source code has been difficult and imprecise. We introduce a novel method for identifying performance-degrading bandwidth usage and attributing it to specific objects and source code lines. This paper also introduces a new method for false sharing detection. It can differentiate false sharing from true sharing and identify the objects and source code lines where accesses to falsely shared objects occur. It can also uncover false sharing that previous tools have overlooked. PerfMemPlus reports these issues automatically, using instruction sampling data captured in a single profiling run. This simplifies the tedious search for the location of performance problems in complex code. The tool design is simple, it supports many existing and upcoming processors, and the recorded data can easily be used in future research. We show that PerfMemPlus can automatically report performance problems without producing false positives. Additionally, we present case studies that show how PerfMemPlus can pinpoint memory performance problems in the PARSEC benchmarks and in machine learning applications.
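To make the distinction between false and true sharing concrete, the following is a minimal sketch and not PerfMemPlus's actual algorithm, which the abstract does not describe. It assumes a hypothetical `Sample` record (thread, address, access size, load/store flag) of the kind instruction sampling could provide, a 64-byte cache line, and accesses that do not straddle a line. Two threads touching overlapping bytes of a line, with at least one store, is treated as true sharing; touching disjoint bytes of the same line, with at least one store, is treated as false sharing.

```python
from collections import defaultdict
from typing import NamedTuple

CACHE_LINE = 64  # bytes; assumed line size (typical for x86), not from the paper


class Sample(NamedTuple):
    """Hypothetical sampled memory access; field names are illustrative."""
    thread: int
    addr: int
    size: int
    is_store: bool


def classify_lines(samples):
    """Return {cache_line_index: 'true' | 'false' | 'none'} sharing verdicts."""
    by_line = defaultdict(list)
    for s in samples:
        by_line[s.addr // CACHE_LINE].append(s)

    result = {}
    for line, accesses in by_line.items():
        verdict = "none"
        for i, a in enumerate(accesses):
            for b in accesses[i + 1:]:
                # Only cross-thread pairs with at least one store can conflict.
                if a.thread == b.thread or not (a.is_store or b.is_store):
                    continue
                overlap = a.addr < b.addr + b.size and b.addr < a.addr + a.size
                if overlap:
                    # Any overlapping conflicting pair marks the line as true sharing.
                    verdict = "true"
                elif verdict == "none":
                    verdict = "false"
        result[line] = verdict
    return result


samples = [
    Sample(0, 0x1000, 4, True), Sample(1, 0x1004, 4, True),   # disjoint bytes, same line
    Sample(0, 0x2000, 8, True), Sample(1, 0x2000, 8, False),  # same bytes, one store
    Sample(0, 0x3000, 4, False), Sample(1, 0x3004, 4, False), # loads only
]
print(classify_lines(samples))
# {64: 'false', 128: 'true', 192: 'none'}
```

A real tool would additionally map each address to the allocated object and the source line of the accessing instruction, which is the attribution step the abstract refers to; that step is omitted here.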



Title: PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems
Authors: Christian Helm, Kenjiro Taura
Publisher: Springer International Publishing
Book: High Performance Computing
Print ISBN: 978-3-030-20655-0
Electronic ISBN: 978-3-030-20656-7
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-20656-7_11
