Abstract
As applications and operating systems grow more complex, the last decade has seen the rise of many tracing tools across the software stack. This article presents a hands-on comparison of modern tracers on Linux systems, in both user space and kernel space. The authors implement microbenchmarks that not only quantify the overhead of different tracers but also sample fine-grained metrics that reveal the tracers’ internals and show the cause of each tracer’s overhead. Internal design choices and implementation particularities are discussed, which helps clarify the challenges of developing tracers. Furthermore, this analysis aims to help users choose and configure their tracers based on their specific requirements, to reduce overhead and get the most out of them.
Index Terms
- Survey and Analysis of Kernel and Userspace Tracers on Linux: Design, Implementation, and Overhead