skip to main content
10.1145/2150976.2151001acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Bottleneck identification and scheduling in multithreaded applications

Published:03 March 2012Publication History

ABSTRACT

Performance of multithreaded applications is limited by a variety of bottlenecks, e.g. critical sections, barriers and slow pipeline stages. These bottlenecks serialize execution, waste valuable execution cycles, and limit scalability of applications. This paper proposes Bottleneck Identification and Scheduling in Multithreaded Applications (BIS), a cooperative software-hardware mechanism to identify and accelerate the most critical bottlenecks. BIS identifies which bottlenecks are likely to reduce performance by measuring the number of cycles threads have to wait for each bottleneck, and accelerates those bottlenecks using one or more fast cores on an Asymmetric Chip Multi-Processor (ACMP). Unlike previous work that targets specific bottlenecks, BIS can identify and accelerate bottlenecks regardless of their type. We compare BIS to four previous approaches and show that it outperforms the best of them by 15% on average. BIS' performance improvement increases as the number of cores and the number of fast cores in the system increase.

References

  1. MySQL database engine 5.0.1. http://www.mysql.com.Google ScholarGoogle Scholar
  2. SQLite database engine version 3.5.8. 2008.Google ScholarGoogle Scholar
  3. SysBench: a system performance benchmark v0.4.8. http://sysbench.sourceforge.net.Google ScholarGoogle Scholar
  4. G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In AFIPS, 1967. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl's law through EPI throttling. In ISCA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. H. Bailey et al. The NAS parallel benchmarks. Technical Report RNR-94-007, NASA Ames Research Center, 1994.Google ScholarGoogle Scholar
  7. A. Bhattacharjee and M. Martonosi. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. In ISCA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Q. Cai, J. González, R. Rakvic, G. Magklis, P. Chaparro, and A. González. Meeting points: using thread criticality to adapt multicore hardware to parallel regions. In PACT, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Fields, R. Bodik, and M. Hill. Slack: maximizing performance under technological constraints. In ISCA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Fields, S. Rubin, and R. Bodik. Focusing processor policies via critical-path prediction. In ISCA, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. D. Hill and M. R. Marty. Amdahl's law in Multicore Era. Technical Report CS-TR-2007--1593, Univ. of Wisconsin, 2007.Google ScholarGoogle Scholar
  14. J. K. Hollingsworth. An online computation of critical path profiling. In SIGMETRICS Symposium on Parallel and Distributed Tools, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. H. Hormati, Y. Choi, M. Kudlur, R. Rabbah, T. Mudge, and S. Mahlke. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures. In PACT, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Intel. Source code for Intel threading building blocks.Google ScholarGoogle Scholar
  17. Intel. IA-32 Intel Architecture Software Dev. Guide, 2008.Google ScholarGoogle Scholar
  18. Intel. Getting Started with Intel Parallel Studio, 2011.Google ScholarGoogle Scholar
  19. D. Koufaty, D. Reddy, and S. Hahn. Bias scheduling in heterogeneous multi-core architectures. In EuroSys, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Kredel. Source code for traveling salesman problem (tsp). http://krum.rz.uni-mannheim.de/ba-pp-2007/java/index.html.Google ScholarGoogle Scholar
  21. R. Kumar, D. Tullsen, N. Jouppi, and P. Ranganathan. Heterogeneous chip multiprocessors. IEEE Computer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. B. Lakshminarayana, J. Lee, and H. Kim. Age based scheduling for asymmetric multiprocessors. In SC, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. Li, A. R. Lebeck, and D. J. Sorin. Quantifying instruction criticality for shared memory multiprocessors. In SPAA, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. F. Martínez and J. Torrellas. Speculative synchronization: applying thread-level speculation to explicitly parallel applications. In ASPLOS, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. M. Mellor-Crummey and M. L. Scott. Synchronization without contention. In ASPLOS, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. T. Morad, U. C. Weiser, A. Kolodny, M. Valero, and E. Ayguade. Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors. Computer Architecture Letters, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Narayanan, B. Ozisikyilmaz, J. Zambreno, G. Memik, and A. Choudhary. MineBench: A benchmark suite for data mining workloads. In IISWC, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  28. Y. Nishitani, K. Negishi, H. Ohta, and E. Nunohiro. Implementation and evaluation of OpenMP for Hitachi SR8000. In ISHPC, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Rajwar and J. Goodman. Speculative lock elision: Enabling highly concurrent multithreaded execution. In MICRO, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Rajwar and J. R. Goodman. Transactional lock-free execution of lock-based programs. In ASPLOS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. L. Rudolph and Z. Segall. Dynamic decentralized cache schemes for MIMD parallel processors. In ISCA, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. C. Saez, M. Prieto, A. Fedorova, and S. Blagodurov. A comprehensive scheduler for asymmetric multicore systems. In EuroSys, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. The Standard Performance Evaluation Corporation. Welcome to SPEC. http://www.specbench.org/.Google ScholarGoogle Scholar
  34. M. A. Suleman. An Asymmetric Multi-core Architecture for Efficiently Accelerating Critical Paths in Multithreaded Programs. PhD thesis, UT-Austin, 2010.Google ScholarGoogle Scholar
  35. M. A. Suleman, O. Mutlu, J. A. Joao, Khubaib, and Y. N. Patt. Data marshaling for multi-core architectures. ISCA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. A. Suleman, M. K. Qureshi, Khubaib, and Y. N. Patt. Feedback-directed pipeline parallelism. In PACT, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Tornado Web Server. Source code. http://tornado.sourceforge.net/, 2008.Google ScholarGoogle Scholar
  39. M. Waldvogel, G. Varghese, J. Turner, and B. Plattner. Scalable high speed IP routing lookups. In SIGCOMM, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. P. Zhao and J. N. Amaral. Ablego: a function outlining and partial inlining framework. Softw. Pract. Exper., 37(5):465--491, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Bottleneck identification and scheduling in multithreaded applications

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
      March 2012
      476 pages
      ISBN:9781450307598
      DOI:10.1145/2150976
      • cover image ACM SIGARCH Computer Architecture News
        ACM SIGARCH Computer Architecture News  Volume 40, Issue 1
        ASPLOS '12
        March 2012
        453 pages
        ISSN:0163-5964
        DOI:10.1145/2189750
        Issue’s Table of Contents
      • cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 47, Issue 4
        ASPLOS '12
        April 2012
        453 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2248487
        Issue’s Table of Contents

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 March 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate535of2,713submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader