The Par Lab started in 2008, based on an earlier technical report, “The Berkeley View,” on the parallel computing challenge. (K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 18, 2006.) This talk gives an update on where we are two years into the Par Lab. We picked five applications to drive our research, and believe they collectively capture many of the important features of future client applications even if they themselves do not become the actual future “killer app”. The Personalized Medicine application focuses on detailed modeling of an individual’s responses to treatments, representing the important health market. The Music application emphasizes real-time responsiveness to rich human input, with high-performance many-channel audio synthesis. The Speech application focuses on making speech input work well in the noisy real-world environments where mobile devices will be operated. The Content-Based Image Recognition (CBIR) application represents the growing practical use of machine vision. Finally, the Parallel Web Browser is currently perhaps the most important single application on client devices, as well as representative of many other interactive rich-document processing tasks.
Our first step in attacking the parallel programming challenge was to analyze a wide range of applications, including workloads from embedded computing, desktop computing, games, databases, machine learning, and scientific computing, as well as our five driving applications. We discovered a surprisingly compact set of recurring computational patterns, which we termed “motifs”. We have greatly expanded on this work, and now believe that any successful software architecture, parallel or serial, can be described as a hierarchy of patterns. We divide patterns into computational patterns, which describe a computation to be performed, and structural patterns, which describe how computations are composed. The patterns have proven central to our research effort, serving both as a common human vocabulary for multidisciplinary discussions spanning application developers to hardware architects and as an organizing structure for software development. Another organizing principle in our original proposal was to divide the software development stack into two layers: efficiency and productivity. Programmers working in the efficiency layer are generally experts in achieving high performance from the underlying hardware, but are not necessarily knowledgeable about any given application domain. Programmers working in the productivity layer are generally knowledgeable about an application domain, but are less concerned with hardware details. The patterns bridge these two layers. Efficiency programmers develop libraries and frameworks that efficiently implement the standard patterns, and productivity programmers can decompose an application into patterns and use high-level languages to compose the corresponding libraries and frameworks into applications.
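The two-layer division described above can be illustrated with a toy sketch. Here an efficiency-layer expert supplies implementations of the “map” and “reduce” computational patterns, and a productivity-layer programmer composes them without touching any hardware details. All names and the feature-extraction placeholder are hypothetical, not taken from the Par Lab’s actual frameworks:

```python
from concurrent.futures import ThreadPoolExecutor

# Efficiency layer: expert-provided pattern implementations.
# (A stand-in for Par Lab's tuned, hardware-aware frameworks.)
def pattern_map(fn, items):
    """The "map" computational pattern, run with a thread pool."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, items))

def pattern_reduce(op, items, init):
    """The "reduce" computational pattern."""
    result = init
    for x in items:
        result = op(result, x)
    return result

# Productivity layer: a domain programmer composes patterns at a
# high level, with no knowledge of threads or memory hierarchy.
def extract_feature(image):
    # Toy placeholder: mean pixel intensity of a flat image.
    return sum(image) / len(image)

def strongest_feature(images):
    features = pattern_map(extract_feature, images)        # map pattern
    return pattern_reduce(max, features, float("-inf"))    # reduce pattern
```

The key point is the division of labor: only `pattern_map` and `pattern_reduce` would need retuning for a new machine, while `strongest_feature` is untouched.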
To improve the quality and portability of efficiency-level libraries, we proposed to leverage our earlier work on autotuning. Autotuning is an automatic search-based optimization process whereby multiple variants of a routine are generated and empirically evaluated on the hardware platform. We have also included a major effort on parallel program correctness to help programmers test, verify, and debug their code. Different correctness techniques apply at the efficiency layer, where low-level data races and deadlocks are of concern, and at the productivity layer, where we wish to ensure semantic determinism and atomicity. Our whole pattern-based component approach to the software stack hinges on the ability to efficiently and flexibly compose software modules. We developed a user-level scheduling substrate called “Lithe” to support efficient sharing of processing resources between arbitrary modules, even modules written in different languages or to different programming models.
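The autotuning process described above reduces to a simple search loop: generate candidate variants of a kernel, time each one on the target machine, and keep the fastest. The sketch below tunes a single hypothetical knob (a chunk size for a summation kernel); real autotuners such as the Par Lab’s explore much richer spaces (blocking, unrolling, vectorization) for kernels like sparse matrix-vector multiply:

```python
import timeit

def make_variant(chunk):
    """Generate one candidate variant: sum a list in chunks of a given
    size. The chunk size is the (toy) tuning parameter."""
    def kernel(data):
        total = 0
        for i in range(0, len(data), chunk):
            total += sum(data[i:i + chunk])
        return total
    return kernel

def autotune(variants, data, repeats=3):
    """Empirically time each variant on this machine and keep the best.
    Taking the min over repeats filters out timing noise."""
    best_time, best = float("inf"), None
    for name, kernel in variants.items():
        t = min(timeit.repeat(lambda: kernel(data), number=10,
                              repeat=repeats))
        if t < best_time:
            best_time, best = t, (name, kernel)
    return best

data = list(range(100_000))
variants = {f"chunk={c}": make_variant(c) for c in (64, 1024, 16384)}
name, kernel = autotune(variants, data)
assert kernel(data) == sum(data)  # all variants must agree on the answer
```

Because the evaluation is empirical, rerunning the same search on a different machine can legitimately select a different variant, which is exactly what makes autotuned libraries portable.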
Our operating system and architecture research is devoted to supporting the software stack. The OS is based on space-time partitioning, which exports stable partitions of the machine resources with quality-of-service guarantees to an application, and two-level scheduling, which allows a user-level scheduler, such as Lithe, to perform detailed application-specific scheduling within a partition. Our architecture research focuses on techniques to support OS resource partitioning, performance counters to support application adaptivity, software-managed memory hierarchies to increase memory efficiency, and scalable coherence and synchronization mechanisms to lower parallel system overheads. To experiment with the behavior of our new software stack on our new OS and hardware mechanisms, we have developed an FPGA-based simulation environment, “RAMP Gold”. By running our full application and OS software environment on our fast architectural simulator, we can quickly iterate across levels in our system stack.
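The interaction between space-time partitioning and two-level scheduling can be sketched as a toy model: the OS grants an application a stable partition of cores, and a user-level scheduler (playing the role Lithe plays in our stack) places the application’s own tasks onto those cores. The classes and the round-robin policy below are illustrative assumptions, not the actual OS interfaces:

```python
from collections import deque

class Partition:
    """OS level: a stable allocation of cores handed to one application,
    standing in for a space-time partition with QoS guarantees."""
    def __init__(self, cores):
        self.cores = cores  # e.g., [0, 1]

class UserLevelScheduler:
    """Application level: performs detailed, application-specific
    scheduling entirely within the granted partition."""
    def __init__(self, partition):
        self.partition = partition
        self.tasks = deque()

    def submit(self, task):
        self.tasks.append(task)

    def run(self):
        # Toy policy: round-robin tasks over the partition's cores.
        assignments = []
        while self.tasks:
            for core in self.partition.cores:
                if not self.tasks:
                    break
                assignments.append((core, self.tasks.popleft()))
        return assignments

part = Partition(cores=[0, 1])
sched = UserLevelScheduler(part)
for t in ("decode", "render", "layout"):
    sched.submit(t)
assignments = sched.run()
# assignments: [(0, 'decode'), (1, 'render'), (0, 'layout')]
```

The point of the split is that the OS never sees the individual tasks: it only guarantees the partition, so the application can use any scheduling policy it likes without OS involvement.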