DOI: 10.1145/1122971
PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
ACM 2006 Proceeding
Publisher:
  • Association for Computing Machinery, New York, NY, United States
Conference:
PPoPP '06: ACM SIGPLAN 2006 Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, March 29-31, 2006
ISBN:
978-1-59593-189-4
Published:
29 March 2006
Abstract

I welcome you all to New York City, to the 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06). The conference is being held at Columbia University, which has graciously allowed the conference to use its facilities. In addition, we are excited to have the conference co-located with the 4th International Symposium on Code Generation and Optimization (CGO-4). We hope to leverage the synergies between the two conference themes.

One important change is that, starting this year, PPoPP will be held annually. It is widely expected that the upcoming wide availability of multi-threaded and multi-core processors will drive major advances in parallel programming. The PPoPP Steering Committee and the Organizing Committee feel that PPoPP is a forum uniquely positioned to capture the exciting new ideas that will flourish in this area. A yearly conference will fulfill these expectations better.

At the conference, I am looking forward to exciting discussions with my colleagues on cutting-edge research in parallel programming. In addition, I am looking forward to all the amenities that New York City provides. In particular, our Local Arrangements Co-Chair, Calin Cascaval, has organized a dinner and theater evening in the Theater District. This is something you will not want to miss.

Article
Parallel programming and code selection in Fortress

As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level ...

SESSION: Communication
Article
Collective communication on architectures that support simultaneous communication over multiple links

Traditional collective communication algorithms are designed with the assumption that a node can communicate with only one other node at a time. On new parallel architectures such as the IBM Blue Gene/L, a node can communicate with multiple nodes ...

Article
Performance evaluation of adaptive MPI

Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports migratable ...

Article
Mobile MPI programs in computational grids

Utility computing is becoming a popular way of exploiting the potential of computational grids. In utility computing, users are provided with computational power in a transparent manner similar to the way in which electrical utilities supply power to ...

Article
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits

Message Passing Interface (MPI) is a popular parallel programming model for scientific applications. Most high-performance MPI implementations use Rendezvous Protocol for efficient transfer of large messages. This protocol can be designed using either ...

SESSION: Languages
Article
Global-view abstractions for user-defined reductions and scans

Since APL, reductions and scans have been recognized as powerful programming concepts. Abstracting an accumulation loop (reduction) and an update loop (scan), the concepts have efficient parallel implementations based on the parallel prefix algorithm. ...
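As a rough illustration of the parallel prefix idea mentioned above (not taken from the paper), an inclusive scan can be organized into O(log n) data-parallel rounds. The Python sketch below simulates the rounds sequentially; in a real implementation each round's updates would run element-wise in parallel.

```python
# Hillis-Steele style inclusive scan: O(log n) rounds, each of which
# combines every element with the one `step` positions to its left.
def inclusive_scan(xs, op=lambda a, b: a + b):
    out = list(xs)
    step = 1
    while step < len(out):
        # All reads in a round use the previous round's values.
        prev = list(out)
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out

print(inclusive_scan([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```

Because only the combining operator `op` varies, the same skeleton supports the user-defined reductions and scans the paper discusses, provided `op` is associative.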

Article
Programming for parallelism and locality with hierarchically tiled arrays

Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or ...
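As a minimal sketch of the tiling idea (illustrative only, not the paper's hierarchically tiled arrays), a matrix multiply can be blocked so that each inner loop touches only a small TILE x TILE working set, improving cache locality:

```python
# Tiled matrix multiply: the three outer loops walk over tiles, the
# three inner loops work entirely within one tile's working set.
def matmul_tiled(A, B, n, tile=2):
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        s = 0
                        for k in range(kk, min(kk + tile, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

The same decomposition also yields a natural parallel distribution: each (ii, jj) tile of C can be computed by a different processor with mostly local data.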

Article
Parallel programming in modern web search engines

When a Search Engine responds to your query, thousands of machines from around the world have cooperated to produce your result. With a global reach of hundreds-of-millions of users, Search Engines are arguably the most commonly used massively-parallel ...

SESSION: Performance characterization
Article
Performance characterization of molecular dynamics techniques for biomolecular simulations

Large-scale simulations and computational modeling using molecular dynamics (MD) continue to make significant impacts in the field of biology. It is well known that simulations of biological events at native time and length scales require computing ...

Article
On-line automated performance diagnosis on thousands of processes

Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data ...

Article
A case study in top-down performance estimation for a large-scale parallel application

This work presents a general methodology for estimating the performance of an HPC workload when running on a future hardware architecture. Further, it demonstrates the methodology by estimating the performance of a significant scientific application -- ...

SESSION: Shared memory parallelism
Article
Hardware profile-guided automatic page placement for ccNUMA systems

Cache coherent non-uniform memory architectures (ccNUMA) constitute an important class of high-performance computing platforms. Contemporary ccNUMA systems, such as the SGI Altix, have a large number of nodes, where each node consists of a small number ...

Article
Adaptive scheduling with parallelism feedback

Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted ...

Article
Predicting bounds on queuing delay for batch-scheduled parallel machines

Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and ...

Article
Optimizing irregular shared-memory applications for distributed-memory systems

In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this ...

SESSION: Atomicity issues
Article
Proving correctness of highly-concurrent linearisable objects

We study a family of implementations for linked lists using fine-grain synchronisation. This approach enables greater concurrency, but correctness is a greater challenge than for classical, coarse-grain synchronisation. Our examples are demonstrative of ...

Article
Accurate and efficient runtime detection of atomicity errors in concurrent programs

Atomicity is an important correctness condition for concurrent systems. Informally, atomicity is the property that every concurrent execution of a set of transactions is equivalent to some serial execution of the same transactions. In multi-threaded ...
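The serial-equivalence condition described above can be made concrete with a small, deterministic example (mine, not the paper's): two "transactions" each read a balance and then write balance + 100. If the two reads are interleaved before either write, one update is lost, and the result matches no serial order of the transactions.

```python
# Replay a fixed schedule of read/write actions for named transactions.
def run(schedule):
    balance = 0
    local = {}
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance          # transaction's private snapshot
        else:                             # "write"
            balance = local[txn] + 100    # deposit based on the snapshot
    return balance

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
print(run(serial))       # 200: equivalent to a serial execution
print(run(interleaved))  # 100: a lost update; not serializable
```

A runtime atomicity detector flags exactly the second kind of schedule: one whose outcome differs from every serial ordering of the same transactions.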

Article
Scalable synchronous queues

We present two new nonblocking and contention-free implementations of synchronous queues: concurrent transfer channels in which producers wait for consumers just as consumers wait for producers. Our implementations extend our previous work in dual ...
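To illustrate the rendezvous semantics (producer and consumer each wait for the other), here is a simple lock-based sketch in Python; the paper's contribution is precisely to achieve this behavior without such blocking coordination. The class name and structure are mine.

```python
import threading
import queue

# Rendezvous channel: put() does not return until some consumer has
# actually taken the item, built from two bounded queues (item + ack).
class SyncChannel:
    def __init__(self):
        self._item = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def put(self, x):
        self._item.put(x)
        self._ack.get()        # block until a consumer takes the item

    def get(self):
        x = self._item.get()   # block until a producer offers an item
        self._ack.put(None)    # release the waiting producer
        return x
```

Usage: a producer thread calling `put(42)` stays blocked until another thread calls `get()`, at which point both proceed, the handoff having occurred at a single synchronization point.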

PANEL SESSION: Software issues for multicore systems
SESSION: Multicore software
Article
POSH: a TLS compiler that exploits program structure

As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of concurrent ...

Article
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor

IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further exacerbates the situation because its search space is quadrupled. We propose ...
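For reference, the longest-prefix match the abstract refers to is commonly built on a binary trie; the sketch below (mine, much simpler than a line-rate IPv6 design) works on lists of address bits and records the best matching prefix seen on the way down.

```python
# Binary-trie longest-prefix match. Each node may carry a next hop;
# lookup remembers the deepest next hop seen along the address's path.
class TrieNode:
    def __init__(self):
        self.child = [None, None]
        self.next_hop = None

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        if node.child[b] is None:
            node.child[b] = TrieNode()
        node = node.child[b]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop        # longest match so far
        if node.child[b] is None:
            return best
        node = node.child[b]
    return node.next_hop if node.next_hop is not None else best
```

With 128-bit IPv6 addresses this naive trie becomes very deep, which is exactly the "quadrupled search space" pressure that motivates multi-core, multithreaded forwarding designs like the one in the paper.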

Article
"MAMA!": a memory allocator for multithreaded architectures

While the high-performance computing world is dominated by distributed memory computer systems, applications that require random access into large shared data structures continue to motivate development of ever larger shared-memory parallel computers ...

SESSION: Transactional memory
Article
McRT-STM: a high performance software transactional memory system for a multi-core runtime

Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex based synchronization). Unfortunately, ...

Article
Exploiting distributed version concurrency in a transactional memory cluster

We investigate a transactional memory runtime system providing scaling and strong consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific applications and database applications. We introduce a novel page-level ...

Article
Hybrid transactional memory

High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they ...

SESSION: Potpourri
Article
Fast and transparent recovery for continuous availability of cluster-based servers

Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, one of the main challenges in achieving continuous operation is to provide ...

Article
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster

Recently, the high-performance computing community has realized that power is a performance-limiting factor. One reason for this is that supercomputing centers have limited power capacity and machines are starting to hit that limit. In addition, the ...

Article
Teaching parallel computing to science faculty: best practices and common pitfalls

In 2002, we first brought High Performance Computing (HPC) methods to the college classroom as a way to enrich Computational Science education. Through the years, we have continued to facilitate college faculty in science, technology, engineering, and ...

    Contributors
    • University of Illinois Urbana-Champaign


    Acceptance Rates

    Overall acceptance rate: 230 of 1,014 submissions, 23%

    Year        Submitted  Accepted  Rate
    PPoPP '21         150        31   21%
    PPoPP '20         121        28   23%
    PPoPP '19         152        29   19%
    PPoPP '17         132        29   22%
    PPoPP '14         184        28   15%
    PPoPP '07          65        22   34%
    PPoPP '03          45        20   44%
    PPoPP '99          79        17   22%
    PPOPP '97          86        26   30%
    Overall         1,014       230   23%