DOI: 10.1145/2854038
CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization
ACM 2016 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
CGO '16: 14th Annual IEEE/ACM International Symposium on Code Generation and Optimization, Barcelona, Spain, March 12-18, 2016
ISBN:
978-1-4503-3778-6
Published:
29 February 2016
In-Cooperation:
IEEE-CS

SESSION: Profiling Feedback
research-article
Public Access
Cheetah: detecting false sharing efficiently and effectively

False sharing is a notorious performance problem that may occur in multithreaded programs when they are running on ubiquitous multicore hardware. It can dramatically degrade the performance by up to an order of magnitude, significantly hurting the ...

research-article
Open Access
AutoFDO: automatic feedback-directed optimization for warehouse-scale applications

AutoFDO is a system to simplify real-world deployment of feedback-directed optimization (FDO). The system works by sampling hardware performance monitors on production machines and using those profiles to guide optimization. Profile data is stale by ...

research-article
Portable performance on asymmetric multicore processors

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load ...

SESSION: Data Layout and Vectorization
research-article
StructSlim: a lightweight profiler to guide structure splitting

Memory access latency continues to be a dominant bottleneck in a large class of applications on modern architectures. To optimize memory performance, it is important to utilize the locality in the memory hierarchy. Structure splitting can significantly ...

research-article
Public Access
Exploiting recent SIMD architectural advances for irregular applications

A broad class of applications involve indirect or data-dependent memory accesses and are referred to as irregular applications. Recent developments in SIMD architectures – specifically, the emergence of wider SIMD lanes, combination of SIMD parallelism ...

research-article
Best Paper
Exploiting mixed SIMD parallelism by reducing data reorganization overhead

Existing loop vectorization techniques can exploit either intra- or inter-iteration SIMD parallelism alone in a code region if one part of the region vectorized for one type of parallelism has data dependences (called mixed-parallelism-inhibiting ...

SESSION: GPU
research-article
A black-box approach to energy-aware scheduling on integrated CPU-GPU systems

Energy efficiency is now a top design goal for all computing systems, from fitness trackers and tablets, where it affects battery life, to cloud computing centers, where it directly impacts operational cost, maintainability, and environmental impact. ...

research-article
Portable and transparent software managed scheduling on accelerators for fair resource sharing

Accelerators, such as Graphics Processing Units (GPUs), are popular components of modern parallel systems. Their energy-efficient performance makes them attractive components for modern data center nodes. However, they lack control for fair resource ...

research-article
Communication-aware mapping of stream graphs for multi-GPU platforms

Stream graphs can provide a natural way to represent many applications in multimedia and DSP domains. Though the exposed parallelism of stream graphs makes it relatively easy to map them to general-purpose GPUs (GPGPUs), very large stream graphs as well as ...

research-article
Open Access
gpucc: an open-source GPGPU compiler

Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA’s CUDA and the cross-platform OpenCL standard. Until now, there has ...

SESSION: Affine Programs
research-article
A basic linear algebra compiler for structured matrices

Many problems in science and engineering are in practice modeled and solved through matrix computations. Often, the matrices involved have structure such as symmetric or triangular, which reduces the operations count needed to perform the computation. ...

research-article
Opening polyhedral compiler's black box

While compilers offer a fair trade-off between productivity and executable performance in single-threaded execution, their optimizations remain fragile when addressing compute-intensive code for parallel architectures with deep memory hierarchies. ...

research-article
Public Access
Trace-based affine reconstruction of codes

Complete comprehension of loop codes is desirable for a variety of program optimizations. Compilers perform static code analyses and transformations, such as loop tiling or memory partitioning, by constructing and manipulating formal representations of ...

SESSION: Static Analysis
research-article
Inference of peak density of indirect branches to detect ROP attacks

A program subject to a Return-Oriented Programming (ROP) attack usually presents an execution trace with a high frequency of indirect branches. From this observation, several researchers have proposed to monitor the density of these instructions to ...

research-article
Sparse flow-sensitive pointer analysis for multithreaded programs

For C programs, flow-sensitivity is important to enable pointer analysis to achieve highly usable precision. Despite significant recent advances in scaling flow-sensitive pointer analysis sparsely for sequential C programs, relatively little progress ...

research-article
Symbolic range analysis of pointers

Alias analysis is one of the most fundamental techniques that compilers use to optimize languages with pointers. However, in spite of all the attention that this topic has received, the current state-of-the-art approaches inside compilers still face ...

SESSION: Programming Models
research-article
Towards automatic significance analysis for approximate computing

Several applications may trade off output quality for energy efficiency by computing only an approximation of their output. Current approaches to software-based approximate computing often require the programmer to specify parts of the code or data ...

research-article
Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns

High performance in modern computing platforms requires programs to be parallel, distributed, and run on heterogeneous hardware. However, programming such architectures is extremely difficult due to the need to implement the application using multiple ...

research-article
Public Access
NRG-loops: adjusting power from within applications

NRG-Loops are source-level abstractions that allow an application to dynamically manage its power and energy through adjustments to functionality, performance, and accuracy. The adjustments, which come in the form of truncated, adapted, or perforated ...

SESSION: Correctness
research-article
Validating optimizations of concurrent C/C++ programs

We present a validator for checking the correctness of LLVM compiler optimizations on C11 programs as far as concurrency is concerned. Our validator checks that optimizations do not change memory accesses in ways disallowed by the C11 and/or LLVM ...

research-article
IPAS: intelligent protection against silent output corruption in scientific applications

This paper presents IPAS, an instruction duplication technique that protects scientific applications from silent data corruption (SDC) in their output. The motivation for IPAS is that, due to natural error masking, only a subset of SDC errors actually ...

research-article
Atomicity violation checker for task parallel programs

Task-based programming models (e.g., Cilk, Intel TBB, X10, Java Fork-Join tasks) simplify multicore programming in contrast to programming with threads. In a task-based model, the programmer specifies parallel tasks and the runtime maps these tasks to ...

SESSION: Binary/Virtualization
research-article
Flexible on-stack replacement in LLVM

On-Stack Replacement (OSR) is a technique for dynamically transferring execution between different versions of a function at run time. OSR is typically used in virtual machines to interrupt a long-running function and recompile it at a higher ...

research-article
Public Access
BlackBox: lightweight security monitoring for COTS binaries

After a software system is compromised, it can be difficult to understand what vulnerabilities attackers exploited. Any information residing on that machine cannot be trusted as attackers may have tampered with it to cover their tracks. Moreover, even ...

research-article
Re-constructing high-level information for language-specific binary re-optimization

In this paper, we show a binary optimizer can achieve competitive performance relative to a state-of-the-art source code compiler by re-constructing high-level information (HLI) from binaries. Recent advances in compiler technologies have resulted in a ...

Contributors
  • The University of Edinburgh
  • INRIA Institut National de Recherche en Informatique et en Automatique

Acceptance Rates

CGO '16 Paper Acceptance Rate: 25 of 108 submissions, 23%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Year       Submitted   Accepted   Rate
CGO '17    116         26         22%
CGO '16    108         25         23%
CGO '15    88          24         27%
CGO '14    100         29         29%
CGO '12    90          26         29%
CGO '11    105         28         27%
CGO '09    70          26         37%
CGO '08    66          21         32%
CGO '07    84          27         32%
CGO '06    80          29         36%
CGO '05    75          26         35%
CGO '04    79          25         32%
Overall    1,061       312        29%