research-article

Accelerating simulation of agent-based models on heterogeneous architectures

Authors:
Jin Wang (Georgia Institute of Technology)
Norman Rubin (Advanced Micro Devices)
Haicheng Wu (Georgia Institute of Technology)
Sudhakar Yalamanchili (Georgia Institute of Technology)

GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, March 2013, Pages 108–119, https://doi.org/10.1145/2458523.2458534

Published: 16 March 2013

ABSTRACT

Widely used GPGPU programming models and compiler techniques enable the optimization of data-parallel programs on commodity GPUs. However, mapping GPGPU applications written for discrete GPUs to emerging integrated heterogeneous architectures, such as the AMD Fusion APU and Intel Sandy Bridge and Ivy Bridge, which place the CPU and the GPU on the same die, has not been well studied.

Classic time-step simulation applications, represented by agent-based models, have an intrinsically parallel structure that fits GPGPU architectures well. However, when these applications are mapped directly to integrated GPUs, performance may degrade because integrated GPUs have fewer compute units and lower clock speeds.
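To illustrate the parallel structure of a time-step simulation, here is a minimal sketch of one agent-based update on a toy single-lane ring road. The function names, the car-following rule and all constants are hypothetical and are not taken from the paper. Each agent's new state is computed only from the previous step's state, so the iterations of the update loop are independent of one another and could, in principle, each be mapped to one GPU work-item. The ordering of agents by position is done sequentially here for simplicity.

```python
from typing import List, Tuple

ROAD_LENGTH = 100.0  # hypothetical ring-road length
MAX_SPEED = 5.0      # hypothetical maximum distance travelled per step
MIN_GAP = 2.0        # hypothetical minimum distance kept to the leader

Agent = Tuple[float, float]  # (position, speed)


def step(agents: List[Agent]) -> List[Agent]:
    """Advance every agent by one time step, reading only the old state."""
    n = len(agents)
    # Order agents by position so that each one can find the agent ahead of it.
    order = sorted(range(n), key=lambda i: agents[i][0])
    new_agents = list(agents)
    for rank, i in enumerate(order):  # iterations do not depend on each other
        pos, speed = agents[i]
        leader_pos = agents[order[(rank + 1) % n]][0]
        # Distance to the leader along the ring; a lone agent sees a free road.
        gap = (leader_pos - pos) % ROAD_LENGTH if n > 1 else ROAD_LENGTH
        # Accelerate by one unit, capped by the speed limit and the safe gap.
        speed = max(0.0, min(speed + 1.0, MAX_SPEED, gap - MIN_GAP))
        new_agents[i] = ((pos + speed) % ROAD_LENGTH, speed)
    return new_agents


def simulate(agents: List[Agent], steps: int) -> List[Agent]:
    """Run the time-step loop; steps are sequential, agents within a step are not."""
    for _ in range(steps):
        agents = step(agents)
    return agents
```

Because `step` writes into a fresh list rather than updating agents in place, the result does not depend on the order in which agents are processed, which is the property a data-parallel implementation relies on.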

This paper proposes an optimization to the GPGPU implementation of the agent-based model and illustrates it with a traffic simulation example. The optimization adapts the algorithm by moving part of the workload to the CPU, leveraging the integrated architecture and its on-chip memory bus, which is faster than the PCIe bus that connects a discrete GPU to the host. Experiments on a discrete AMD Radeon GPU and an AMD Fusion APU demonstrate that the optimization achieves a 1.08x to 2.71x speedup on the integrated architecture over the discrete platform.
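The trade-off behind moving work to the CPU can be sketched with a simple cost model. This is an illustration under stated assumptions, not the paper's analysis: the function `step_time`, its parameters, and the assumption that the GPU and CPU phases run back to back with the exchanged data crossing the link once in each direction are all hypothetical. The values in the usage example are placeholders in arbitrary units, not measurements.

```python
def step_time(gpu_compute: float,
              cpu_compute: float,
              data_exchanged: float,
              link_bandwidth: float) -> float:
    """Estimated time of one simulation step under a hypothetical model.

    The GPU phase and the CPU phase run one after the other, and the
    shared data crosses the CPU-GPU link once in each direction.
    """
    transfer = 2.0 * data_exchanged / link_bandwidth
    return gpu_compute + cpu_compute + transfer


# Placeholder values in arbitrary units (not measurements).
slow_link = step_time(gpu_compute=1.0, cpu_compute=1.0,
                      data_exchanged=8.0, link_bandwidth=4.0)
fast_link = step_time(gpu_compute=1.5, cpu_compute=1.0,
                      data_exchanged=8.0, link_bandwidth=16.0)
print(slow_link, fast_link)
```

In this toy setting, the second configuration has a slower GPU phase but a faster link, and its step is still shorter because the transfer term shrinks by more than the compute term grows. Whether that holds on real hardware depends on the actual compute and transfer costs.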


Index Terms

Computer systems organization → Architectures → Other architectures → Heterogeneous (hybrid) systems

Published in
GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
March 2013
156 pages
ISBN:9781450320177
DOI:10.1145/2458523
Editors: John Cavazos (University of Delaware), Xiang Gong, David Kaeli
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
APU
GPGPU
agent-based model
traffic simulation
Acceptance Rates
GPGPU-6 Paper Acceptance Rate: 15 of 37 submissions, 41%
Overall Acceptance Rate: 57 of 129 submissions, 44%