
2015 | Book

FPGA Based Accelerators for Financial Applications


About this book

This book covers the latest approaches and results from reconfigurable computing architectures employed in the finance domain. So-called field-programmable gate arrays (FPGAs) have already been shown to outperform standard CPU- and GPU-based computing architectures by far, saving up to 99% of energy depending on the compute task. Renowned authors from financial mathematics, computer architecture and the finance business introduce readers to today’s challenges in finance IT, illustrate the most advanced approaches and use cases, and present currently known methodologies for integrating FPGAs into finance systems together with the latest results. The complete algorithm-to-hardware flow is covered holistically, so this book serves as a hands-on guide for IT managers, researchers and quants/programmers who are considering integrating FPGAs into their current IT systems.

Table of contents

Frontmatter
Chapter 1. 10 Computational Challenges in Finance
Abstract
With the growing use of both highly developed mathematical models and complicated derivative products in financial markets, the demand for high computational power and its efficient use via fast algorithms and sophisticated hardware and software concepts has become a hot topic in mathematics and computer science. This is emphasized even more by the combination of the need for numerical methods such as Monte Carlo simulation, the demand for highly accurate prices and risk measures, the online availability of prices, and the need to repeat those calculations for different input parameters as a kind of sensitivity analysis. In this survey, we describe the mathematical background of some of the most challenging computational tasks in financial mathematics. Examples include the pricing of exotic options by Monte Carlo methods, the calibration problem of obtaining the input parameters for financial market models, and various risk management and measurement tasks.
Sascha Desmettre, Ralf Korn
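As a rough illustration of the Monte Carlo pricing of exotic options and the repeated, sensitivity-style re-runs mentioned in the Chapter 1 abstract, the following minimal Python sketch prices an arithmetic-average Asian call under a Black-Scholes (geometric Brownian motion) model and estimates its delta by bump-and-revalue with common random numbers. The model, the parameter values and the function names `asian_call_mc` and `bump_delta` are illustrative assumptions, not taken from the chapter.

```python
import math
import random


def asian_call_mc(s0, k, r, sigma, t, n_obs, n_paths, seed=42):
    """Plain Monte Carlo price of an arithmetic-average Asian call under GBM."""
    rng = random.Random(seed)  # same seed on every call -> common random numbers
    dt = t / n_obs
    drift = (r - 0.5 * sigma * sigma) * dt
    vol = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s, running_sum = s0, 0.0
        for _ in range(n_obs):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))  # exact GBM step
            running_sum += s
        total += max(running_sum / n_obs - k, 0.0)
    return math.exp(-r * t) * total / n_paths


def bump_delta(s0, h=0.01, **params):
    """Central finite-difference delta: re-run the pricer with a bumped spot."""
    up = asian_call_mc(s0 * (1.0 + h), **params)
    down = asian_call_mc(s0 * (1.0 - h), **params)
    return (up - down) / (2.0 * s0 * h)


params = dict(k=100.0, r=0.05, sigma=0.2, t=1.0, n_obs=12, n_paths=20000)
price = asian_call_mc(100.0, **params)
delta = bump_delta(100.0, **params)
print(f"Asian call ~ {price:.4f}, delta ~ {delta:.4f}")
```

Every sensitivity requires further full simulations, which is one reason such repeated calculations become computationally demanding.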
Chapter 2. From Model to Application: Calibration to Market Data
Abstract
We present the procedure of model calibration within the scope of financial applications. We discuss several models that are used to describe the movement of financial underlyings and state closed or semi-closed pricing formulas for basic financial instruments. Furthermore, we explain how these are used in a general calibration procedure to determine sensible model parameters. Finally, we gather typical numerical issues that often arise in the context of calibration and have to be handled with care.
Tilman Sayer, Jörg Wenzel
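The calibration procedure outlined in the Chapter 2 abstract, fitting modeled prices to observed prices, can be sketched as a least-squares problem. The minimal Python example below assumes the simplest possible case: a one-parameter Black-Scholes model, synthetic quotes, and golden-section search as the optimizer. The chapter itself discusses several models and their numerical pitfalls; the names `bs_call` and `calibrate_sigma` are illustrative.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    """Closed-form Black-Scholes call price (the model being calibrated here)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def calibrate_sigma(quotes, s, r, lo=0.01, hi=1.0, tol=1e-10):
    """Minimize the sum of squared price errors over sigma by golden-section search."""
    def sse(sigma):
        return sum((bs_call(s, k, r, sigma, t) - price) ** 2 for k, t, price in quotes)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                 # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Synthetic "market" quotes generated with sigma = 0.2, then recovered by calibration.
spot, rate = 100.0, 0.03
quotes = [(k, t, bs_call(spot, k, rate, 0.2, t)) for k in (90.0, 100.0, 110.0) for t in (0.5, 1.0)]
sigma_hat = calibrate_sigma(quotes, spot, rate)
print(f"calibrated sigma = {sigma_hat:.6f}")
```

Because each call price increases monotonically in sigma, the squared-error objective here is unimodal, which is what golden-section search requires; multi-parameter models generally need a different optimizer.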
Chapter 3. Comparative Study of Acceleration Platforms for Heston’s Stochastic Volatility Model
Abstract
We present a comparative study of the performance of implementations of the Heston stochastic volatility model on different acceleration platforms. Our implementation of this model uses quasi-random variates from the Numerical Algorithms Group (NAG) random number library to reduce the simulation variance, as well as Leif Andersen’s Quadratic Exponential discretisation scheme. The model was first implemented in Matlab and then ported to Graphics Processing Units (GPUs) and to the Techila platform. The Field Programmable Gate Array (FPGA) code was based on C++. The model was tested on a 2.3 GHz Intel Core i5 Central Processing Unit (CPU), a Techila grid server hosted on Microsoft’s Azure cloud, a GPU node hosted by Boston Ltd, and an FPGA node hosted by Maxeler Technologies Ltd. Timing data was collected and compared against the CPU baseline to quantify the acceleration benefits of each platform.
Christos Delivorias
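Andersen’s Quadratic Exponential (QE) scheme, named in the Chapter 3 abstract, moment-matches the next variance value with either a squared Gaussian or an exponential distribution with a point mass at zero. The Python sketch below is a minimal single-threaded version, assuming Andersen’s log-price discretisation with gamma1 = gamma2 = 0.5 and the switching threshold psi_c = 1.5. Unlike the chapter, it uses pseudo-random numbers from `random.Random` rather than NAG quasi-random variates; the function name and parameter values are illustrative.

```python
import math
import random


def heston_qe_paths(s0, v0, r, kappa, theta, xi, rho, t, steps, n_paths, seed=7, psi_c=1.5):
    """Simulate (S_T, V_T) pairs with the QE variance step and Andersen's log-price step."""
    rng = random.Random(seed)
    dt = t / steps
    ekt = math.exp(-kappa * dt)
    g1 = g2 = 0.5  # trapezoidal weights for the integrated variance
    k0 = -rho * kappa * theta * dt / xi
    k1 = g1 * dt * (kappa * rho / xi - 0.5) - rho / xi
    k2 = g2 * dt * (kappa * rho / xi - 0.5) + rho / xi
    k3 = g1 * dt * (1.0 - rho * rho)
    k4 = g2 * dt * (1.0 - rho * rho)
    out = []
    for _ in range(n_paths):
        x, v = math.log(s0), v0
        for _ in range(steps):
            # Exact conditional mean and variance of V(t + dt) given V(t).
            m = theta + (v - theta) * ekt
            s2 = (v * xi * xi * ekt / kappa * (1.0 - ekt)
                  + theta * xi * xi / (2.0 * kappa) * (1.0 - ekt) ** 2)
            psi = s2 / (m * m)
            if psi <= psi_c:
                # Quadratic branch: V' = a (b + Z)^2 matches mean and variance.
                b2 = 2.0 / psi - 1.0 + math.sqrt(2.0 / psi) * math.sqrt(2.0 / psi - 1.0)
                a = m / (1.0 + b2)
                v_next = a * (math.sqrt(b2) + rng.gauss(0.0, 1.0)) ** 2
            else:
                # Exponential branch: point mass p at zero plus an exponential tail.
                p = (psi - 1.0) / (psi + 1.0)
                beta = (1.0 - p) / m
                u = rng.random()
                v_next = 0.0 if u <= p else math.log((1.0 - p) / (1.0 - u)) / beta
            z = rng.gauss(0.0, 1.0)
            x += r * dt + k0 + k1 * v + k2 * v_next + math.sqrt(k3 * v + k4 * v_next) * z
            v = v_next
        out.append((math.exp(x), v))
    return out


res = heston_qe_paths(100.0, 0.09, 0.02, 2.0, 0.04, 0.5, -0.7, 1.0, 10, 20000)
mean_v = sum(v for _, v in res) / len(res)
mean_s = sum(s for s, _ in res) / len(res)
print(f"E[V_T] ~ {mean_v:.5f}, E[S_T] ~ {mean_s:.3f}")
```

Both branches reproduce the conditional mean of the variance exactly, so the sample mean of V_T should approach theta + (v0 - theta) * exp(-kappa * T).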
Chapter 4. Towards Automated Benchmarking and Evaluation of Heterogeneous Systems in Finance
Abstract
Benchmarking and fair evaluation of computing systems is a challenge for High Performance Computing (HPC) in general, and for financial systems in particular. The reason is that in most cases there is no optimal solution for a specific problem; instead, the most appropriate models, algorithms, and implementations depend, for instance, on the desired accuracy of the result or on the input parameters. In addition, the flexibility and development effort of those systems are important metrics for purchasers from the finance domain and thus need to be well quantified. In this chapter we introduce a precise terminology that separates the problem, the employed model, and a solution consisting of a selected algorithm and its implementation. We show how the design space (the space of all possible solutions to a problem) can be systematically structured and explored. To evaluate and characterize systems independently of their underlying execution platforms, we illustrate the concept of application-level benchmarks and summarize the state of the art for financial applications. For heterogeneous and Field Programmable Gate Array (FPGA)-accelerated systems in particular, we present a framework for automatically executing and evaluating such benchmarks. We describe the framework structure in detail and show how this generic concept can be integrated with existing computing systems. A generic implementation of this framework is freely available for download.
Christian De Schryver, Carolina Pereira Nogueira
Chapter 5. Is High Level Synthesis Ready for Business? An Option Pricing Case Study
Abstract
High-Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress in recent years and are now ready for deployment in an industrial setting. This claim is supported by a case study of pricing a benchmark of Black-Scholes (BS) and Heston model-based options using a Monte Carlo simulation approach. Using an HLS tool such as Xilinx’s Vivado HLS, Altera’s OpenCL SDK or Maxeler’s MaxCompiler, a functionally correct FPGA implementation can be developed in a short time from a high-level description based on the MapReduce programming model. However, this direct source code implementation is unlikely to meet performance expectations, so a series of optimisations can be applied to use the target FPGA’s resources more efficiently. When a combination of task and pipeline parallelism as well as C-slowing optimisations is considered for the problem in this case study, the Vivado HLS implementation is 9.5 times faster than a sequential CPU implementation, the Altera OpenCL implementation 221 times faster and the Maxeler implementation 204 times faster, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised multicore CPU implementation, the 60 times improvement by a GPU and the 207 times improvement by a Xeon Phi, these results suggest that HLS is indeed ready for business.
Gordon Inggs, Shane Fleming, David B. Thomas, Wayne Luk
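The MapReduce formulation of Monte Carlo pricing mentioned in the Chapter 5 abstract can be pictured as independent "map" tasks that each simulate a chunk of paths and return partial sums, followed by an associative "reduce" that merges them. The following plain-Python sketch assumes a European call under Black-Scholes and a per-chunk seeding scheme chosen for illustration; it says nothing about how the HLS tools in the chapter map this structure to hardware.

```python
import math
import random
from functools import reduce


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, t):
    """Closed-form Black-Scholes call, used only as a reference value."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def map_chunk(seed, n, s0, k, r, sigma, t):
    """Map task: simulate n terminal prices and return partial payoff sums."""
    rng = random.Random(seed)  # one generator per chunk (illustrative seeding)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    acc = acc_sq = 0.0
    for _ in range(n):
        payoff = max(s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - k, 0.0)
        acc += payoff
        acc_sq += payoff * payoff
    return acc, acc_sq, n


def reduce_pair(a, b):
    """Reduce task: partial sums are associative, so chunks can merge in any order."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def price_mapreduce(s0, k, r, sigma, t, chunks=8, per_chunk=25000):
    parts = [map_chunk(1000 + c, per_chunk, s0, k, r, sigma, t) for c in range(chunks)]
    acc, acc_sq, n = reduce(reduce_pair, parts)
    mean = acc / n
    std_err = math.sqrt(max(acc_sq / n - mean * mean, 0.0) / n)
    disc = math.exp(-r * t)
    return disc * mean, disc * std_err


price, se = price_mapreduce(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"MC {price:.4f} +/- {se:.4f}, closed form {bs_call(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")
```

Since the map tasks share no state, they can run on separate cores or pipelines; only the small (sum, sum of squares, count) tuples travel to the reduce step.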
Chapter 6. High-Bandwidth Low-Latency Interfacing with FPGA Accelerators Using PCI Express
Abstract
The need for high performance computing imposes constraints on the acceptable bandwidth of data transfer between processing units and memory. Consequently, it is crucial to build high performance, scalable, and energy efficient architectures capable of completing data transfer requests at satisfactory rates. Thanks to the increased transfer rates obtained by using high-speed serial data links instead of traditional parallel ones, PCI Express provides a promising solution to the connectivity problem of today’s complex heterogeneous architectures. In this chapter, we first cover the principles of interfacing using PCI Express. To illustrate a practical situation, we select the Xilinx Zynq device and develop an example architecture that allows the x86 CPU cores of the host system, the ARM cores of the Zynq device, and the hardware accelerators realized directly on the FPGA fabric of the Zynq to share the available DRAM for efficient data exchange. We provide estimates of the achievable data transfer bandwidths in our architecture.
Mohammadsadegh Sadri, Christian De Schryver, Norbert Wehn
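Bandwidth estimates of the kind mentioned in the Chapter 6 abstract usually start from a back-of-the-envelope calculation: per-lane transfer rate times line-coding efficiency times lane count, reduced by per-packet (TLP) overhead. The Python sketch below is a generic version of that arithmetic, not the chapter's model. The 20-byte per-TLP overhead is an assumed typical figure (framing, sequence number, 3-DW header, LCRC), and the Gen2 x4 link in the usage lines is a hypothetical configuration, not necessarily the one used in the chapter.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency for PCIe Gen1-Gen3.
PCIE_GEN = {1: (2.5, 8 / 10), 2: (5.0, 8 / 10), 3: (8.0, 128 / 130)}


def link_bandwidth_mb_s(gen, lanes):
    """Per-direction bandwidth after line coding, in MB/s (1 MB = 10**6 bytes)."""
    gt_per_s, coding_eff = PCIE_GEN[gen]
    return gt_per_s * 1e9 * coding_eff * lanes / 8 / 1e6


def payload_bandwidth_mb_s(gen, lanes, payload_bytes, overhead_bytes=20):
    """Upper bound on payload throughput if every TLP carries payload_bytes.

    overhead_bytes is an assumed per-TLP cost (framing, sequence number,
    3-DW header, LCRC); DLLP traffic such as ACKs and flow-control
    updates is ignored, so real DMA throughput will be lower.
    """
    efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return link_bandwidth_mb_s(gen, lanes) * efficiency


for payload in (64, 128, 256):
    print(f"Gen2 x4, {payload:3d} B payload: {payload_bandwidth_mb_s(2, 4, payload):7.1f} MB/s")
```

The loop shows why larger maximum payload sizes matter: the fixed per-packet overhead is amortized over more data bytes.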
Chapter 7. Pricing High-Dimensional American Options on Hybrid CPU/FPGA Systems
Abstract
In today’s markets, high-speed and energy-efficient computations are mandatory in the financial and insurance industry. As American options are among the most frequently traded products in the derivatives market, it becomes essential to focus on their pricing process. Calculating the price of an American option is a particularly challenging task because of the freedom the holder is given in choosing the exercise date and the associated trading strategy. A well-known algorithm that solves this task is the Longstaff-Schwartz (LS) algorithm, which applies least-squares linear regression to simulated Monte Carlo (MC) paths. This work presents a novel way to price high-dimensional American options, coined Reverse LS, using techniques from the embedded systems community. The proposed architecture targets hybrid Central Processing Unit (CPU)/Field Programmable Gate Array (FPGA) systems and exploits FPGA reconfiguration to deliver high throughput. With a bit-true algorithmic transformation based on recomputation, it is possible to eliminate the memory bottleneck and access costs present in a straightforward implementation. The result is a pricing system that is 16× faster and 268× more energy-efficient than an optimized Intel CPU implementation.
Javier Alejandro Varela, Christian Brugger, Songyin Tang, Norbert Wehn, Ralf Korn
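The standard Longstaff-Schwartz algorithm referred to in the Chapter 7 abstract can be sketched as backward induction in which the continuation value at each exercise date is estimated by regressing discounted future cash flows on basis functions of the current state. The Python sketch below is deliberately simpler than the chapter: a one-dimensional American put under Black-Scholes, an assumed basis of 1, x, x^2 with x = S/K, and stored paths. It does not implement the Reverse LS recomputation scheme; storing every path is exactly the memory cost that the chapter's transformation avoids.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lsm_american_put(s0, k, r, sigma, t, steps, n_paths, seed=2015):
    rng = random.Random(seed)
    dt = t / steps
    drift, vol = (r - 0.5 * sigma * sigma) * dt, sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        s, path = s0, [s0]
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)
    disc = math.exp(-r * dt)
    cash = [max(k - p[steps], 0.0) for p in paths]  # exercise value at maturity
    for j in range(steps - 1, 0, -1):
        cash = [c * disc for c in cash]              # roll cash flows back to time j
        itm = [i for i in range(n_paths) if paths[i][j] < k]
        if len(itm) < 3:
            continue
        # Normal equations for regressing cash flows on 1, x, x^2 (in-the-money paths only).
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for i in itm:
            x = paths[i][j] / k
            basis = (1.0, x, x * x)
            for p in range(3):
                aty[p] += basis[p] * cash[i]
                for q in range(3):
                    ata[p][q] += basis[p] * basis[q]
        beta = solve_linear(ata, aty)
        for i in itm:
            x = paths[i][j] / k
            continuation = beta[0] + beta[1] * x + beta[2] * x * x
            exercise = k - paths[i][j]
            if exercise > continuation:
                cash[i] = exercise                   # exercise now, drop later flows
    return max(disc * sum(cash) / n_paths, k - s0)   # allow exercise at time 0


price = lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, n_paths=10000)
print(f"American put ~ {price:.3f}")
```

In the high-dimensional case the basis functions would involve several underlyings, which makes both the regression and the path storage considerably more expensive.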
Chapter 8. Bringing Flexibility to FPGA Based Pricing Systems
Abstract
High-speed and energy-efficient computations are mandatory in the financial and insurance industry to survive in competition and meet federal reporting requirements. While FPGA-based systems have been shown to provide huge speedups, they are perceived to be much harder to adapt to new products. In this chapter we introduce HyPER, a novel methodology for designing Monte Carlo based pricing engines for hybrid CPU/FPGA systems. Following this approach, we derive a high-performance and flexible system for exotic option pricing in the state-of-the-art Heston market model. As an example, we show how to use HyPER to find an efficient implementation for barrier option pricing on the Xilinx Zynq 7020 All Programmable SoC. The constructed system is nearly two orders of magnitude faster than high-end Intel CPUs while consuming the same power.
Christian Brugger, Christian De Schryver, Norbert Wehn
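To make the barrier option pricing task from the Chapter 8 abstract concrete, the following Python sketch estimates a discretely monitored down-and-out call in the Heston model by Monte Carlo. It is not the HyPER methodology: the full-truncation Euler discretisation, the monitoring at every time step, and all parameter values are assumptions chosen for illustration. The vanilla call is computed from the same paths so the knock-out discount is directly visible.

```python
import math
import random


def heston_down_and_out_call(s0, k, barrier, v0, r, kappa, theta, xi, rho,
                             t, steps, n_paths, seed=3):
    """Return (down-and-out call, vanilla call) estimated from the same Heston paths."""
    rng = random.Random(seed)
    dt = t / steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    log_b = math.log(barrier)
    ko_sum = vanilla_sum = 0.0
    for _ in range(n_paths):
        x, v, alive = math.log(s0), v0, True
        for _ in range(steps):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)  # correlated with z1
            v_pos = max(v, 0.0)                            # full truncation
            x += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
            if x <= log_b:                                 # monitored at each step only
                alive = False
        payoff = max(math.exp(x) - k, 0.0)
        vanilla_sum += payoff
        if alive:
            ko_sum += payoff
    disc = math.exp(-r * t)
    return disc * ko_sum / n_paths, disc * vanilla_sum / n_paths


ko, vanilla = heston_down_and_out_call(
    s0=100.0, k=100.0, barrier=90.0, v0=0.04, r=0.03, kappa=2.0,
    theta=0.04, xi=0.5, rho=-0.7, t=1.0, steps=50, n_paths=10000)
print(f"down-and-out ~ {ko:.4f}, vanilla ~ {vanilla:.4f}")
```

Swapping the payoff or the monitoring rule changes only a few lines, while the path generator stays the same; that separation is the kind of flexibility the chapter argues for.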
Chapter 9. Exploiting Mixed-Precision Arithmetics in a Multilevel Monte Carlo Approach on FPGAs
Abstract
Nowadays, high-speed computations are mandatory for financial and insurance institutions to survive in competition and to fulfill regulatory reporting requirements, which have been tightened over recent years. Most of these computations are carried out on huge computing clusters, which are an ever-increasing cost burden for the financial industry. In these clusters, state-of-the-art CPU and GPU architectures execute arithmetic operations only at predefined precisions, which may not match the actual requirements of a specific application. Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have huge potential to accelerate financial simulations while consuming very little energy, by exploiting dedicated precisions in optimal ways. In this work we present a novel methodology to speed up Multilevel Monte Carlo (MLMC) simulations on reconfigurable architectures. The idea is to aggressively lower the precision of different parts of the algorithm without losing any accuracy in the final result. For this, we have developed a novel heuristic for selecting an appropriate precision at each stage of the simulation that can be executed at low cost at runtime. Further, we introduce a cost model for reconfigurable architectures and minimize the cost of our algorithm without changing the overall error. We consider the showcase of pricing Asian options in the Heston model. For this setup, we improve one of the most advanced simulation methods by a factor of 3–9× on the same platform.
Steffen Omland, Mario Hefter, Klaus Ritter, Christian Brugger, Christian De Schryver, Norbert Wehn, Anton Kostiuk
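The Multilevel Monte Carlo idea referenced in the Chapter 9 abstract rests on the telescoping identity E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}], where each correction term is estimated from coupled fine and coarse paths driven by the same Brownian increments. The Python sketch below assumes a European call under geometric Brownian motion, an Euler scheme with refinement factor 2, and fixed, hand-picked sample counts per level. It runs in double precision throughout and does not reproduce the chapter's mixed-precision heuristic, Asian payoff or Heston model.

```python
import math
import random


def mlmc_level(level, n, s0, k, r, sigma, t, rng, m=2):
    """Sample mean of P_l - P_{l-1} on n coupled Euler paths (P_{-1} = 0)."""
    n_fine = m ** level
    h = t / n_fine
    disc = math.exp(-r * t)
    total = 0.0
    for _ in range(n):
        s_fine = s_coarse = s0
        dw_coarse = 0.0
        for step in range(n_fine):
            dw = math.sqrt(h) * rng.gauss(0.0, 1.0)
            s_fine += r * s_fine * h + sigma * s_fine * dw
            dw_coarse += dw
            if level > 0 and (step + 1) % m == 0:
                # Coarse step reuses the summed fine Brownian increments (coupling).
                s_coarse += r * s_coarse * (m * h) + sigma * s_coarse * dw_coarse
                dw_coarse = 0.0
        p_fine = disc * max(s_fine - k, 0.0)
        p_coarse = disc * max(s_coarse - k, 0.0) if level > 0 else 0.0
        total += p_fine - p_coarse
    return total / n


def mlmc_price(samples_per_level, s0, k, r, sigma, t, seed=11):
    """Telescoping MLMC estimator: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}]."""
    rng = random.Random(seed)
    return sum(mlmc_level(lvl, n, s0, k, r, sigma, t, rng)
               for lvl, n in enumerate(samples_per_level))


estimate = mlmc_price([40000, 20000, 10000, 5000, 2500, 1250], 100.0, 100.0, 0.05, 0.2, 1.0)
print(f"MLMC estimate ~ {estimate:.4f}")
```

Because the coupled corrections have small variance, the expensive fine levels need far fewer samples than the cheap coarse level; an optimized MLMC would choose these counts from estimated per-level variances and costs rather than fixing them by hand.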
Chapter 10. Accelerating Closed-Form Heston Pricers for Calibration
Abstract
Calibrating models against the market is a crucial step for obtaining meaningful results in the subsequent pricing processes. In general, calibration can be seen as a minimization problem that tries to fit modeled product prices to those observed on the market (compare Chap. 2 by Sayer and Wenzel). This means that the modeled prices need to be calculated many times during the calibration process, and therefore the run time of the product pricers has the highest impact on the overall calibration run time. Consequently, in general, only products for which closed-form mathematical pricing formulas are known are used for calibration.
While (semi-)closed-form solutions exist for simple products in the Heston model, their evaluation involves complex functions and infinite integrals. So far, these integrals can only be solved with time-consuming numerical methods. Over time, however, more and more theoretical and practical subtleties of doing this have been revealed, and today a large number of possible approaches are known. Examples are different formulations of the closed-form formulas and various integration algorithms such as quadrature or Fourier methods. Nevertheless, each approach only works under specific conditions and depends on the Heston model parameters and the input setting.
In this chapter we present a methodology for determining the most appropriate calibration method at run time. For a practical setup we study the popular closed-form solutions and integration algorithms available in the literature. In total we compare 14 pricing methods, including adaptive quadrature and Fourier methods. For a target accuracy of 10⁻³ we show that static Gauss-Legendre quadrature performs best on Central Processing Units (CPUs) for the unrestricted parameter set. Further, we show that for the restricted Carr-Madan formulation the methods are 3.6× faster. We also show that Fourier methods are even better when pricing at least 10 options with the same maturity but different strikes.
Gongda Liu, Christian Brugger, Christian De Schryver, Norbert Wehn
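One of the method families named in the Chapter 10 abstract, static Gauss-Legendre quadrature applied to a semi-closed-form Heston pricer, can be sketched as follows. This minimal Python version assumes the classical P1/P2 probability formulation, the "little Heston trap" form of the characteristic function (Albrecher et al.), truncation of the infinite integrals at u_max = 100, and 64 quadrature nodes. These choices are illustrative and are not the configurations benchmarked in the chapter.

```python
import cmath
import math


def gauss_legendre(n):
    """Gauss-Legendre nodes and weights on [-1, 1] via Newton's method on P_n."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # standard initial guess
        for _ in range(100):
            p_prev, p = 1.0, x
            for j in range(2, n + 1):  # three-term recurrence for P_j(x)
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # P_n'(x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def heston_cf(u, s0, v0, r, kappa, theta, xi, rho, t):
    """E[exp(i u ln S_T)] under Heston, 'little trap' form (u may be complex)."""
    iu = 1j * u
    a = kappa - rho * xi * iu
    d = cmath.sqrt(a * a + xi * xi * (iu + u * u))
    g = (a - d) / (a + d)
    e = cmath.exp(-d * t)
    c = kappa * theta / (xi * xi) * ((a - d) * t - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g)))
    dv = v0 / (xi * xi) * (a - d) * (1.0 - e) / (1.0 - g * e)
    return cmath.exp(iu * (math.log(s0) + r * t) + c + dv)


def heston_call(s0, k, v0, r, kappa, theta, xi, rho, t, n=64, u_max=100.0):
    """European call via P1/P2 integrals on [0, u_max] with static n-point Gauss-Legendre."""
    nodes, weights = gauss_legendre(n)
    args = (s0, v0, r, kappa, theta, xi, rho, t)
    forward = s0 * math.exp(r * t)  # equals heston_cf(-1j, ...)
    p1 = p2 = 0.0
    for x, w in zip(nodes, weights):
        u = 0.5 * u_max * (x + 1.0)  # map [-1, 1] -> [0, u_max]
        scaled_w = 0.5 * u_max * w
        kernel = cmath.exp(-1j * u * math.log(k)) / (1j * u)
        p1 += scaled_w * (kernel * heston_cf(u - 1j, *args) / forward).real
        p2 += scaled_w * (kernel * heston_cf(u, *args)).real
    p1 = 0.5 + p1 / math.pi
    p2 = 0.5 + p2 / math.pi
    return s0 * p1 - k * math.exp(-r * t) * p2


print(f"Heston call ~ {heston_call(100.0, 100.0, 0.04, 0.05, 2.0, 0.04, 0.5, -0.7, 1.0):.4f}")
```

"Static" here means that the nodes, weights and truncation point are fixed in advance rather than adapted per integrand. That makes each evaluation cheap and predictable, but whether the fixed grid is accurate enough depends on the model parameters, which matches the abstract's point that each method only works under specific conditions.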
Chapter 11. Maxeler Data-Flow in Computational Finance
Abstract
Computational finance is an area that includes many algorithms in trading and analytics that are both computationally very complex and performance critical. As financial institutions intend to perform a steadily increasing number of computations and obtain the results as quickly as possible, computer systems are expected to satisfy these growing performance demands. However, recent years have brought the end of “free” processor speed-ups, and single-thread performance is no longer the driving force behind the automatic performance gains the industry enjoyed for many decades. Nowadays, high-performance computing systems increasingly have to rely on parallel programming models, in which the original application has to be modified to exploit many parallel cores. This requires considerable redesign effort, and even then the desired performance improvements are not guaranteed. Some financial applications may also reach practical physical limits imposed by the space and power provisions available in the data centre. A solution to the above problems can be the use of custom accelerators implemented in reconfigurable hardware. Reconfigurable implementations can deliver both high computational throughput and low compute latency in addition to superior energy efficiency. However, porting applications to such devices requires a special skill set in hardware design, which complicates their practical adoption. Maxeler Technologies offers conveniently programmable, high-performance computing systems and a software toolchain that exploit the sheer computational power of reconfigurable devices while abstracting the programming into a high-level data-flow model. Our vision is to empower domain experts with the necessary means to create highly customised, efficient hardware/software implementations for their specific applications. This approach enables vertical optimisations across the different layers of abstraction that are typically not exposed to an application designer.
The final result is a productive application development process that often delivers speed-ups of orders of magnitude over traditional CPU implementations.
Tobias Becker, Oskar Mencer, Stephen Weston, Georgi Gaydadjiev
Backmatter
Metadata
Title: FPGA Based Accelerators for Financial Applications
Edited by: Christian De Schryver
Copyright year: 2015
Electronic ISBN: 978-3-319-15407-7
Print ISBN: 978-3-319-15406-0
DOI: https://doi.org/10.1007/978-3-319-15407-7
