Abstract
Market Generators are a rapidly evolving class of neural-network-based models for simulating financial market behavior, offering a powerful alternative to classical stochastic models. These deep learning models are trained to encode the underlying distribution of financial data and to generate new synthetic market scenarios from the learned distribution. Though the expressions “Market Generator” and its related form “Market Simulator” only entered the vocabulary of financial modeling around 2019, the modeling techniques associated with them have since grown into an area in their own right. This growing interest is matched by a dramatic rise in research and publications and an accelerating rate of innovation in the ambient technological arena of generative modeling. Recently, this trend has culminated in the emergence of large generative networks, particularly GPT-type language models, which are driving one of the biggest disruptions in the history of technology.
1 Introduction
This chapter is dedicated to providing a brief synopsis of the evolution of Market Generator models, embedding their emergence against the backdrop of the development of deep learning over the past decade. The chapter outlines the distinct challenges involved in generating various types of financial data, and the aspects which set this task apart from other forms of synthetic data generation.
It addresses how these models compare to existing techniques in finance, including classical stochastic modeling and numerical techniques, and highlights the relevance of path-wise methods (such as rough path signatures) when taking the generative route. It also aims to unravel the relationships (common characteristics and differences) between the generative models proposed as Market Generators to date and the most recent technological advances in Generative AI, including the recently emerged foundation models. The main value of this lies in contrasting finance-specific objectives with general AI trends, illustrating barriers and bottlenecks for applying off-the-shelf AI solutions directly in financial contexts.
A critical challenge in generative modeling lies in evaluating the quality of generated data, as no universally accepted framework exists for assessing and validating the outputs produced by trained models, especially in financial settings, which present distinctive challenges. Metrics for evaluating generative models on financial tasks are often the centerpiece of a successful Market Generator and of its associated downstream tasks.
In this piece we show how signature methods provide strikingly effective tools for working with time series, data streams and path-valued distributions in an efficient and versatile way that is future-proof for the directions in which generative AI is headed. While the field is advancing rapidly at the time of writing, the distinctions drawn here between broader GenAI trends and finance-specific requirements, rooted in insights from decades of quantitative analysis and risk management, are likely to endure.
1.1 Generative Models, Generative AI, Market Generators, and Related (Recent) Developments
The concept of Generative AI (GenAI) has rapidly evolved, especially since the successes of large language models (LLMs) like GPT-3.5 and GPT-4. Although these models have different architectures, they share common features. This section provides a brief overview of the main concepts and terminologies related to generative modeling and distinguishes the concepts and methods which are already adopted in finance from the ones currently emerging in the field.
> Generative AI
Generative AI refers to a type of artificial intelligence that can create (generate) new data, such as text, images, videos, music, or code, resembling the patterns and structure of the data it was trained on, typically in response to prompts.
The key elements are:
1. Generation of novel outputs which exhibit similar characteristics to the training data.
2. Reliance on generative models to learn underlying patterns of the training data.
3. Conditioning on prompts or inputs (text or sketches) to guide the generation process.
Examples of generative AI applications include (but are not limited to):
Image generation: Creating realistic images of people, objects, or scenes.
Video synthesis: Creating videos of people or objects performing actions.
Text generation: Producing articles, stories, poems, or code snippets.
Music composition: Generating original pieces of music in different styles.
Drug discovery: Generating new molecules with desired properties.
Recent breakthroughs in the field have the potential to drastically change the way we approach content creation.
> Foundation Models
The related term foundation model was coined by researchers1 in August 2021 to mean any model that is trained on broad data that can be adapted (e.g., fine-tuned) to a wide range of “downstream tasks”. The term is often used to describe generative AI models capable of performing a wide variety of general tasks (such as generating text and images) as well as specific downstream tasks within these areas.
Generative AI relies on Generative Models for the generation of new data.
> Generative Models
Generative Models are deep learning models that can be trained to capture the underlying patterns and structure of input data in order to generate new, similar data. There are many different approaches (EBMs, VAEs, GANs, normalizing flows, diffusion models, see below) for learning how to generate synthetic data statistically resembling key characteristics of the training data. These models are capable of creating new data, such as images, text, music, or other formats of content.
Deepfakes2 (a portmanteau of “deep learning” and “fake”) are synthetic media productions which have been digitally created or manipulated to convincingly replace one person’s likeness with that of another. The term has been expanded to cover any videos, pictures, or audio made with AI to appear deceptively real, by fabricating observably realistic content of individuals who (1) do not exist or (2) did not participate in the captured content.
Generative Modeling Avenues: The Main Approaches to Generative Modeling
There are several different types of generative models, each with its own approach to generating data:3
1. Energy-Based Models (EBMs) follow the idea of parameterizing the unnormalized negative log-likelihood (the “energy”) with a neural network, \(p_{\theta }(x) = \frac {1}{Z_{\theta }} e^{-E_{\theta }(x)}\). The model can be trained by maximizing the likelihood of the given samples; the difficulty lies in approximating the normalization factor (“partition function”) \(Z_{\theta } = \int e^{-E_{\theta }(x)}dx\). Alternatively, the network can be trained by “score matching”, which amounts to matching the score \(\nabla _{x}\log p_{\theta }(x)\) to the score of the empirical data distribution. One advantage of EBMs is access to the likelihood function; the drawback is that sampling from the model (e.g. via Langevin sampling or other likelihood-based sampling methods) is computationally expensive.
2. Generative Adversarial Networks (GANs) consist of two neural networks, a generator and a discriminator, that compete against each other. The generator tries to create artificial data that looks real, while the discriminator tries to distinguish between real and generated data. This competition drives both networks to improve until the generator can produce convincingly realistic data.
3. Neural Stochastic Differential Equations (Neural SDEs) are continuous-time generative models for sequential data. The drift and volatility coefficients of the SDE, which determine the probability distribution of the generated time series, are parametrized by neural networks.
4. Variational Autoencoders (VAEs) are a type of generative model that use an encoder to compress input data into a lower-dimensional latent space and a decoder to reconstruct the original data from the latent space. The latent space captures the underlying structure of the data, allowing the decoder to generate new data by sampling from that space.
5. Deep Autoregressive Models4 generate discrete data sequentially, one element at a time, based on the previous elements. The discrete probability distribution is typically defined by applying the softmax function to the output layer of the neural network. These models are commonly used for tasks such as text generation (e.g. in LLMs), where the next word in a sentence is predicted based on the previous words. Another use case is image generation, where the probability distribution of a pixel (whose value is discrete) depends on the values of the previously generated, nearby pixels.
6. Flow-Based Models (Normalizing Flows) learn a series of invertible transformations that can transform simple probability distributions into complex ones. They are often used for tasks where understanding the transformation of data is crucial. There are many choices for parameterizing the invertible transformations with neural networks. A big advantage is that one has access to both the likelihood and an efficient way to generate samples.
7. Diffusion models are a class of generative models that (like the continuous-time version of a flow-based model) learn to iteratively refine noise into samples from the target probability distribution. This so-called reverse diffusion process (reversing the forward diffusion process, which turns the data distribution into noise) is a neural SDE whose parameters are learned by score matching at different noise levels. Combined with LLMs, diffusion models have achieved state-of-the-art results for image generation from text input.
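To make the flow-based idea concrete, the following minimal sketch (our own illustration, not drawn from the referenced literature) implements a one-dimensional affine normalizing flow: the change-of-variables formula yields exact likelihoods, and sampling is a single forward pass through the (inverted) transformation. All numerical values are arbitrary placeholders.

```python
import numpy as np

# A single affine "flow" x = mu + sigma * z maps standard Gaussian noise z
# to data x. It illustrates the two properties highlighted above: exact
# likelihoods via the change-of-variables formula, and cheap sampling.

rng = np.random.default_rng(0)

def flow_forward(z, mu, sigma):
    """Transform base samples z ~ N(0, 1) into data-space samples."""
    return mu + sigma * z

def flow_log_likelihood(x, mu, sigma):
    """log p(x) = log p_base(f^{-1}(x)) + log |d f^{-1}/dx|."""
    z = (x - mu) / sigma                          # inverse transformation
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))  # standard normal density
    log_det = -np.log(sigma)                      # |d f^{-1}/dx| = 1/sigma
    return log_base + log_det

# "Training" this one-parameter-pair flow by maximum likelihood has a
# closed form: the sample mean and standard deviation of the data.
data = rng.normal(loc=2.0, scale=0.5, size=10_000)
mu_hat, sigma_hat = data.mean(), data.std()

# Sampling is a single forward pass through the fitted flow.
samples = flow_forward(rng.standard_normal(10_000), mu_hat, sigma_hat)
print(samples.mean(), samples.std())  # close to 2.0 and 0.5
```

Real normalizing flows stack many such invertible layers (with neural-network parameters) and accumulate the log-determinants, but the likelihood/sampling mechanics are exactly those shown here.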
> Market Generator, Market Simulator
The term Market Generator5 refers to numerical techniques that rely on generative models for the purpose of synthetic market data generation. In essence, Market Generators are designed to capture and replicate the probability distribution of the input data via a generative model and to produce new data samples, whose distribution aligns with the captured distribution.
The term Market Simulator was initially used interchangeably with the term Market Generator. More recently, it is increasingly used to refer to models that generate market data by simulating interactions between different market participants.
1.2 Historical Background: Development of Neural Networks and Generative Models
Historical Development: The Rise of Deep Generative Models
The Resurgence of Deep Learning: Beginning in the late 2000s, the resurgence of deep learning drove progress and research in image classification, speech recognition, natural language processing and other tasks. Neural networks in this era were typically trained as discriminative models, due to the challenges of generative modeling.
VAEs and GANs: Introduced in 2013, VAEs were among the first deep-learning models to successfully generate realistic images and speech. According to Akash Srivastava, an expert on generative AI at the MIT-IBM Watson AI Lab, “VAEs opened the floodgates to deep generative modeling by making models easier to scale.”6 This milestone laid the foundation for today’s generative AI technologies. In 2014, GANs further advanced generative capabilities by introducing adversarial training, enabling the creation of highly realistic and complex image data. Together, VAEs and GANs marked a turning point in deep generative modeling, transitioning from simple discriminative outputs (e.g., class labels) to the generation of new images.
Transformers and LLMs: In 2017, Google introduced the Transformer architecture in the landmark paper “Attention Is All You Need”. It combines the encoder-decoder architecture with a text-processing mechanism called attention, revolutionizing language model training. OpenAI’s GPT (Generative Pre-trained Transformer) series built on this, with GPT-1 (2018) demonstrating pre-training, GPT-2 (2019) showcasing generalization, and GPT-3 (2020) solidifying foundation models. ChatGPT, launched in 2022, brought generative AI to the mainstream, making advanced language models widely accessible. For further developments and current challenges see the Appendix.
Diffusion Models and Generative AI for Art: The release of DALL-E in 2021 showcased the ability to generate art from natural language prompts using a VAE-based approach, but it had limitations in image quality and fidelity. In contrast, diffusion models, introduced around the same time, achieved outputs of much higher quality. These advancements powered DALL-E 2, which significantly surpassed its predecessor in generating detailed, realistic, and coherent images. Diffusion models quickly became the backbone for other AI art tools like Stable Diffusion and MidJourney, further revolutionizing AI-generated art and creative industries.
1.3 Where Quantitative Finance Goals Deviate from Traditional Applications of Generative Models
Generative models excel in tasks such as photo and language generation, but their application in financial data generation, particularly for quantitative finance, presents unique challenges. It may seem natural (and tempting) to directly apply off-the-shelf generative modeling techniques from other domains to these areas of finance, but there are some differences to be mindful of when doing so.
For example, there is a difference between the permissible invariances in visual tasks like image or video generation and those in financial data generation. The invariances associated with the data are often reflected in the (optimal) architecture of the neural network performing the task. In visual domains, transformations such as scaling, rotating, mirroring, or translating an image generally preserve its essential identity (an image of a dog remains identifiable, for instance). However, analogous operations on financial time series can drastically alter their structure. Consequently, off-the-shelf solutions developed for domains where these invariances are central can be erroneous when transferred to financial settings without further attention.
Another difference is the relevance of the underlying distribution in the training data. This is reflected in the fact that Market Generators explicitly place an emphasis on the distribution of the input and output data, while in other generative applications this is often not needed: in many generative applications, the goal is to produce one realistic output at a time, e.g. one realistic image. The quality of each of the outputs can be evaluated individually, e.g. by visual inspection, or by the plausibility of the produced sequence or coherence of the text.
In financial modeling, however, the probabilistic composition of the training set is central; in fact, it is typically the main target of modeling. Based on observed data, models are used to infer the underlying distribution of the training data, and therefore the outputs of Market Generators are typically probabilistic. Multiple outputs form an (empirical) distribution that can be evaluated based on its similarity (a score or distance) to the (empirical or believed) underlying distribution of the training data. This would not be feasible with a single observed value: for instance, if the Market Generator is tasked with approximating a Gaussian distribution, it would be hard to tell, based on a single sample, whether the objective has been met.
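This point can be made concrete with a hypothetical sketch: a single output from a generator targeting N(0, 1) carries almost no information, but a statistic computed over many outputs, such as a hand-rolled Kolmogorov-Smirnov distance, cleanly separates a matching generator from a miscalibrated one. The "generators" below are stand-ins that sample directly from Gaussian distributions.

```python
import numpy as np
from math import erf

# Compare empirical distributions of many generated samples against a
# target N(0, 1), something a single sample could never reveal.

rng = np.random.default_rng(42)

def ks_stat_vs_standard_normal(samples):
    """Max gap between the empirical CDF and the N(0, 1) CDF."""
    x = np.sort(samples)
    ecdf = np.arange(1, len(x) + 1) / len(x)
    cdf = 0.5 * (1 + np.vectorize(erf)(x / np.sqrt(2)))
    return np.max(np.abs(ecdf - cdf))

good_generator = rng.standard_normal(5_000)        # matches the target
bad_generator = rng.standard_normal(5_000) + 0.5   # shifted mean

print(ks_stat_vs_standard_normal(good_generator))  # small (~0.01)
print(ks_stat_vs_standard_normal(bad_generator))   # large (~0.2)
```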
In many machine learning domains working with time series forecasting (TSF), there is a reflex to gravitate towards point estimates, but in financial settings there are a number of compelling reasons to target probabilistic outputs instead, see Sect. 2.1 below. Another point is that, while in physics and experimental sciences it is often possible to repeat the experiment multiple times and extract the underlying dynamics from multiple observations, in financial settings it is rarely possible to repeat the same exact experiment with real data, and this often results in worse signal-to-noise ratios. Since market data is prone to frequent regime shifts, performance on (ever-changing) real data is a volatile metric of the quality of modeling success.7
The aforementioned differences in the nature of the data and intended application domain are also reflected in differences in the appropriate choice of loss functions and quality evaluation metrics for the generative task: the evaluation of the quality of outputs is rarely as simple as visual inspection,8 see for example Fig. 1 in Sect. 3.1 below. Similarly, in regard to off-the-shelf loss functions, [8] lays out multiple examples where generative models trained under ‘traditional’ loss functions are not guaranteed to perform well in typical financial tasks such as portfolio construction. The construction of suitable loss functions and evaluation metrics is often the centerpiece of successful modeling with generative models in financial settings. Signature methods (as outlined in Sect. 3.1) provide a universal framework that allows us to bypass the limitations of working with time series in a return-by-return fashion and to model data streams as path-valued distributions instead, thereby unlocking versatile tools for modeling, high-fidelity features, and more efficient training.
Fig. 1
A display of target paths \( \mathbb {P}_{X^{\text{true}}}\) versus synthetic paths generated by a market generator \( \mathbb {P}_{X^{\theta }}\) with model parameters \(\theta \) shows the challenge of evaluating the model visually through samples
2 Synthetic Data Generation in Finance: Classical Models and Generative Models
Though the emergence of generative models is a development of this past decade and the research on Market Generators (as defined above) has only gained traction in recent years, there is a far longer history of numerical modeling and synthetic data generation in mathematical finance. The aim of simulating realistic (probable) future scenarios of market activity has a history that spans almost five decades.9 Models that proxy for the dynamic evolution of asset prices have evolved to reflect more and more realistic properties (stylized facts) that are universally present in any data stream aiming to represent financial markets. These models can broadly be grouped into two classes:
1. Continuous-time stochastic models: This started with the Black-Scholes model, modeling the asset price process via a Geometric Brownian motion. These models are used for the pricing and hedging of derivatives. With options becoming liquid assets, and exotic derivatives being created, much more sophisticated models were developed to capture more aspects of the market (such as including stochastic volatility). The (few) parameters of these models are calibrated to the current prices of the market (“\(\mathbb {Q}\)-world”).
2. Discrete-time models for forecasting: These models are used to forecast the mean and/or volatility of asset returns. Standard models include linear autoregressive models (AR models) and models capturing heteroscedasticity (e.g. GARCH models). The parameters of these models are calibrated to historical time series of prices (“\(\mathbb {P}\)-world”).
2.1 Market Generators in Finance: Can Generative Models Add Value?
Financial time series simulation is a central topic since it bypasses rigidities imposed by parametric models and extends the limited real data for training and evaluation of DNN models. The fidelity and flexibility of the output is also crucial for enhancing data privacy and it has proven to add value for validating models, backtesting, and addressing issues related to data scarcity, e.g. in extreme scenarios.
Generators for Time Series Forecasting: Traditional time series models like ARIMA-GARCH have limitations when it comes to capturing non-linear relationships and adapting to changing market conditions. A probabilistic Market Generator offers a more flexible and powerful alternative for time series generation under the \(\mathbb {P}\)-measure, as demonstrated in [56, 61]. For example, the probabilistic Market Generator in [56] allows for the simultaneous prediction of both the conditional mean and variance of time series data, which enables the generation of realistic synthetic financial market time series. For this purpose the objective in [56] is to create synthetic financial data that preserves key statistical properties of real market data while balancing performance and interpretability, making it suitable for regulated industries such as finance.
> Example: Time Series Modeling with Market Generators: Fin-GAN
Fin-GAN [61] explores the use of GANs for probabilistic forecasting of financial time series. Traditionally, this task has been achieved using methods that rely solely on point estimates. Classical machine learning and time series approaches do not allow for probabilistic forecasts or have strong assumptions on the form of the future distribution of the target variable.
Fin-GAN is a customised conditional GAN, which utilises the ForGAN [37] architecture and is trained via a PnL-based loss function. The input condition \(a_t\) in the Fin-GAN case is the previous L values of the returns time series \(X_t\), and the output is a return scenario for the next time step, \(G(a_t, Z)\), given Gaussian noise Z. Since forecasting is mainly performed for trading purposes, Fin-GAN uses the distribution of \(G(a_t, Z)\) to adjust the trade size based on the uncertainty in the direction of the move.
Fin-GAN is benchmarked against a classical deep learning method for time series forecasting, the LSTM [29], and against a standard time series model, ARIMA [59].
Empirical studies on equity returns demonstrate that Fin-GAN attains superior performance to the benchmarks.
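The conditioning interface described above can be sketched as follows. This is a hypothetical toy generator, a tiny untrained MLP with random weights, and not the actual Fin-GAN architecture: it maps the condition \(a_t\) (the last L returns) together with Gaussian noise Z to a next-step return scenario, and sampling many noise draws per condition yields the empirical forecast distribution used for trade sizing.

```python
import numpy as np

# Toy conditional generator G(a_t, Z): condition = last L returns,
# Z = Gaussian noise. Weights are random, i.e. the network is untrained;
# the point is only to show the conditioning mechanics.

rng = np.random.default_rng(0)
L, noise_dim, hidden = 10, 4, 32

W1 = rng.normal(0, 0.1, size=(L + noise_dim, hidden))
W2 = rng.normal(0, 0.1, size=(hidden, 1))

def generator(condition, z):
    """One next-step return scenario G(a_t, Z) per noise draw."""
    h = np.tanh(np.concatenate([condition, z]) @ W1)
    return (h @ W2).item()

a_t = rng.normal(0, 0.01, size=L)   # last L observed returns (toy data)
scenarios = np.array([generator(a_t, rng.standard_normal(noise_dim))
                      for _ in range(1_000)])

# The spread of the scenario distribution (not just its mean) is what a
# probabilistic forecaster uses, e.g. to size trades by uncertainty.
print(scenarios.mean(), scenarios.std())
```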
Further Market Generators for time series forecasting include [24, 37, 56].
Forecasting the probability distribution of future scenarios, rather than just point estimates, is beneficial in low signal-to-noise environments, even in settings where one was traditionally interested only in point estimates (the most likely next step). Schwarz takes this view a step further and argues in [56] that probabilistic outputs are more desirable. An example is synthetic data for market making, where probabilistic forecasts can enhance (both human and algorithmic) trading strategies, allowing for more profitable and risk-aware trading.
> Example: A Case for Probabilistic Outputs
A market maker wants to place bid and ask quotes in a way that minimizes the probability of a loss while recycling acquired inventory. This requires both a liquidity model (predicting volume traded) and a model which forecasts the probability distribution of future returns. Trading strategies incorporating such probabilistic forecasts are superior to those relying on point estimates alone, since they allow traders to quantify confidence in their trades, determine optimal trade sizes, and improve risk management.
Generators for Stochastic Modeling
Similarly to the above, Market Generators grant added flexibility to fit more complex distributions compared to traditional stochastic models. If one uses a (classical) parametric model to simulate (future) market scenarios, one first needs to settle on a model described by a set of (possibly quite restrictive) assumptions on the underlying distribution, governed by a (pre-)fixed set of parameters. Then, when fitting the parametric model to the data, one selects the parameters that bring the model predictions closest to the observations in the target data. In data-driven, non-parametric modeling (such as generative modeling), the number and scale of the model parameters are not pre-set; instead, the complexity and nature of the data at hand determine them for an optimal fit.
An advantage of (non-parametric) generative modeling is that it can produce high-fidelity data, even in highly complex scenarios, without the need to make prior (restrictive) assumptions about the structure of the underlying distribution, contrary to parametric models.
Training
There are multiple ways of training Market Generators; one appealing option is to work directly on the level of samples: comparing the samples produced by the model with samples in the training data, computing a similarity score from this comparison, and backpropagating it through the neural network. Indeed, in this way the training/fitting of the Market Generator can be done10 without explicit knowledge of the underlying distribution, which permits more modeling freedom if desired, using sample-based similarity metrics (such as MMD) as objective functions together with backpropagation.
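As an illustration, the following sketch computes a sample-based similarity metric of the kind just mentioned: a (biased) squared Maximum Mean Discrepancy estimate with a Gaussian kernel. In an actual Market Generator this quantity would serve as the training objective and be backpropagated through the network; here we only evaluate it on toy samples.

```python
import numpy as np

# Biased estimator of squared MMD between two sample sets with a Gaussian
# kernel: small when the sets come from the same distribution, clearly
# positive when they do not.

rng = np.random.default_rng(1)

def gaussian_kernel(x, y, bandwidth=1.0):
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimator of squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            - 2 * gaussian_kernel(x, y, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean())

real = rng.standard_normal((500, 2))
fake_good = rng.standard_normal((500, 2))          # same distribution
fake_bad = rng.standard_normal((500, 2)) + 1.0     # shifted distribution

print(mmd2(real, fake_good))  # close to zero
print(mmd2(real, fake_bad))   # clearly positive
```

In the signature-based setting discussed later, the same construction is applied with signature kernels on path space, which is what makes it suitable for data streams.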
A further valuable advantage of generative models is the versatility of the possibilities for conditioning: one of the particularly appealing features of generative models is that they (almost always) come with a conditionable counterpart: VAE \(\Rightarrow \) CVAE (Conditional Variational Autoencoder), GAN \(\Rightarrow \) CGAN (Conditional Generative Adversarial Network), and so on. Conditioning enables us to generate new samples from a distribution, conditional on a few requirements or on context.11 While conditioning in parametric models was limited to the few parameters of the model, this feature of generative AI opens up a whole new range of possibilities for conditioning on more states of the world, which may be relevant to the future probability distribution of the considered asset.
Some Use Cases Where Such Flexibility Can Add Value
One of the first12 and primary directives in synthetic financial data generation at the onset of this line of research was the need for protecting the privacy of customers and entities involved with banks, while still making it possible to share such data outside of the organisations that generate it: especially for research, risk assessment and auditing purposes of new technologies. Financial data contains some of the most sensitive and personally identifiable attributes of customers. Due to such privacy concerns and regulatory restrictions, financial institutions are often limited in sharing real data both internally and externally. Synthetic data, which is generated to mimic real financial data while protecting privacy, is proposed as a solution.
Secondly, synthetic data can be used to train data-hungry Deep Learning models on a richer set of scenarios, than real data can provide. This can be a trading strategy with the goal of maximizing profits (finding alpha), in portfolio management [8], or a hedging objective (“deep hedging” [4]). In the latter, the idea is to use a Market Generator, and train a strategy on a rich but highly realistic set of synthetic price path scenarios, with the objective of minimizing variance or a risk measure of choice. This leads to improved hedging strategies in real world scenarios. In this context, [8] points out the relevance of the initial sample size of the training data and some pitfalls of generating more data than originally available.13
Finally, generative models can be seen as extensions of classical parametric models. They can learn and replicate complex and nonlinear dependencies better than classical parametric models. Synthetic data can be used to create purely synthetic scenarios which are useful for risk management, scenario analysis, stress testing, and back-testing of trading strategies. This is especially important when the original data is scarce. Originally, parametric models were used for this purpose.14 Generative models make it possible to reuse past datasets and condition them on more contemporary market conditions, or to include possible scenarios that are realistic but may not have happened yet. They can faithfully replicate any parametric (target) model, but they can also be used for high-dimensional (or otherwise highly complex) scenarios where parametric models are not available.
There are a number of examples in finance where the data is highly complex and a more flexible model is beneficial. Notably, these complexities come with unique challenges: tackling them efficiently requires specialized architectures and bespoke loss functions (objective functions, discriminators) during training. Indeed, performance evaluation metrics become a centerpiece of financial modeling with Market Generators, as we will see in the two examples below and in later sections. The complexities depend on the asset class considered, as each brings special challenges with it. For example, modeling the dynamics of the implied volatility surface involves intricate no-arbitrage conditions.
> Example: High-Dimensional Target Data with Complex Dynamics: Volatility Surface Modeling (VolGAN) [60]: Learning to Simulate SPX Implied Volatility Scenarios
VolGAN [60] of Cont and Vuletić is a generative model for arbitrage-free implied volatility surfaces. The implied volatility surface, which summarizes the cross-section of option prices across strikes and maturities, gives a snapshot of the state of the options market. Any model of implied volatility should appropriately capture the co-movements of implied volatilities across moneyness and time-to-maturity, reproduce the empirically observed dynamics of implied volatilities [15] and the underlying, be able to capture the smile, skew, and term structure, and satisfy arbitrage constraints [22, 25]. Given the high dimensionality of the volatility surface and the complexity of its dynamics, it is challenging to capture all these properties in a parametric model.
The model: VolGAN is a custom conditional GAN with a smoothness penalty [33, 54] incorporated into the loss function and arbitrage-penalty scenario re-weighting [16]. VolGAN receives as input
the implied volatility surface at the previous date,
the two previous underlying returns,
the realized volatility from the previous period,
and Gaussian noise. It outputs (joint) scenarios for
the return of the underlying asset and
the implied volatility surface
for the next period, along with a set of weights (probabilities) associated with these scenarios. The outputs of the generator described above are a-priori not guaranteed to satisfy the static arbitrage constraints: call prices should be decreasing and convex in strike and increasing in time to maturity. To correct for this, methodology from [16] is applied to re-weight the one-day-ahead scenarios generated by the GAN.
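The static no-arbitrage constraints above are easy to state programmatically. The following sketch (a hypothetical helper, not VolGAN's actual code) checks a grid of call prices for violations of monotonicity and convexity in strike and monotonicity in maturity:

```python
import numpy as np

def violates_static_arbitrage(call_prices, tol=1e-8):
    """Check a call-price grid for static arbitrage.

    call_prices: array of shape (n_maturities, n_strikes), with maturities
    increasing along rows and strikes increasing along columns.
    """
    C = np.asarray(call_prices, dtype=float)
    decreasing_in_strike = (np.diff(C, axis=1) <= tol).all()       # C decreasing in K
    convex_in_strike = (np.diff(C, n=2, axis=1) >= -tol).all()     # C convex in K
    increasing_in_maturity = (np.diff(C, axis=0) >= -tol).all()    # C increasing in T
    return not (decreasing_in_strike and convex_in_strike and increasing_in_maturity)
```

In VolGAN the offending scenarios are not discarded but re-weighted via the methodology of [16]; a check like this merely identifies which generated surfaces would require correction.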
This example shows that a carefully crafted loss function is central for the success of the Market Generator, and it is also a central ingredient for a number of downstream tasks related to evaluating similarity of distributions in the target application space. Further Market Generators for volatility surfaces include, but are not limited to [9, 11, 13, 49, 63, 64].
The following paper addresses return distributions, and again its success in efficiently capturing tail events is due to a well-chosen loss function. It replaces the classical loss functions used in GANs with a financially relevant one based on the joint elicitability of a risk-measure pair, thereby forcing the model to better learn the tails of return distributions.
> Example: Modeling Extreme and Rare Events: Tail-GAN [17]
Tail-GAN [17] (Cont, Cucuringu, Xu and Zhang) is a custom (unconditional) GAN designed for tail-risk scenario simulation, with the aim of being used for estimating Value-at-Risk (VaR) and Expected Shortfall (ES) for both static and dynamic portfolios.15 Gneiting [26] and Weber [62] show that ES is not elicitable,16 whereas VaR at level \(\alpha \in (0,1)\) is elicitable for random variables with a unique \(\alpha \)-quantile; however, the pair \((\mbox{VaR}_{\alpha }(\mu ),\mbox{ES}_{\alpha }(\mu ))\) is jointly elicitable. Tail-GAN uses the result from [23], which gives a family of score functions that are strictly consistent for \((\mbox{VaR}_{\alpha }(\mu ),\mbox{ES}_{\alpha }(\mu ))\). Therefore, Tail-GAN constructs a min-max game for the generator and the discriminator such that the equilibrium point is given precisely by Value-at-Risk and Expected Shortfall.
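To fix ideas, the pair of risk measures targeted by Tail-GAN can be estimated empirically as follows (a minimal sketch; sign and tail conventions vary across the literature, and this is not Tail-GAN's code):

```python
import numpy as np

def var_es(pnl, alpha=0.05):
    """Empirical lower-tail VaR and ES of a P&L sample at level alpha."""
    pnl = np.sort(np.asarray(pnl, dtype=float))
    k = max(1, int(np.ceil(alpha * len(pnl))))  # number of worst outcomes
    tail = pnl[:k]
    return tail[-1], tail.mean()  # (VaR_alpha, ES_alpha)
```

Since ES averages the outcomes beyond VaR, the estimate satisfies ES \(\le\) VaR in this sign convention; Tail-GAN's discriminator scores such pairs via a strictly consistent score function rather than estimating them directly.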
2.2 (Why) Is There a Paradigm Shift Towards Generative Models? Where Is Generative AI Headed?
While a complete paradigm shift might not have happened yet, the increasing adoption, technological advancements, and evolving regulatory landscape suggest that generative models are poised to play a more significant role in financial modeling in the future. The extent to which they transform the industry will depend on how well these challenges are addressed and how effectively generative models can demonstrate their value in real-world financial applications. Additionally, the recent emergence of foundation models and their convenient use across applications, including finance (see the Appendix for some examples), indicates that this shift will continue in the near and foreseeable future towards generative AI and sequence-to-sequence models.
Where Is Generative AI Headed?
Will generative models be bigger or smaller than today?
Trend towards bigger models
Until recently, a dominant trend in generative AI has been growing scale, with larger models trained on ever-growing datasets. Scaling laws allow researchers to estimate how powerful a new, larger model (whether larger in size or trained on more data) will be, before investing in the tremendous computing resources it takes to train them.
There is a continued interest in the emergent capabilities arising when a model reaches a certain size. Examples include glimmers of logical reasoning. It’s not just the model’s architecture that causes these skills to emerge but its scale.
Trend towards smaller models
More recent trends point toward smaller models trained on domain-specific data. We expect models to shrink in size, speeding up tuning and inference.
Goal: Reducing Carbon Footprint of Generative Models
Researchers are increasingly focused on developing energy-efficient methods to train, tune, and run AI models, to lower costs and reduce AI's enormous carbon footprint. Smaller models and specialization come with advantages: a smaller model is vastly cheaper and less carbon-intensive to run.
Example: One encouraging finding is that effective AI models can be a lot smaller than they are today. DeepMind recently showed in [30] that a smaller model trained on more data could outperform a model four times larger trained on less data.17
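This finding can be reproduced qualitatively from the parametric scaling law fitted in [30], \(L(N,D) = E + A/N^{\alpha } + B/D^{\beta }\), where N is the parameter count and D the number of training tokens. The constants below are the approximate published fits and should be treated as illustrative assumptions, not authoritative values:

```python
def scaling_law_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss L(N, D); constants are the approximate
    fits reported in [30], used here for illustration only."""
    return E + A / n_params**alpha + B / n_tokens**beta

# A smaller model trained on more tokens (roughly Chinchilla: 70B params, 1.4T tokens)
# vs. a model four times larger trained on fewer tokens (roughly Gopher: 280B, 300B).
small_more_data = scaling_law_loss(70e9, 1.4e12)
large_less_data = scaling_law_loss(280e9, 300e9)
```

Evaluating both budgets shows the smaller, data-rich model attaining the lower predicted loss, which is exactly the compute-optimal trade-off the example describes.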
Better performance with specialization: Smaller models trained on more domain-specific data can often outperform larger, general-purpose models.
“When you want specific advice, it may be better to ask a domain expert for help rather than trying to find the single smartest person you know” (David Cox, IBM).
Example: PubMedGPT 2.75B is a relatively small model, trained on biomedical abstracts. Researchers at Stanford found that it could answer medical questions significantly better than a generalist model the same size. Their work suggests that smaller, domain-specialized models may be the right choice when domain-specific performance is important.
We believe the future holds both generalist models with more capabilities and further specialization with smaller, more efficient models, dynamically navigating the weight composition of specialist models according to their relevance in the given setting. Two promising approaches are using mixtures of experts, as in DeepSeekMoE [21], or using filtering, as in Filtered-not-Mixed [55].
2.3 A Paradigm Shift Towards Path-Wise Modeling, and Sequence-to-Sequence Models
The concept of generative AI has evolved, especially after the successes of LLMs like GPT-3.5 and GPT-4. Though based on different generative-modeling cores and different architectures, these models share a common feature: they are probabilistic, autoregressive sequence models, for which an efficient methodology to capture path-wise effects and relationships in the data stream is poised to become increasingly important. Here, we give a quick outlook on two generative modeling examples where signature methods have demonstrated significant advantages, making the generation task more efficient. See more in Sect. 3.1.
Kidger et al. [34] describe the signature as an infinite graded sequence of statistics which can uniquely characterize a stream of data up to a negligible equivalence class. As introduced in Chaps. “A Primer on the Signature Method in Machine Learning” and “An Introduction to Tensors for Path Signatures”, the signature transform can be seen as a feature transformation, on top of which a model may be built. A striking property of the signature transform is the efficiency with which it can capture the relevant information in a data stream. The paper Deep signature transforms [34] proposes an approach which combines the advantages of the signature transform with modern deep learning frameworks for efficient generation of data streams, including financial time series.18 This idea has been further refined in the Rough Transformers paper [46], where the authors address the struggles of traditional recurrent models with real-world time series data, which often exhibit long-range dependencies and irregular sampling intervals. The Rough Transformer makes use of signature attention to capture multi-scale dependencies efficiently while reducing computational costs.
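To make the feature-transformation view concrete, here is a minimal depth-2 truncated signature of a piecewise-linear path (a sketch; libraries such as `iisignature` or `signatory` provide efficient full-depth implementations):

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path of shape (T, d).

    Level 1 is the total increment; level 2 collects the iterated integrals
    int dx^i dx^j, which are exact for piecewise-linear paths.
    """
    dx = np.diff(np.asarray(path, dtype=float), axis=0)   # increments, shape (T-1, d)
    S1 = dx.sum(axis=0)                                   # level 1
    cum = np.cumsum(dx, axis=0)
    prev = np.vstack([np.zeros_like(dx[0]), cum[:-1]])    # sum of earlier increments
    S2 = prev.T @ dx + 0.5 * dx.T @ dx                    # level-2 tensor, shape (d, d)
    return S1, S2
```

The shuffle identity \(S_2 + S_2^{\top } = S_1 \otimes S_1\) gives a quick sanity check that the iterated integrals were computed correctly.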
Another route to explaining this viewpoint's efficiency in capturing time series is provided by neural controlled differential equations (CDEs), which can be seen as the continuous-time analogue of recurrent neural networks and offer a memory-efficient, continuous-time way to model functions of potentially irregular time series. The paper [35] demonstrates how to represent the input signal of streamed data over small time intervals through its log signature, a collection of statistics describing how the signal drives a CDE; this mirrors the standard approach for solving rough differential equations (RDEs). This view offers significant training speed-ups, improvements in model performance, and reduced memory requirements compared to existing approaches.
In a related theme, [41] approximates SDEs using RNNs with log-signature features.
Trends from the very recent past indicate that deep structured state-space models (SSMs)19 deliver exceptional performance across application domains while offering reduced training and inference costs compared to attention-based transformers.
SSMs are increasingly recognized as effective approaches for modeling sequential data. Recent advances show that incorporating multiplicative interactions between inputs and hidden states into the linear recurrence mechanisms powering SSMs, exemplified by architectures like GateLoop, Mamba, and GLA, enables these models to outperform attention-based foundation models, both in accuracy and computational efficiency, even at billion-parameter scales. The work [12] not only explains the empirical success of selective SSMs such as Mamba but also establishes a rigorous basis for understanding the expressive potential of future SSM variants, providing a theoretical foundation rooted in Rough Path Theory.
3 The Roles and Advantages of Signature Methods for Generative Modeling of Financial Data Streams
Signature-Based Modelling and Generation of Paths
Signature-based techniques have recently emerged as a leading machine learning technology for learning from time series data. They offer a mathematically precise framework for capturing interactions within complex, evolving data streams. This precision lends itself to effective numerical methods for analyzing streamed data, particularly in contexts where the data is irregular [32, 43], non-stationary [5, 48], and of moderate dimensionality and sample size; see [42] for a comprehensive survey.
Below we highlight two compelling examples of models using the signature in the path generation process and remark that a full list would go hopelessly beyond the limitations of this chapter. In the next sections, we will look at how the signature can be used to discriminate between the generated paths.
Arribas et al. [1] propose a data-driven model selection method, integrating a classical quantitative setup with a generative modeling approach and leveraging the properties of the signature transform. The framework provides a new perspective on SDEs and exotic financial products that depend, in a non-linear way, on the whole trajectory of asset prices. It further allows one to simulate possible future market scenarios and to calibrate consistently under the pricing measure \(\mathbb {Q}\) and the real-world measure \(\mathbb {P}\).
Further papers in this direction include [19, 20, 32, 53], and many more.
> Example: Signature-Based Models: Theory and Calibration [18]
Cuchiero et al. [18] presents asset price models whose dynamics are described by linear functions of the (time-extended) signature of a primary underlying process.20 In this framework, available classical models can be approximated arbitrarily well and the model’s parameters can be learned from all sources of available data by simple methods. Results presented in [18] include conditions guaranteeing absence of arbitrage in the associated model, as well as tractable option pricing formulas for ‘sig-payoffs’.
For more on this topic, see [18] and Chap. “Signature-Based Models in Finance” of this volume.
3.1 Challenges and Pitfalls in Performance Evaluation of Market Generators
A critical challenge in generative modeling lies in evaluating the quality of generated data, as no universally accepted framework exists for assessing and validating outputs produced by trained models, cf. [8, 24]. Since (a) mathematical modeling of financial markets and the associated simulation of market data existed before the onset of market generation, and since (b) ML researchers have used evaluation metrics for generative models outside the domain of market generation, there are several potentially useful evaluation metrics available. But our task has special characteristics and (as we have established already) deviations from these domains, which may render some of the established evaluation metrics suboptimal, insufficient,21 or otherwise unsuitable for Market Generators. Correspondingly, in this section we highlight some of the historically available metrics and discuss how the aforementioned differences in goals and settings manifest with regard to performance evaluation metrics. We discuss challenges in modeling and performance evaluation (a) due to differences compared to other AI applications, (b) due to differences compared to the classical (continuous-time) stochastic models discussed in previous sections, and (c) challenges that have to do more inherently with the purpose22 of Market Generators: to generate more volume of high-fidelity training data at instances where access to data is limited.
While it is straightforward to assess how realistic the output is in a model for photo generation, model evaluation is more complex in financial settings, since targets are distributional and path-valued. The evaluation of the quality of outputs will depend on the application of the model and will likely be investigated through tests tailored to the context.
We now break down challenges in each step of the generation23 pipeline:
i.
After observing a (limited) number of instances of the underlying probability distribution, the goal is to infer this distribution from the available observations.
ii.
During the training phase, the underlying distribution is reconstructed via approximation by a generative model from which it can generate (any number of) new samples.
iii.
Once the new samples have been generated, it remains to verify that the output is of good quality, e.g. that genuinely new data has been created and that it is hard (or even impossible) to tell synthetic data apart from real data.
Challenges:
i.
To infer the underlying distribution in neural network training, for step (i.) we implicitly assume that there is one common underlying probability distribution that our target data is sampled from (\(\Rightarrow \)(a)). In a time series setting this translates to some form of temporal stability in the data. This amounts to the assumption of some form of stationarity,24 but in reality these assumptions are not necessarily fulfilled, or only for very limited periods.25 Financial time series are known to be highly non-stationary. As a result, inferring the underlying distribution (step i.) may suffer from the challenges of low signal-to-noise ratio, limited availability of data26 or having to infer the underlying distribution from a very limited number of samples (before significant distributional changes happen).
ii.
In order to train an appropriate model that can approximate an underlying distribution with high fidelity, one needs a flexible (data-driven) model (typically, the richer the characteristics of the data, the more parameters needed) which is parsimonious enough that it can be trained reliably well on the limited data. Hence steps (i.) and (ii.) are in conflict with each other. While the limited data available may have been sufficient to calibrate the few parameters of classical models, this may be a challenge in the generative setting (\(\Rightarrow \)(b)) if the number of parameters becomes large. A careful choice of features is beneficial and can keep the number of parameters in check. On a different note, it is desirable to be able to train models without explicitly specifying the underlying distribution.27
iii.
For performance evaluation, established targets from classical models may not carry over to all generative modeling settings or may become only necessary but not sufficient characteristics in the more flexible setting of Market Generators:
Stylized facts [14] used to be the bar that any new model needed to reach. These universal properties remain relevant and any Market Generator violating these can be deemed flawed. However, passing this requirement, without further ones, does not necessarily guarantee a good Market Generator in this setting of increased flexibility.
QQ plots and common summary statistics: a sufficiently flexible generative model with sufficient training data can be expected to successfully approximate any target (marginal) distribution. Good performance at a finite number of (discretely observed) marginal distributions may have been very informative in classical settings, but for multiple types of generative models it would still leave the model unspecified between observation steps. Similar considerations hold for other typical summary statistics.
Usefulness of the synthetic data for the target application: Generated data should be as useful as the real data when used for the downstream task.
For the use case of data privacy protection: We refer for the corresponding tools from differential privacy and recent adaptations to [2].
If the use case is to enrich the training dataset for a neural network (e.g. performing trading [8] or hedging [6]), some consider the ‘train on synthetic, test on real’ score:28 in the case of deep hedging, this would translate (as the name suggests) to re-training the hedging network on synthetic data and evaluating the profit-and-loss distribution of the hedging engine on real data, as a measure of success for the Market Generator’s output. This means of evaluation is potentially expensive, since, depending on the application, retraining and re-running the model is computationally costly (and potentially risky). Therefore, efficiently computable proxy metrics for the training data, whose similarity translates to similar hedging strategies (and hence similar PnL performance on real data), are desirable.
For finite-dimensional distributions,29 the most commonly used30 distributional metrics are Wasserstein distances (e.g. the ‘earth mover’s distance’), information-theoretic divergences (e.g. the Kullback-Leibler divergence), and Maximum Mean Discrepancy (MMD)-type metrics, see [27]. If the target distribution is not explicitly known, one can
either approximate distributions from observed samples and then work with the similarity metrics for inferred distributions,
or directly work on the level of samples with two-sample tests to verify whether the generated and the training samples originate from the same distribution (\(H_0\)) or not (\(H_A\)), without having to explicitly specify what that common distribution is at any point (e.g. the Kolmogorov-Smirnov two-sample test or the MMD two-sample test). The power of these tests depends on the sample size, and the probability of a type II error is large if the sample size is small. See [8] and Chap. 5 of this volume for more on this.
Scoring rules assign a numerical score based on the predictive distribution and on the event or value that materializes [26]. Similarly to the idea of log-likelihood type evaluation methods, which evaluate the likelihood of real world observations using a density estimated from the generated data, scoring rules [26] can be used in settings where single observations in the real data are scored against an estimated density.
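Two of the stylized-fact checks mentioned above, heavy tails and volatility clustering, can be computed directly from a return series; the following is a sketch (thresholds for "passing" are application-dependent, and [14] catalogues the full list of stylized facts):

```python
import numpy as np

def stylized_fact_checks(returns, max_lag=5):
    """Excess kurtosis (heavy tails) and autocorrelation of squared returns
    (volatility clustering) for a return series."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    excess_kurtosis = (r**4).mean() / (r**2).mean() ** 2 - 3.0

    def acf(x, lag):
        x = x - x.mean()
        return float((x[:-lag] * x[lag:]).sum() / (x * x).sum())

    sq_acf = [acf(r**2, lag) for lag in range(1, max_lag + 1)]
    return excess_kurtosis, sq_acf
```

Real equity returns typically show positive excess kurtosis and slowly decaying, positive autocorrelation of squared returns; a Market Generator whose output fails such checks can be deemed flawed, while passing them is, as noted above, only a necessary condition.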
3.2 The Signature-Kernel-Backed Discriminator and Performance Evaluator for Market Generators
Signature (kernel) methods are appealing for data streams as they provide an elegant framework to consider distributions on path space. Denote by \(\mathcal {X}\) the space of continuous paths of bounded variation from \([0,T]\) to \(\mathbb {R}^{d_x}\). For any random variable X with values on \(\mathcal {X}\), let \(\mathbb {P}_{X} := \mathbb {P}\circ X^{-1}\) denote its law. Consider a target distribution \(\mathbb {P}_{X^{\text{true}}}(\cdot )\) on path space. The goal is to train a generative model such that the generated law \(\mathbb {P}_{X^{\theta }}\) is as close as possible to \(\mathbb {P}_{X^{\text{true}}}\), for some appropriate notion of “closeness”.
In the following examples, this notion of closeness will be derived from relevant properties of the expected signature of a stochastic process. The expected signature (see Section 6 in [42]) of a process \(X_t\) is the natural generalisation of the moments of a distribution to the process \(X_t\), and the expected log signature \(\log S(X)\) provides the natural generalisation of the cumulants. Hence, under appropriate assumptions31 these quantities can be used to identify (characterize) the law of the stochastic process in question (see [42], p. 40).
First Generation of Signature-Based Market Generator Models, Based on Truncated Signature Kernels
The following two examples32 use the truncated Signature Maximum Mean Discrepancy (Sig-MMD) metric to evaluate the similarity of generated (synthetic) paths versus target (real) paths; this metric is closely related to the Wasserstein-1 metric on the relevant path space, as indicated in [48].
> Example: Generating the Signatures of Log-Returns with VAEs (Using Truncated sig-MMD)
In [5], signatures were used in combination with VAEs to generate financial time series when the available training data is scarce. The methodology contains 5 main steps:
Step 1
Data extraction from the time series: Given a sample path from a data stream, subdivide the full time series into equal length intervals of (i) 1 day, (ii) 5 days (a business week), (iii) 20 days (a month)
Step 2
Preprocessing the data: To obtain training data from the resulting path segments, calculate log-returns \(r(t,\Delta ) := \log (S_{t+\Delta }/S_t)\) for \(\Delta \in \{1,5,20\}\). Then, convert the obtained data samples into truncated log signatures, applying the lead-lag transformation. This will enable a path-wise generation process.
Step 3
Creating and training the VAE network: After splitting the historical data into training/testing/validation sets, train a CVAE (conditioning on relevant market conditions) on the log signatures.
Step 4
Postprocessing of the outputs of the VAE: At this stage, one can either convert the generated log signatures back into paths, or use them directly. For inverting log signatures back into paths, [5] proposes a possible method. In [44], a method to directly price and hedge derivatives using the signature of a path was proposed.
Step 5
Performance evaluation: Using signature-based similarity metrics, one can evaluate the performance of the generative process.
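The lead-lag transformation used in Step 2 interleaves the series with a delayed copy of itself, so that the level-2 signature of the resulting 2-dimensional path picks up quadratic-variation-type information. A minimal sketch:

```python
import numpy as np

def lead_lag(x):
    """Lead-lag transform of a scalar series (x_1, ..., x_n) into a path in R^2."""
    x = np.asarray(x, dtype=float)
    doubled = np.repeat(x, 2)
    lead = doubled[1:]    # x_1, x_2, x_2, x_3, x_3, ...
    lag = doubled[:-1]    # x_1, x_1, x_2, x_2, x_3, ...
    return np.column_stack([lead, lag])
```

For example, `lead_lag([1, 2, 3])` visits (1,1), (2,1), (2,2), (3,2), (3,3): each original observation becomes an axis-aligned step, which is what makes the Lévy area of the transformed path informative about realized variance.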
A crucial insight is at Step 5, using the signature viewpoint, where the performance evaluation is done via the properties of the expected signature.
In [47], the authors propose to use the truncated (conditional) Sig-Wasserstein distance (i.e. the Signature MMD) as the objective for a Market Generator. They propose different types of time series generators: a neural SDE and a so-called LogSig-RNN. The LogSig-RNN can be seen as a higher-order Taylor approximation of SDEs, combining Recurrent Neural Networks (RNNs) with the log signature. Consider an \(\mathbb {R}^e\)-valued time-homogeneous SDE

\(dX_t = f(X_t)\,dZ_t = \mu (X_t)\,dt + \sigma (X_t)\,dW_t, \qquad Z_t := (t, W_t), \qquad X_0 = x_0,\)
where \(f: x\times (s,w) \mapsto \mu (x)s + \sigma (x)w\) is a vector field. Locally approximating X via the step-M Taylor expansion yields for \(s<t\):

\(X_t \approx X_s + \sum _{m=1}^{M} f^{\circ m}(X_s)\big (S_m(Z)_{s,t}\big ), \qquad S_m(Z)_{s,t} := \int _{s<u_1<\cdots <u_m<t} dZ_{u_1}\otimes \cdots \otimes dZ_{u_m},\)

with \(S_m(Z)_{s,t}\) the level-m term of the signature of the driving path Z,
where \(f^{\circ m}: \mathbb {R}^e \to L\left ((\mathbb {R}^{d+1})^{\otimes m}, \mathbb {R}^e \right )\) is defined inductively via \(f^{\circ 1} = f\), \(f^{\circ m+1} = D(f^{\circ m})f\).33 Pasting these local step-M Taylor approximations
together, yields a numerical approximation scheme: Consider time partitions for the Taylor expansion and X, \(\Pi _h = (u_j)_{j=0}^{N_1}\) and \(\Pi _X = (t_k)_{k=0}^{N_2}\), with \(\Pi _h \subseteq \Pi _X\). Define \(\hat {X}\) on \(\Pi _X\) inductively. Let \(\hat {X}_0 = x_0\). For \(t_k\), find j with \(t_k \in (u_{j-1}, u_j]\) and define
by using the step-M Taylor expansion of \(X_{t_k}\) around \(u_{j-1}\). Approximating \(\tilde {F}\) leads to a generalized Logsig-RNN model [41]. This shows the use of signatures in the Market Generator, and in the discriminator used to train the parameters of the generator.
In [48], a conditional version of this is proposed.
The Signature Maximum Mean Discrepancy metric is formed by combining the Maximum Mean Discrepancy (MMD) metric [27]34 with the signature kernel [10].35 To approximate the signature kernel,36 the previous two examples use the truncated signature. While in [48] the Sig-MMD is back-propagated in the objective, in [5] a Sig-MMD-informed two-sample test is performed37 to determine whether the simulated data and the training data have been sampled from the same (\(H_0\)) underlying distribution (stochastic process) or from different ones (\(H_A\)).
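The mechanics of such an MMD-based two-sample test are independent of the particular kernel. The sketch below uses an RBF kernel on feature vectors as a stand-in for the (truncated) signature kernel, with significance assessed by a permutation test:

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma=0.5):
    """Unbiased estimate of MMD^2 with an RBF kernel (a stand-in for k_sig)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    m, n = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)   # drop i == j terms for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (m * (m - 1)) + Kyy.sum() / (n * (n - 1)) - 2.0 * Kxy.mean()

def mmd_two_sample_pvalue(X, Y, n_perm=200, seed=0):
    """Permutation p-value for H0: X and Y are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y)
    Z = np.vstack([X, Y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        if mmd2_unbiased(Z[perm[:len(X)]], Z[perm[len(X):]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

As noted earlier, the power of the test degrades with small samples: with few observations, even a clear distributional difference may fail to produce a small p-value.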
Truncated signature kernels do not take into consideration the entire (infinite) signature vector, they are truncated at a given (finite) level, and benefit from the property that the error between the full and truncated versions decays factorially in the level of truncation (see Lemma 2.3.1 in [7]).
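The factorial decay is easy to see in the simplest case: for a one-dimensional linear path with total increment \(L\), the level-m signature term is \(L^m/m!\), so the mass discarded by truncation vanishes factorially:

```python
import math

L = 2.0  # total increment of a 1-d linear path; its level-m signature term is L**m / m!
levels = [L**m / math.factorial(m) for m in range(1, 11)]
tail_after_level_6 = sum(levels[6:])  # mass discarded when truncating at level 6
```

Even for this modest increment, the terms beyond level 6 already contribute only a few percent, which is why moderate truncation levels often suffice in practice, although, as noted, the right level remains situation-dependent.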
A disadvantage of this first generation of (truncated-signature-kernel) models is that computing signature kernels by taking inner products of truncated signatures relies on some means of computing iterated integrals (to obtain the elements of the signature vector). This can introduce computational bottlenecks, especially in financial settings, where the target distributions are high-dimensional. Another disadvantage of these first-generation models is that there is no universal consensus about the ideal truncation level, which can be highly dependent on the situation at hand.
The second generation of Market Generators with signature kernels bypasses the need to compute truncated signatures or any entries of the signature vector explicitly: in [51] the authors provided a kernel trick, proving that the signature kernel, evaluated at times \((s,t)\), satisfies the Goursat-type integral equation

\(k(s,t) = 1 + \int _0^s \int _0^t k(u,v)\, \langle dx_u, dy_v \rangle ,\)
which reduces to a linear hyperbolic PDE when x and y are almost everywhere differentiable.38 Not only does this remove the need to justify the truncation level, which had been a frequent question for first-generation models, it also alleviates the computational bottleneck associated with computing a large number of iterated integrals.39 Second-generation models scale better to high-dimensional scenarios, see [31]. There is a more sophisticated version of the second-generation models in which the kernel trick is used in conjunction with higher-rank signatures, see [38, 52] and [57] as well as the Chap. “Adapted Topologies and Higher-Rank Signatures” of this volume.
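This kernel trick can be implemented in a few lines: discretizing the Goursat problem gives an explicit first-order scheme. The sketch below is in the spirit of [51], not the reference implementation (packages such as `sigkernel` provide production solvers):

```python
import numpy as np

def sig_kernel(x, y):
    """Signature kernel k_sig(x, y) for piecewise-linear paths of shape (T, d),
    via a first-order finite-difference scheme for the Goursat PDE."""
    dx = np.diff(np.asarray(x, dtype=float), axis=0)
    dy = np.diff(np.asarray(y, dtype=float), axis=0)
    inner = dx @ dy.T                       # increment inner products <dx_i, dy_j>
    m, n = inner.shape
    k = np.ones((m + 1, n + 1))             # boundary condition: k = 1 on the axes
    for i in range(m):
        for j in range(n):
            # k(s+ds, t+dt) ≈ k(s+ds, t) + k(s, t+dt) - k(s, t) + k(s, t) <dx, dy>
            k[i + 1, j + 1] = k[i + 1, j] + k[i, j + 1] + k[i, j] * (inner[i, j] - 1.0)
    return k[m, n]
```

For two identical linear 1-d paths \(t\mapsto t\) on \([0,1]\), the exact value is \(\sum _{m\ge 0} 1/(m!)^2 = I_0(2) \approx 2.2796\), which the scheme reproduces as the grid is refined; no iterated integral is ever computed explicitly.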
The Third Generation of Market Generators with Signature Kernel
In situations where we expect to see a very low number of observations (even single observations), which are to be evaluated against a backdrop distribution, scoring rules have an advantage over MMD-type distances, see [27]. The signature kernel score \(\phi _{\text{sig}}: \mathcal {P}(\mathcal {X})\times \mathcal {X}\to \mathbb {R}\) is defined through

\(\phi _{\text{sig}}(\mathbb {P}, y) := \mathbb {E}_{X,X'\sim \mathbb {P}}\big [k_{\text{sig}}(X,X')\big ] - 2\, \mathbb {E}_{X\sim \mathbb {P}}\big [k_{\text{sig}}(X,y)\big ].\)
It can be shown that for any compact \(\mathcal {K}\subset \mathcal {X}\), \(\phi _{\text{sig}}\) is a strictly proper kernel score relative to \(\mathcal {P}(\mathcal {K})\), i.e. \(\mathbb {E}_{y\sim \mathbb {Q}}[\phi _{\text{sig}}(\mathbb {Q},y)] \le \mathbb {E}_{y\sim \mathbb {Q}}[\phi _{\text{sig}}(\mathbb {P},y)]\) for all \(\mathbb {P},\mathbb {Q} \in \mathcal {P}(\mathcal {K})\), with equality if and only if \(\mathbb {P} = \mathbb {Q}\). Let \(\mathbb {P} \in \mathcal {P}(\mathcal {X})\) and \(y\in \mathcal {X}\). Given m sample paths \(\{x^i\}_{i=1}^m \sim \mathbb {P}\), then

\(\hat {\phi }_{\text{sig}}(\mathbb {P}, y) := \frac {1}{m(m-1)}\sum _{i\neq j} k_{\text{sig}}(x^i,x^j) - \frac {2}{m}\sum _{i=1}^m k_{\text{sig}}(x^i,y)\)
is a consistent and unbiased estimator of \(\phi _{\text{sig}}\). In such cases a divergence operator can be associated with the scoring rule. Note that in this case the signature kernel score is closely related to the Sig-MMD; the latter can be rewritten as

\(\mathcal {D}_{k_{\text{sig}}}^2(\mathbb {P},\mathbb {Q}) = \mathbb {E}_{y\sim \mathbb {Q}}\big [\phi _{\text{sig}}(\mathbb {P},y)\big ] - \mathbb {E}_{y\sim \mathbb {Q}}\big [\phi _{\text{sig}}(\mathbb {Q},y)\big ].\)
The associated kernel scoring rule benefits from better computational efficiency, while maintaining a close connection to the sig-kernel-MMD metric.
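A sample-based kernel-score estimator needs only a Gram matrix among generated samples and kernel evaluations against the materialized observation. The sketch below uses the form \(\lambda \,\mathbb {E}[k(X,X')] - 2\,\mathbb {E}[k(X,y)]\) with \(\lambda = 1\); exact constants and sign conventions vary across references, and any positive-definite kernel (here an RBF stand-in, \(k_{\text{sig}}\) in the signature setting) can be plugged in:

```python
import numpy as np

def kernel_score(samples, y, kernel, lam=1.0):
    """Unbiased kernel-score estimate from m samples of the model distribution P.

    samples: draws from P; y: the observed outcome; kernel: a positive-definite
    kernel function. Lower scores mean y is better explained by P.
    """
    m = len(samples)
    gram = np.array([[kernel(a, b) for b in samples] for a in samples])
    cross = np.array([kernel(a, y) for a in samples])
    between = (gram.sum() - np.trace(gram)) / (m * (m - 1))  # mean over i != j
    return lam * between - 2.0 * cross.mean()
```

Scoring a single observed path against a batch of generated paths costs one Gram matrix, which is what makes the scoring-rule route cheaper than a full two-sample MMD comparison when real observations are scarce.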
Another successfully used notion of distance between a point and a distribution in this context is Mahalanobis distance [42, 45].
> Example: Non-adversarial Training of Neural SDEs with Signature Kernel Scores
The innovation in [32] is that signature kernel scores are used to train neural SDEs.40 Usually, neural SDEs are trained adversarially as infinite-dimensional GANs, see [36]. The above metrics, however, provide all the tools needed to train the neural SDE model using signature kernel scores. The training objective is given by

\(\mathcal {L}(\theta ) = \mathbb {E}_{y\sim \mathbb {P}_{X^{\text{true}}}}\big [\phi _{\text{sig}}(\mathbb {P}_{X^{\theta }}, y)\big ].\)
Note that this is equivalent to minimizing \(\mathcal {D}_{k_{\text{sig}}}(\mathbb {P}_{X^{\theta }}, \mathbb {P}_{X^{\text{true}}})\). \(\mathcal {L}(\theta )\) is approximated by solving a system of linear PDEs depending on sample paths from the neural SDE. Training the model involves backpropagating through the SDE solver and the PDE solver. In [32], conditional neural SDEs were applied to simulate FX pairs. It is shown that the conditional generator is capable of producing conditional distributions that frequently encompass the observed path. Furthermore, these generated distributions capture certain distinctive characteristics of financial markets, such as martingality, mean reversion, or leverage effects where applicable.
3.3 A Good Discriminator Opens the Door to Further Associated Goals
It is not only Market Generation that suffers from the modeling challenges mentioned in previous sections. These challenges often propagate to other, related tasks as well. We have established in earlier sections that a centerpiece of a good Market Generator in financial contexts is the right performance evaluation metric, or discriminator. While it is rarely the case that this performance evaluation metric can be used off-the-shelf for other tasks, it can be the ideal starting point for other tasks (here referred to as ‘downstream tasks’) with the right amount of fine-tuning and care.
Distribution Regression
Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In [39] Lemercier et al. achieve a significant leap forward in this domain by establishing the distribution regression framework for data streams.
Anomaly detection focuses on determining whether a specific observation is significantly different from a reference set of observations considered normal. Change detection is a related idea, without reference to a corpus of normality.
> Example: Second-Generation sig-Kernel MMD Two-Sample Test to Indicate Distributional Changes in the Data [31]
Figure 2 shows a second-generation sig-kernel-MMD discriminator, used as a two-sample test across time, fine-tuned and enhanced for the purpose of identifying distributional (regime) changes in sequential data, see [31, 32]. The grey line shows the value of the test statistic; a higher value corresponds to a larger Sig-MMD distance between the samples, indicating that the two samples are less likely to have been drawn from the same distribution (\(H_0\)). The time series is marked red where this test statistic surpasses a given threshold, indicating a regime change.
Fig. 2
Regime detection via the sig-kernel-MMD discriminator
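A rolling two-sample statistic of the kind plotted in Fig. 2 can be sketched as follows, with an RBF kernel on returns standing in for the signature kernel used in [31] (a sketch, not the paper's implementation):

```python
import numpy as np

def rolling_mmd_stat(series, window=50, gamma=1.0):
    """MMD^2 between the window before and the window after each time point."""
    x = np.asarray(series, dtype=float)

    def mmd2(a, b):
        def gram(u, v):
            return np.exp(-gamma * (u[:, None] - v[None, :]) ** 2)
        Kaa, Kbb, Kab = gram(a, a), gram(b, b), gram(a, b)
        m, n = len(a), len(b)
        np.fill_diagonal(Kaa, 0.0)
        np.fill_diagonal(Kbb, 0.0)
        return Kaa.sum() / (m * (m - 1)) + Kbb.sum() / (n * (n - 1)) - 2 * Kab.mean()

    ts = range(window, len(x) - window + 1)
    return np.array([mmd2(x[t - window:t], x[t:t + window]) for t in ts])
```

Flagging the points where this statistic exceeds a chosen threshold reproduces the red regime-change markings described above; the statistic peaks where the two adjacent windows straddle a distributional change.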
> Example: Downstream Application of the Discriminator ofVolGAN[60]
The trained discriminator of the VolGAN might be used for the downstream task of detecting extreme market events.
Figure 3 contains discriminator scores on the training and testing data: Most notably, the data from start of the Covid-19 pandemic is highlighted as unusual.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Appendix: Recent Developments and Challenges in Generative AI
Generative AI and foundation models hold enormous potential to create new capabilities and value for enterprises. However, they can also introduce new risks, be they legal, financial, reputational or societal. In this section we provide some further insights into these aspects.
On the Use of Foundation Models in Finance (by Raeid Saqur)
Recent Developments in the Use of Foundation Models in Finance
The application of foundation models, specifically Pre-trained Language Models (PLMs) and Large Language Models (LLMs), in finance has evolved considerably over recent years. The journey began with the adaptation of foundational natural language processing models like BERT to the financial domain. FinBERT was one of the first attempts to bring PLMs into finance by pre-training BERT with large-scale financial texts. BloombergGPT marked a significant milestone in this evolution by introducing the first financial LLM with 50 billion parameters, pre-trained using both general and financial datasets [65]. Although BloombergGPT remains closed-source, its development demonstrated the effectiveness of scaling LLMs by using domain-specific training data. Meta AI’s LLaMA series, specifically LLaMA-13B, emerged as the first open-source LLM to attract considerable attention in financial research, opening up opportunities for extensive experimentation and domain-specific fine-tuning [58].
Foundation models in finance were first used for analyzing financial news and sentiment. They have since begun to assume further roles, e.g. in forecasting market trends. TradingGPT, a novel multi-agent system, leverages LLMs with a layered memory architecture [40], enabling trading agents to make more nuanced financial decisions by considering historical context, market dynamics, and individual risk profiles. More recently, from 2023 onward, the merging of deep reinforcement learning (RL) methods with large language model (LLM) alignment has gained momentum, exemplified by InstructGPT [50], the precursor of ChatGPT. In the realm of explainable financial forecasting, Yu et al. [66] investigate the capabilities of LLMs in explainable financial time series forecasting. Utilizing NASDAQ-100 stock data, company metadata, and economic/financial news, this study employs GPT-4 and Open LLaMA models for forecasting, demonstrating that LLMs can outperform traditional models like ARMA-GARCH in both accuracy and explanatory power.
There is no question that the future of foundation models in finance holds plenty of promise as researchers and practitioners refine these models, focusing on the nuanced needs of financial markets while leveraging advances in reinforcement learning, retrieval augmentation, and domain adaptation. The implementation of foundation models in finance continues to expand, with an increasing emphasis on integrating structured financial data and incorporating real-time multi-modal data for comprehensive analysis.
General Challenges with Generative AI
Environmental Impact and Global AI Divide
Infrastructure requirements: Building a foundation model from scratch is expensive, requires enormous resources and has a large environmental footprint.
Access to large and expensive quantities of computing power has become a prerequisite for developing advanced general-purpose AI. This has led to a growing dominance of large technology companies in general-purpose AI development.
The AI R&D divide often overlaps with existing global socioeconomic disparities, potentially exacerbating them.
Exacerbation of Societal Biases Through Misrepresentation in Training Datasets
Models can pick up undesirable characteristics of the data they were trained on. To avoid this, developers should carefully filter training data and encode specific norms into their models. For Market Generators, a misrepresentation of the true target distribution in the training data can lead to flawed forecasts, modeling errors and arbitrage. In more general-purpose AI systems it leads to ingrained inequalities with respect to race, gender, etc.
Malicious Use of Generative Models Through Fake Content aka “deepfakes”
On an individual scale this can involve ‘phishing’ attacks or impersonation without consent. On a societal scale, realistic fake content can be used in “political disinformation campaigns” or combined with ‘microtargeting’ for conditioning content generation to match personal preferences.
Risks from Malfunctions Due to Inadequate Use
Product functionality issues occur when there is confusion or misinformation about what a general-purpose AI model or system is capable of. Addressing such assumptions is also central to risk management in financial modeling.
As conditions change, systems (algorithms) may not be robust enough to generalize to the new environment and the outputs may no longer be valid.
Note that these should not be confused with linear autoregressive models (e.g. AR(1)), which are used to model time series with a continuous state space.
This has previously appeared in the context of other AI applications like Image Generation too, for example when creating an image of Mona Lisa, but conditioned on the requirement that it should appear in Pop Art style.
Recall that if \(X\) denotes the PnL of a portfolio at a certain horizon, with probability density \(\mu \), Value-at-Risk and Expected Shortfall at level \(\alpha \in (0,1)\) are defined (under the usual sign convention for PnL) by
$$\mathrm {VaR}_\alpha (X) = -\inf \Big \{x \in \mathbb {R}: \int _{-\infty }^{x}\mu (y)\,\mathrm {d}y \ge \alpha \Big \}, \qquad \mathrm {ES}_\alpha (X) = \frac {1}{\alpha }\int _0^\alpha \mathrm {VaR}_u(X)\,\mathrm {d}u .$$
Recall that a statistical functional \(T:\mathcal {F}\mapsto \mathbb {R}\) defined on a set of distributions \(\mathcal {F}\) on \(\mathbb {R}^d\) is elicitable if there is a score function \(S(x, y)\) such that
$$T(\mu ) = \mathop {\mathrm {arg\,min}}_{x \in \mathbb {R}} \int S(x,y)\, \mu (dy)$$
for any \(\mu \in \mathcal {F}\). The elicitability of a functional T implies that it can be obtained through machine learning by minimizing the loss \(S(x,y)\).
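As an illustration of elicitability, the \(\alpha\)-quantile (and hence Value-at-Risk, up to sign) is elicited by the pinball score. The grid search below is a toy sketch of the minimization, not a production estimator:

```python
import numpy as np

def pinball_loss(x, y, alpha):
    # Score function S(x, y) whose expected value over y is minimised
    # at the alpha-quantile of y's distribution.
    return np.where(y >= x, alpha * (y - x), (1 - alpha) * (x - y))

def elicit(y, alpha, grid):
    # Empirical minimiser of the average pinball score over a candidate grid:
    # an elicitable functional recovered purely by loss minimisation.
    losses = [pinball_loss(x, y, alpha).mean() for x in grid]
    return grid[int(np.argmin(losses))]
```

Running `elicit` on samples with `alpha=0.05` recovers the empirical 5% quantile, which is why quantile regression (and VaR estimation) can be phrased as supervised learning.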
The key to this was to scale parameters and data proportionally: Double the size of your model, then double the size of your training data for optimal performance.
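This proportional-scaling rule can be made concrete with the widely used rule of thumb of roughly six FLOPs per parameter per training token; the factor of 6 is an approximation from the scaling-law literature, not a result of this chapter:

```python
def train_flops(n_params, n_tokens):
    # Common approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

# Doubling BOTH model size and data quadruples compute; under the
# compute-optimal recipe of [30], parameters and tokens grow in proportion,
# i.e. each scales roughly like the square root of the compute budget.
```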
For the related task of time series forecasting with point estimates, though the output is typically single-element-valued, the computation of this output nevertheless requires information about the underlying distribution of the data generating process, which needs to be inferred from observations. If this distribution is difficult to infer, the same challenges apply as explained in the MG context.
Signature (kernel) methods provide a means to establish such metrics beyond the finite-dimensional case to path-valued settings, as the following sections show.
is a symmetric positive definite function. It can be shown that the signature kernel is characteristic for every compact \(\mathcal {K} \subset \mathcal {X}\), i.e. \(\mathcal {P}(\mathcal {K}) \ni \mathbb {P} \mapsto \int k_{\text{sig}}(x,\cdot )\mathbb {P}(dx) \in \mathcal {H}\) is injective. In particular, this implies that the signature kernel is cc-universal, i.e. for every compact \(\mathcal {K} \subset \mathcal {X}\), the linear span of the set of path functionals \(\{k_{\text{sig}}(x,\cdot ): x\in \mathcal {K} \}\) is dense in \(C(\mathcal {K})\) in the topology of uniform convergence.
Here, we denote by \(\mathcal {H}\) the unique reproducing kernel Hilbert space (RKHS) of \(k_{\text{sig}}\) and endow \(\mathcal {X}\) with a topology with respect to which the signature is continuous and denote by \(\mathcal {P}(\mathcal {X})\) the set of Borel probability measures on \(\mathcal {X}\), see [32] for full details.
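A minimal sketch of the objects involved: the first two signature levels of a discretised path can be computed by midpoint iterated sums (in practice, dedicated signature libraries or the kernel PDE solver of [51] would be used instead):

```python
import numpy as np

def signature_level12(path):
    # Levels 1 and 2 of the path signature of a piecewise-linear path,
    # computed by midpoint iterated sums.  path: array of shape (T, d).
    dX = np.diff(path, axis=0)             # increments, shape (T-1, d)
    level1 = dX.sum(axis=0)                # S^(i) = total increment
    centred = path - path[0]               # X_t - X_0
    mid = 0.5 * (centred[:-1] + centred[1:])
    level2 = mid.T @ dX                    # S^(i,j) ~ int (X^i - X^i_0) dX^j
    return level1, level2
```

The shuffle identity \(S^{(i,j)} + S^{(j,i)} = S^{(i)}S^{(j)}\) holds exactly for this midpoint discretisation, which makes it a convenient sanity check.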
It is not possible to give a universally valid truncation level at which computational bottlenecks would arise, since the computational complexity depends on an interplay of multiple factors: the length of the path segments, the spatial dimension of the setting, the number of paths considered, and the level of truncation. See Chap. “Signature Maximum Mean Discrepancy Two-Sample Statistical Tests” for the interplay of the first three factors.
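The growth alluded to here is geometric in the truncation level; a simple count of signature coordinates makes this concrete:

```python
def n_sig_features(d, level):
    # Number of signature coordinates up to the given truncation level
    # for a d-dimensional path: d + d^2 + ... + d^level.
    return sum(d ** m for m in range(1, level + 1))

# e.g. a 5-dimensional path already has 3905 coordinates at truncation
# level 5, which is why truncation interacts with the dimension and the
# length/number of paths in determining the overall cost.
```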
Under appropriate regularity assumptions, there is a strong solution \(Y:[0,T]\to \mathbb {R}^{d_y}\) of this Stratonovich SDE.
1.
I.P. Arribas, C. Salvi, L. Szpruch, Sig-SDEs model for quantitative finance, in Proceedings of the First ACM International Conference on AI in Finance (2020), pp. 1–8
2.
T. Balch, V.K. Potluru, D. Paramanand, M. Veloso, Six levels of privacy: a framework for financial synthetic data. Preprint. arXiv:2403.14724 (2024)
3.
A. Borji, Pros and cons of GAN evaluation measures. Comput. Vis. Image Underst. 179, 41–65 (2019)
4.
H. Buehler, L. Gonon, J. Teichmann, B. Wood, Deep hedging. Quant. Financ. 19(8), 1271–1291 (2019)
5.
H. Buehler, B. Horvath, T. Lyons, I.P. Arribas, B. Wood, A data-driven market simulator for small data environments. Preprint. arXiv:2006.14498 (2020)
6.
H. Buehler, P. Murray, M.S. Pakkanen, B. Wood, Deep hedging: learning to remove the drift under trading frictions with minimal equivalent near-martingale measures. Preprint. arXiv:2111.07844 (2021)
7.
T. Cass, C. Salvi, Lecture notes on rough paths and applications to machine learning. Preprint. arXiv:2404.06583 (2024)
8.
A.R. Cetingoz, C.-A. Lehalle, Synthetic data for portfolios: a throw of the dice will never abolish chance. Preprint. arXiv:2501.03993 (2025)
9.
J. Chen, J.C. Hull, Z. Poulos, H. Rasul, A. Veneris, Y. Wu, A variational autoencoder approach to conditional generation of possible future volatility surfaces. Available at SSRN (2023). https://ssrn.com/abstract=4628457 or https://doi.org/10.2139/ssrn.4628457
10.
I. Chevyrev, H. Oberhauser, Signature moments to characterize laws of stochastic processes. J. Mach. Learn. Res. 23(176), 1–42 (2022)
11.
V. Choudhary, S. Jaimungal, M. Bergeron, FuNVol: multi-asset implied volatility market simulator using functional principal components and neural SDEs. Quant. Financ. 24(8), 1077–1103 (2024). https://doi.org/10.1080/14697688.2024.2396977
12.
N.M. Cirone, A. Orvieto, B. Walker, C. Salvi, T. Lyons, Theoretical foundations of deep selective state-space models. Preprint. arXiv:2402.19047 (2024)
13.
S.N. Cohen, C. Reisinger, S. Wang, Arbitrage-free neural-SDE market models. Appl. Math. Financ. 30(1), 1–46 (2023)
14.
R. Cont, Empirical properties of asset returns: stylized facts and statistical issues. Quant. Financ. 1(2), 223 (2001)
15.
R. Cont, J. da Fonseca, Dynamics of implied volatility surfaces. Quant. Financ. 2(1), 45–60 (2002)
16.
R. Cont, M. Vuletic, Simulation of arbitrage-free implied volatility surfaces. Appl. Math. Financ. 30, 94–121 (2023)
17.
R. Cont, M. Cucuringu, R. Xu, C. Zhang, Tail-GAN: nonparametric scenario generation for tail risk estimation. Preprint. arXiv:2203.01664 (2022)
18.
C. Cuchiero, G. Gazzani, S. Svaluto-Ferro, Signature-based models: theory and calibration. SIAM J. Financ. Math. 14(3), 910–957 (2023)
19.
C. Cuchiero, S. Svaluto-Ferro, J. Teichmann, Signature SDEs from an affine and polynomial perspective. Preprint. arXiv:2302.01362 (2023)
20.
C. Cuchiero, E. Flonner, K. Kurt, Robust financial calibration: a Bayesian approach for neural SDEs. Preprint. arXiv:2409.06551 (2024)
21.
D. Dai, C. Deng, C. Zhao, R. Xu, H. Gao, D. Chen, J. Li, W. Zeng, X. Yu, Y. Wu, et al. DeepSeekMoE: towards ultimate expert specialization in mixture-of-experts language models. Preprint. arXiv:2401.06066 (2024)
22.
M.H. Davis, D.G. Hobson, The range of traded option prices. Math. Financ. 17(1), 1–14 (2007)
23.
T. Fissler, J.F. Ziegel, et al., Higher order elicitability and Osband's principle. Ann. Stat. 44(4), 1680–1707 (2016)
24.
W. Fu, A. Hirsa, J. Osterrieder, Simulating financial time series using attention. Preprint. arXiv:2207.00493 (2022)
25.
S. Gerhold, I.C. Gülüm, Consistency of option prices under bid–ask spreads. Math. Financ. 30(2), 377–402 (2020)
26.
T. Gneiting, Making and evaluating point forecasts. J. Am. Stat. Assoc. 106(494), 746–762 (2011)
27.
A. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf, A. Smola, A kernel two-sample test. J. Mach. Learn. Res. 13(1), 723–773 (2012)
28.
A. Gu, K. Goel, C. Ré, Efficiently modeling long sequences with structured state spaces. Preprint. arXiv:2111.00396 (2021)
29.
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
30.
J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D.d.L. Casas, L.A. Hendricks, J. Welbl, A. Clark, et al., Training compute-optimal large language models. Preprint. arXiv:2203.15556 (2022)
31.
Z. Issa, B. Horvath, Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures. Preprint. arXiv:2306.15835 (2023)
32.
Z. Issa, B. Horvath, M. Lemercier, C. Salvi, Non-adversarial training of neural SDEs with signature kernel scores. Adv. Neural Inf. Proces. Syst. 36 (2023)
33.
N. Jackson, E. Suli, S. Howison, Computation of deterministic volatility surfaces. J. Comput. Financ. 2(2), 5–32 (1999)
34.
P. Kidger, P. Bonnier, I. Perez Arribas, C. Salvi, T. Lyons, Deep signature transforms, in Proceedings of the 33rd International Conference on Neural Information Processing Systems (2019), pp. 3105–3115
35.
P. Kidger, J. Morrill, J. Foster, T. Lyons, Neural controlled differential equations for irregular time series. Adv. Neural Inf. Proces. Syst. 33, 6696–6707 (2020)
36.
P. Kidger, J. Foster, X. Li, T.J. Lyons, Neural SDEs as infinite-dimensional GANs, in International Conference on Machine Learning. PMLR (2021), pp. 5453–5463
37.
A. Koochali, P. Schichtel, A. Dengel, S. Ahmed, Probabilistic forecasting of sensory data with generative adversarial networks–ForGAN. IEEE Access 7, 63868–63880 (2019)
38.
M. Lemercier, T. Lyons, A high order solver for signature kernels. Preprint. arXiv:2404.02926 (2024)
39.
M. Lemercier, C. Salvi, T. Damoulas, E. Bonilla, T. Lyons, Distribution regression for sequential data, in International Conference on Artificial Intelligence and Statistics. PMLR (2021), pp. 3754–3762
40.
Y. Li, Y. Yu, H. Li, Z. Chen, K. Khashanah, TradingGPT: multi-agent system with layered memory and distinct characters for enhanced financial trading performance. Preprint. arXiv:2309.03736 (2023)
41.
S. Liao, T. Lyons, W. Yang, H. Ni, Learning stochastic differential equations using RNN with log signature features. Preprint. arXiv:1908.08286 (2019)
42.
T. Lyons, A.D. McLeod, Signature methods in machine learning. Preprint. arXiv:2206.14674 (2024)
43.
T. Lyons, H. Ni, H. Oberhauser, A feature set for streams and an application to high-frequency financial tick data, in Proceedings of the 2014 International Conference on Big Data Science and Computing (2014), pp. 1–8
44.
T. Lyons, S. Nejad, I. Perez Arribas, Non-parametric pricing and hedging of exotic derivatives. Appl. Math. Financ. 27(6), 457–494 (2020)
45.
P.C. Mahalanobis, On the generalised distance in statistics. Proc. Natl. Inst. Sci. India 2(1), 49–55 (1936)
46.
F. Moreno-Pino, A. Arroyo, H. Waldon, X. Dong, Á. Cartea, Rough transformers for continuous and efficient time-series modelling. Preprint. arXiv:2403.10288 (2024)
47.
H. Ni, L. Szpruch, M. Sabate-Vidales, B. Xiao, M. Wiese, S. Liao, Sig-Wasserstein GANs for time series generation, in Proceedings of the Second ACM International Conference on AI in Finance (2021), pp. 1–8
48.
S. Liao, H. Ni, M. Sabate-Vidales, L. Szpruch, M. Wiese, B. Xiao, Sig-Wasserstein GANs for conditional time series generation. Math. Financ. 34(2), 622–670 (2024). (Special Issue on Machine Learning in Finance)
49.
B. Ning, S. Jaimungal, X. Zhang, M. Bergeron, Arbitrage-free implied volatility surface generation with variational autoencoders. SIAM J. Financ. Math. 14(4), 1004–1027 (2023)
50.
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C.L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., Training language models to follow instructions with human feedback. Preprint. arXiv:2203.02155 (2022)
51.
C. Salvi, T. Cass, J. Foster, T. Lyons, W. Yang, The signature kernel is the solution of a Goursat PDE. SIAM J. Math. Data Sci. 3(3), 873–899 (2021)
52.
C. Salvi, M. Lemercier, C. Liu, B. Horvath, T. Damoulas, T. Lyons, Higher order kernel mean embeddings to capture filtrations of stochastic processes. Adv. Neural Inf. Proces. Syst. 34, 16635–16647 (2021)
53.
C. Salvi, M. Lemercier, A. Gerasimovics, Neural stochastic PDEs: resolution-invariant learning of continuous spatiotemporal dynamics. Adv. Neural Inf. Proces. Syst. 35, 1333–1344 (2022)
54.
S. Ben Hamida, R. Cont, Recovering volatility from option prices by evolutionary optimization. J. Comput. Financ. 8(4), 43–76 (2005)
55.
R. Saqur, A. Kratsios, F. Krach, Y. Limmer, J.-J. Tian, J. Willes, B. Horvath, F. Rudzicz, Filtered not mixed: stochastic filtering-based online gating for mixture of large language models. Preprint. arXiv:2406.02969 (2024)
56.
C. Schwarz, Interpretable GenAI: synthetic financial time series generation with probabilistic LSTM. Available at SSRN 4877007 (2024)
57.
J. Tao, H. Ni, C. Liu, High rank path development: an approach of learning the filtration of stochastic processes. Preprint. arXiv:2405.14913 (2024)
58.
H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., LLaMA: open and efficient foundation language models. Preprint. arXiv:2302.13971 (2023)
59.
R.S. Tsay, Analysis of Financial Time Series (John Wiley & Sons, Hoboken, 2005)
S. Weber, Distribution-invariant risk measures, information, and dynamic consistency. Math. Financ. Int. J. Math. Stat. Financ. Econ. 16(2), 419–441 (2006)
63.
M. Wiese, L. Bai, B. Wood, H. Buehler, Deep hedging: learning to simulate equity option markets. Preprint. arXiv:1911.01700 (2019)
64.
M. Wiese, B. Wood, A. Pachoud, R. Korn, H. Buehler, P. Murray, L. Bai, Multi-asset spot and option market simulation. Preprint. arXiv:2112.06823 (2021)
65.
S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, G. Mann, BloombergGPT: a large language model for finance. Preprint. arXiv:2303.17564 (2023)
66.
X. Yu, Z. Chen, Y. Ling, S. Dong, Z. Liu, Y. Lu, Temporal data meets LLM – explainable financial time series forecasting. Preprint. arXiv:2306.11025 (2023)