2020 | Book

Mining Data for Financial Applications

4th ECML PKDD Workshop, MIDAS 2019, Würzburg, Germany, September 16, 2019, Revised Selected Papers

Edited by: Valerio Bitetta, Dr. Ilaria Bordino, Prof. Andrea Ferretti, Francesco Gullo, Stefano Pascolutti, Giovanni Ponti

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

This book constitutes revised selected papers from the 4th Workshop on Mining Data for Financial Applications, MIDAS 2019, held in conjunction with ECML PKDD 2019, in Würzburg, Germany, in September 2019.

The 8 full and 3 short papers presented in this volume were carefully reviewed and selected from 16 submissions. They deal with challenges, potentialities, and applications of leveraging data-mining tasks regarding problems in the financial domain.

Table of contents

Frontmatter
MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning
Abstract
Reinforcement learning has become one of the best approaches to train a computer game emulator capable of human-level performance. In a reinforcement learning approach, an optimal value function is learned across a set of actions, or decisions, that leads to a set of states giving different rewards, with the objective to maximize the overall reward. A policy assigns to each state-action pair an expected return. We call an optimal policy a policy for which the value function is optimal. QLBS, Q-Learner in the Black-Scholes(-Merton) Worlds, applies the reinforcement learning concepts, and notably the popular Q-learning algorithm, to the financial stochastic model of Black, Scholes and Merton. It is, however, specifically optimized for the geometric Brownian motion and the vanilla options. Its range of application is, therefore, limited to vanilla option pricing within the financial markets. We propose MQLV, Modified Q-Learner for the Vasicek model, a new reinforcement learning approach that determines the optimal policy of money management based on the aggregated financial transactions of the clients. It unlocks new frontiers to establish personalized credit card limits or bank loan applications, targeting the retail banking industry. MQLV extends the simulation to mean-reverting stochastic diffusion processes and it uses a digital function, a Heaviside step function expressed in its discrete form, to estimate the probability of a future event such as a payment default. In our experiments, we first show the similarities between a set of historical financial transactions and Vasicek-generated transactions and, then, we underline the potential of MQLV on generated Monte Carlo simulations. Finally, MQLV is the first Q-learning Vasicek-based methodology addressing transparent decision-making processes in retail banking.
Jeremy Charlier, Gaston Ormazabal, Radu State, Jean Hilger
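The two ingredients named in the abstract above, a mean-reverting Vasicek diffusion and a Heaviside "digital" function averaged over Monte Carlo paths, can be illustrated with a minimal sketch. This is not the authors' MQLV implementation and contains no Q-learning step; the function names, the Euler discretization, the parameter values, and the choice of "terminal value below a threshold" as the event are all assumptions made for illustration.

```python
import math
import random


def simulate_vasicek(x0, kappa, theta, sigma, dt, n_steps, rng):
    """Euler-Maruyama path of dx = kappa * (theta - x) dt + sigma dW.

    All parameter names are hypothetical; kappa is the mean-reversion
    speed, theta the long-run level, sigma the volatility.
    """
    path = [x0]
    x = x0
    for _ in range(n_steps):
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += kappa * (theta - x) * dt + shock
        path.append(x)
    return path


def heaviside(value):
    """Discrete Heaviside step: 1 for strictly positive input, else 0."""
    return 1.0 if value > 0 else 0.0


def event_probability(threshold, n_paths, seed=0, **params):
    """Monte Carlo estimate of P(terminal value < threshold).

    The event definition is an assumption standing in for something
    like a payment default; it is not taken from the paper.
    """
    rng = random.Random(seed)
    hits = 0.0
    for _ in range(n_paths):
        terminal = simulate_vasicek(rng=rng, **params)[-1]
        hits += heaviside(threshold - terminal)
    return hits / n_paths
```

A call such as `event_probability(threshold=0.4, n_paths=1000, x0=1.0, kappa=2.0, theta=0.5, sigma=0.1, dt=0.01, n_steps=100)` returns the fraction of simulated paths that end below the threshold.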
Curriculum Learning in Deep Neural Networks for Financial Forecasting
Abstract
For any financial organization, computing accurate quarterly forecasts for various products is one of the most critical operations. As the granularity at which forecasts are needed increases, traditional statistical time series models may not scale well. We apply deep neural networks in the forecasting domain by experimenting with techniques from Natural Language Processing (Encoder-Decoder LSTMs) and Computer Vision (Dilated CNNs), as well as incorporating transfer learning. A novel contribution of this paper is the application of curriculum learning to neural network models built for time series forecasting. We illustrate the performance of our models using Microsoft’s revenue data corresponding to Enterprise, and Small, Medium & Corporate products, spanning approximately 60 regions across the globe for 8 different business segments, and totaling in the order of tens of billions of USD. We compare our models’ performance to the ensemble model (of traditional statistics and machine learning) currently being used by Microsoft Finance. Using this in-production model as a baseline, our experiments yield an approximately 30% improvement overall in accuracy on test data. We find that our curriculum learning LSTM-based model performs best, which shows that one can implement our proposed methods without overfitting on medium-sized data.
Allison Koenecke, Amita Gajewar
Representation Learning in Graphs for Credit Card Fraud Detection
Abstract
Representation learning in graphs has proven useful for many predictive tasks. In this paper we assess the feasibility of representation learning in a credit card fraud setting. Data analytics has been successful in predicting fraud in previous research. However, the research field has focused on techniques which require tedious and expensive hand-crafting of features. In addition, existing works often ignore information related to the network of transactions. Representation learning in graphs tackles both of these challenges. First, it provides the possibility to tap into the relational and structural aspects of the transaction network and leverage these in a predictive model. Second, it featurizes the graph without the need for manual feature engineering. This work contributes to the literature by being the first to explicitly and extensively show how fraud detection modeling can benefit from representation learning. We discern three different approaches in this paper: traditional network featurization, an inductive representation learning algorithm and a transductive representational learner. Through extensive experimental evaluation on a real-world dataset we show that state-of-the-art representation learning in graphs outperforms traditional graph featurization.
Rafaël Van Belle, Sandra Mitrović, Jochen De Weerdt
Firms Default Prediction with Machine Learning
Abstract
Over the years, academics and practitioners have studied models for predicting firms’ bankruptcy, using statistical and machine-learning approaches. An earlier sign that a company has financial difficulties and may eventually go bankrupt is going into default, which, loosely speaking, means that the company has been having difficulties in repaying its loans to the banking system. A firm’s default status is not technically a failure but is very relevant for bank lending policies and often anticipates the failure of the company. Our study uses, for the first time to our knowledge, a very large database of granular credit data from the Italian Central Credit Register of the Bank of Italy, which contains information on all Italian companies’ past behavior towards the entire Italian banking system, to predict their default using machine-learning techniques. Furthermore, we combine these data with other information regarding companies’ public balance sheet data. We find that ensemble techniques and random forest provide the best results, corroborating the findings of Barboza et al. (Expert Syst. Appl., 2017).
Tesi Aliaj, Aris Anagnostopoulos, Stefano Piersanti
Convolutional Neural Networks, Image Recognition and Financial Time Series Forecasting
Abstract
Convolutional Neural Networks (CNNs) are best known as good image classifiers. This type of model has recently been used for financial forecasting. The purpose of this work is to show that converting financial information into images and feeding these financial-image representations to the CNN results in an improvement in classification.
Argimiro Arratia, Eduardo Sepúlveda
Mining Business Relationships from Stocks and News
Abstract
In today’s modern society and global economy, decision making processes are increasingly supported by data. Especially in financial businesses it is essential to know about how the players in our global or national market are connected. In this work we compare different approaches for creating company relationship graphs. In our evaluation we see similarities in relationships extracted from Bloomberg and Reuters business news and correlations in historic stock market data.
Thomas Kellermeier, Tim Repke, Ralf Krestel
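One of the signals mentioned in the abstract above, correlation in historic stock market data, can be turned into a company graph with a few lines of code. The sketch below is an assumption about how such a graph might be built (Pearson correlation of simple returns with a hypothetical cut-off); it is not the authors' method and does not cover the news-based extraction.

```python
import math


def simple_returns(prices):
    """Period-over-period relative price changes."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if degenerate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_graph(price_history, threshold=0.8):
    """Return {(company_a, company_b): rho} for pairs with |rho| >= threshold.

    price_history maps a company name to a list of prices over the same
    dates; the threshold value is a hypothetical choice.
    """
    returns = {name: simple_returns(p) for name, p in price_history.items()}
    names = sorted(returns)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(returns[a], returns[b])
            if abs(rho) >= threshold:
                edges[(a, b)] = rho
    return edges
```

The resulting edge set could then be compared against relationships extracted from text, which is the kind of comparison the abstract describes.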
Mining Financial Risk Events from News and Assessing their Impact on Stocks
Abstract
The impact of financial risk events on the stock market is a fairly established area of research in the financial domain. However, analysts require these events to be represented in a structured form in order to carry out statistical analysis. In this work, we aim to identify and extract various financial risk events from news articles, along with associated organizations, to facilitate integrated analysis with structured business data. We propose a two-phase risk extraction algorithm involving a CNN-based semi-supervised risk event identification step and a gradient-boosting-based entity association algorithm to extract risk events from news and associate them with their target organizations. We have analyzed large volumes of past available data using Granger causality to assess the impact of these events on various stock indices. Further, the utility of extracted risk events in predicting stock movement has been shown using a Bi-LSTM network-based prediction model. The proposed system outperforms a state-of-the-art linear SVM on data for different stock indices.
Saumya Bhadani, Ishan Verma, Lipika Dey
Monitoring the Business Cycle with Fine-Grained, Aspect-Based Sentiment Extraction from News
Abstract
We provide an overview on the development of a fine-grained, aspect-based sentiment analysis approach aimed at providing useful signals to improve forecasts of economic models and produce more accurate predictions. The approach is unsupervised since it relies on external lexical resources to associate a polarity score to a given term or concept. After providing an overview of the method under development, some preliminary findings are also given.
Luca Barbaglia, Sergio Consoli, Sebastiano Manzan
Multi-step Prediction of Financial Asset Return Volatility Using Parsimonious Autoregressive Sequential Model
Abstract
Previously, the application of deep learning based sequential models drastically improved the accuracy of volatility prediction in the modelling of financial time series. However, unlike traditional financial time series models such as the GARCH family of models, the majority of deep learning based financial time series models focus solely on optimizing a single-step volatility prediction error and are not capable of conducting multi-step training and prediction of volatilities. The reason is that volatility is the inherent uncertainty of the model prediction, whose multi-step prediction is drastically different from prediction of the mean of the financial time series.
In this work, a parsimonious autoregressive multi-step density regression (PA-MS-DR) framework is proposed to solve this problem. Our model framework can accurately capture the heavy-tail property of financial asset returns. In addition, our model is autoregressive, and it allows multi-step ahead training and forecasting, which significantly expands the applicability of the model in real-world scenarios. Finally, the structure of our method inspires us to devise a novel training method, which greatly accelerates the training speed of the algorithm.
The performance of PA-MS-DR is tested by comparing it with traditional time series models such as the GARCH family of models and a non-autoregressive baseline model with similar structure. The results show that our model consistently and significantly outperforms the GARCH family of models. In addition, our model consistently outperforms the non-autoregressive baseline model, which demonstrates the effectiveness of our autoregressive model structure.
Xiangru Fan, Xiaoqian Wei, Di Wang, Wen Zhang, Wu Qi
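For context on the GARCH baseline mentioned in the abstract above, the standard GARCH(1,1) model admits a closed recursion for multi-step variance forecasts: the one-step forecast uses the last squared return, and each further step applies the persistence `alpha + beta`. The sketch below illustrates that textbook recursion only; it is not the PA-MS-DR model, and the function and parameter names are chosen here for illustration.

```python
def garch11_multistep_forecast(omega, alpha, beta,
                               last_return, last_variance, horizon):
    """Variance forecasts for steps 1..horizon under GARCH(1,1).

    Step 1:  omega + alpha * r_t^2 + beta * sigma_t^2
    Step h:  omega + (alpha + beta) * forecast(h - 1),  for h > 1
    Assumes alpha + beta < 1 so forecasts converge to the long-run
    variance omega / (1 - alpha - beta).
    """
    variance = omega + alpha * last_return ** 2 + beta * last_variance
    forecasts = [variance]
    for _ in range(horizon - 1):
        variance = omega + (alpha + beta) * variance
        forecasts.append(variance)
    return forecasts
```

With `omega=0.1, alpha=0.1, beta=0.8` the long-run variance is 1.0, so a shock in the last return decays geometrically towards that level as the horizon grows.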
Big Data Financial Sentiment Analysis in the European Bond Markets
Abstract
We exploit the novel Global Database of Events, Language and Tone (GDELT) to construct news-based financial sentiment measures capturing investors’ opinions for three European countries: Italy, Spain and France. We study whether deterioration in investors’ sentiment implies a rise in interest rates with respect to their German counterparts. Finally, we look at the link between agents’ sentiment and their portfolio exposure on the Italian, French and Spanish markets.
Luca Tiozzo Pezzoli, Sergio Consoli, Elisa Tosetti
A Brand Scoring System for Cryptocurrencies Based on Social Media Data
Abstract
In this work, we present an overview on the development and integration in ENEAGRID of some tools to evaluate brand importance of homogeneous financial instruments, such as cryptocurrencies. Our system is based on the analysis of textual data, such as tweets or online news. A collaborative environment called Web Crawling Virtual Laboratory allows data retrieval from the web. Below we describe this virtual lab and the ongoing activity aimed at adding a new feature, to allow news and social media crawling. We also provide some details about the integration in ENEAGRID of a new measure of brand importance and its Virtual Laboratory, namely the Semantic Brand Score. We aim to test the first version of this new virtual environment on Twitter data, to rank digital currencies.
Giuseppe Santomauro, Daniela Alderuccio, Fiorenzo Ambrosino, Andrea Fronzetti Colladon, Silvio Migliori
Backmatter
Metadata
Title
Mining Data for Financial Applications
Edited by
Valerio Bitetta
Dr. Ilaria Bordino
Prof. Andrea Ferretti
Francesco Gullo
Stefano Pascolutti
Giovanni Ponti
Copyright year
2020
Electronic ISBN
978-3-030-37720-5
Print ISBN
978-3-030-37719-9
DOI
https://doi.org/10.1007/978-3-030-37720-5