Machine Learning in Risk Management

Machine Learning for Financial Institutions

Terms like big data, machine learning and data science are used in many fields of business. In recent years, financial institutions have also shown increasing interest in these subjects. The expectation is that machine learning can assist with tasks such as compliance, credit underwriting, client communication and risk analysis. We have already written some articles about these topics, focusing on vendors and competitors in this market. This article provides a general introduction to machine learning, followed by a discussion of some applications within financial institutions.

An Introduction to Machine Learning

We live in the age of algorithms, and machine learning is used in many areas of our daily lives, yet the term itself remains hard to pin down. So what is machine learning? First of all, recall that every algorithm has an input and an output: the data goes into the computer, the algorithm processes it in some way, and the calculated output is returned. Machine learning turns this around: in go the data and the desired result, and out comes the algorithm that turns one into the other.

These algorithms are called learning algorithms (or learners): they make other algorithms by learning from the data. Hence, the more data is available, the more they can learn. With ‘big data’, there is a lot to learn, which is why machine learning is becoming more and more popular.

Its first successful application was in finance, predicting stock ups and downs in the late 1980s, followed by the mining of corporate databases and by uses in areas like direct marketing, customer relationship management, credit scoring and fraud detection.

Then applications for web search and e-commerce emerged, alongside the increasing use of large-scale modelling in various scientific fields. Following the attacks of 9/11, applications for fighting terrorism were also developed.

Finally, with the rise of ‘big data’, machine learning has become an important aspect of the global economy and its future. Machine learning also has strong connections with statistics.

In general, you can distinguish three types of learning: supervised learning, unsupervised learning and reinforcement learning.

Supervised learning is used for prediction or estimation by means of a model that is trained on a dependent variable and a set of explanatory (independent) variables. This model can be a simple linear regression, but there are many more possibilities.
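As a minimal sketch of supervised learning, the simple linear regression mentioned above can be fitted with ordinary least squares in a few lines. The data below (a debt-to-income ratio as explanatory variable and a loss rate as dependent variable) are hypothetical and only serve as an illustration.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance of x and y, and variance of x (both up to the same factor)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: explanatory variable and dependent variable
debt_to_income = [0.1, 0.2, 0.3, 0.4, 0.5]
loss_rate = [0.012, 0.019, 0.031, 0.038, 0.050]

intercept, slope = fit_simple_linear_regression(debt_to_income, loss_rate)

# Use the trained model to estimate the loss rate for a new observation
predicted = intercept + slope * 0.35
print(f"loss rate = {intercept:.4f} + {slope:.4f} * debt_to_income")
print(f"prediction at 0.35: {predicted:.5f}")
```

The data and the desired result go in, and the fitted line (the "algorithm" that maps one to the other) comes out.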

Unsupervised learning is used to analyse data without a dependent variable in the model, as is done, for example, in principal component analysis.
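To illustrate, the sketch below extracts the first principal component of a small data set using power iteration on the sample covariance matrix. Power iteration is just one simple way to obtain the component, and the four observations (think of changes in two related market rates) are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the first principal component and its share of total variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and rescale
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, relative to the total variance (trace)
    captured = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, captured / total


# Hypothetical observations of two strongly related variables
observations = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
component, explained_share = first_principal_component(observations)
print("direction:", [round(x, 3) for x in component])
print("share of variance explained:", round(explained_share, 4))
```

No dependent variable is involved: the method only looks for the direction along which the data vary the most.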

Reinforcement learning is used to train a computer in decision making. The computer trains itself by repeatedly playing a game, rewarding itself each time it wins and punishing itself each time it loses. In this way, a computer can develop its own decision strategy for the game.
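A minimal sketch of this idea is tabular Q-learning on a toy game. The game is hypothetical: a player starts in the middle of a line of five positions, wins (+1) on reaching the right end and loses (-1) on reaching the left end. The learning rate, discount factor and exploration rate are illustrative choices, not recommendations.

```python
import random

N_STATES = 5            # positions 0..4 on a line
LOSE, WIN, START = 0, 4, 2
ACTIONS = (-1, +1)      # action 0 = step left, action 1 = step right


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # estimated value per state/action
    for _ in range(episodes):
        state = START
        while state not in (LOSE, WIN):
            # Explore at random sometimes (and on ties), otherwise act greedily
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = state + ACTIONS[action]
            if next_state == WIN:
                reward = 1.0        # reward for winning
            elif next_state == LOSE:
                reward = -1.0       # punishment for losing
            else:
                reward = 0.0
            terminal = next_state in (LOSE, WIN)
            future = 0.0 if terminal else max(q[next_state])
            # Q-learning update: move the estimate towards reward + discounted future
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


q_table = train()
# Learned decision strategy for the three non-terminal positions
policy = ["left" if q[0] > q[1] else "right" for q in q_table[1:4]]
print(policy)
```

After enough games, the learned strategy is to always step towards the winning end, without that rule ever being programmed explicitly.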

The Trade-off in Learning

The goal of machine learning is to identify relationships/patterns in all types of data. The model-building process usually depends a lot on cross-validation, where models are trained and tested using many different subsamples to prevent overfitting. Overfitting occurs when a model learns the detail and noise in the data to the extent that it negatively impacts the performance of the model on new data.
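The cross-validation procedure can be sketched as follows: the data are split into k subsamples (folds), and each fold is used once for testing a model trained on the remaining folds. The linear model, the simulated data and the choice of k = 5 are assumptions made for this example only.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


def cross_validate(x, y, k=5):
    """Return the out-of-sample mean squared error for each fold."""
    scores = []
    for train, test in k_fold_indices(len(x), k):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (a + b * x[i])) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return scores


# Simulated data: a linear relationship plus random noise
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]

cv_scores = cross_validate(x, y, k=5)
print("MSE per fold:", [round(s, 3) for s in cv_scores])
print("average MSE:", round(sum(cv_scores) / len(cv_scores), 3))
```

Because every observation is used for testing exactly once, the average score gives an indication of how the model performs on data it has not seen.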

We want a model to learn a good representation of the data and, at the same time, to generalise well to new data. The first boils down to minimising the bias (i.e. capturing all the relevant information within the data), and the second to minimising the variance (i.e. avoiding modelling the random noise in the data).

In practice, more complex and flexible models have a high variance with a small bias, while simpler and less flexible models have a low variance with a large bias. Therefore, a trade-off between complexity and performance must always be made. With machine learning it can also be hard to explain why certain choices are made or certain variables are used, as these rely on automated data-driven algorithms. This could limit its use in (regulatory) risk reporting. However, it is possible to apply some of the techniques with a pre-defined variable selection procedure, rather than purely following the machine learning philosophy.
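The trade-off between flexible and inflexible models can be made concrete with a small experiment. The sketch below uses k-nearest-neighbour regression, a technique not discussed in this article, only because its flexibility is controlled by a single number: k = 1 follows every data point (high variance), while k equal to the sample size always predicts the overall average (high bias). The simulated data are an assumption for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, eval_x, eval_y, k):
    """Mean squared error of the k-NN model on the evaluation data."""
    return sum((knn_predict(train_x, train_y, x, k) - y) ** 2
               for x, y in zip(eval_x, eval_y)) / len(eval_x)


rng = random.Random(42)


def sample(n):
    """Simulated data: a smooth signal plus random noise."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


train_x, train_y = sample(40)
test_x, test_y = sample(200)

# For each level of flexibility: (error on training data, error on new data)
results = {
    k: (mse(train_x, train_y, train_x, train_y, k),
        mse(train_x, train_y, test_x, test_y, k))
    for k in (1, 5, 40)
}
for k, (train_error, test_error) in results.items():
    print(f"k={k:2d}  train MSE={train_error:.3f}  test MSE={test_error:.3f}")
```

The most flexible model (k = 1) reproduces the training data perfectly, including its noise, so its training error says nothing about its performance on new data. The least flexible model (k = 40) ignores the signal altogether. A model in between typically generalises best.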

Common Algorithms

To give an idea of how some of the machine learning algorithms work, we will briefly explain three well-known algorithms: decision trees, support vector machines and neural networks.

Decision trees learn by means of inverse deduction, which means figuring out what knowledge is missing to make the deduction go through, and then making it as general as possible. It is a bit similar to the game of 20 questions, where a series of carefully formulated questions is asked about (the values of) some variables of the observations, until a conclusion about the dependent variable is reached.

This series of questions and their possible answers can be arranged in a decision tree, a hierarchical structure consisting of nodes and directed edges, as shown in figure 1.
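As a rough sketch of how such a tree can be learned, the code below repeatedly picks the question (a variable and a threshold) that splits the observations into the purest groups, measured by the Gini impurity. This greedy procedure is one common way to build trees, not necessarily the one behind figure 1, and the loan data (income in thousands, debt ratio, default flag) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Find the (impurity, feature, threshold) of the best yes/no question."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Return a leaf (0 or 1) or a node (feature, threshold, left, right)."""
    majority = int(2 * sum(labels) >= len(labels))
    split = None
    if gini(labels) > 0 and depth < max_depth:
        split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict(tree, row):
    """Answer the questions from the root down until a leaf is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical loans: (income in thousands, debt ratio) and default flag
loans = [(20, 0.6), (25, 0.5), (30, 0.7), (60, 0.2),
         (70, 0.3), (80, 0.6), (30, 0.1), (35, 0.8)]
defaulted = [1, 1, 1, 0, 0, 0, 0, 1]

tree = build_tree(loans, defaulted)
print(tree)
print(predict(tree, (28, 0.65)), predict(tree, (90, 0.9)))
```

On this data the learned tree first asks about income and then, for the lower incomes, about the debt ratio, which mirrors the game of 20 questions described above.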


Support vector machines (SVMs) are used to classify data, which essentially means separating the data into classes (e.g. default versus no default). The SVM algorithm searches for a hyperplane that separates the data. In two dimensions, the hyperplane is a line lying midway between two parallel lines that bound the classes on either side. The distance between these two lines (the margin) is maximised while keeping the data separated, as shown in figure 2.
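A minimal sketch of a linear SVM in two dimensions is given below. Instead of the quadratic programming solvers typically used in practice, it minimises the regularised hinge loss with plain (sub)gradient descent, which is a simpler stand-in that aims for the same wide-margin separating line. The standardised data points and all parameter values are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=500):
    """Fit w and b of the separating line w.x + b = 0 (labels are +1 or -1)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradient of the penalty on large weights (small weights = wide margin)
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x0, x1), y in zip(points, labels):
            # Only points inside the margin or on the wrong side contribute
            if y * (w[0] * x0 + w[1] * x1 + b) < 1:
                grad_w[0] -= y * x0 / n
                grad_w[1] -= y * x1 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def classify(w, b, point):
    """Return +1 or -1 depending on the side of the line the point falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Hypothetical standardised features; +1 = default, -1 = no default
points = [(2.0, 1.0), (3.0, 2.5), (1.5, 3.0), (2.5, 2.0),
          (-2.0, -1.0), (-3.0, -2.5), (-1.5, -3.0), (-2.5, -2.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [classify(w, b, p) for p in points]
print("w =", [round(x, 3) for x in w], "b =", round(b, 3))
print(predictions)
```

The two parallel lines from the description correspond to w.x + b = 1 and w.x + b = -1, and the line w.x + b = 0 halfway between them is the decision boundary.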


The algorithm can be extended to data that is not perfectly separable by a hyperplane.

Neural networks can be seen as simplified models of brains. In a brain, there are neurons, which are either activated or deactivated, and synapses, which connect the neurons together. The neurons are represented as simple functions, and the synapses are represented by generally small numbers between -1 and 1. The total ‘weight’ of all the synapses connected to a neuron determines the neuron’s state. The neurons are arranged in layers: an input layer, one or more ‘hidden’ layers, and an output layer.

You put the representation of what you are trying to learn into the input layer. The activations of those neurons, multiplied by the weights of the connecting synapses, determine how the neurons in the next layer are activated, which in turn drives the layer after that, and so on. Eventually you get to the output layer, where the network makes a prediction. If that prediction is right, great! If not, you basically run through the network in reverse, tweaking the weights until it starts making correct predictions.
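The forward and reverse passes can be sketched with a very small network: two inputs, one hidden layer of two neurons and one output neuron, trained with backpropagation. The sigmoid activation, the squared-error loss, the learning rate and the toy data (default only when two hypothetical warning flags are both raised) are all assumptions for this illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass; the last entry of each weight list is a bias term."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    total = sum(w_out[j] * hidden[j] for j in range(len(hidden))) + w_out[-1]
    return hidden, sigmoid(total)


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    # Start with small random weights between -1 and 1
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_loss = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Reverse pass: propagate the prediction error back through the layers
            delta_out = (output - target) * output * (1 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Tweak every weight a little in the direction that reduces the error
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                w_hidden[j][0] -= lr * delta_hidden[j] * x[0]
                w_hidden[j][1] -= lr * delta_hidden[j] * x[1]
                w_hidden[j][2] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
    return w_hidden, w_out, initial_loss


# Hypothetical data: two warning flags as inputs, default (1) or not (0) as target
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w_hidden, w_out, initial_loss = train(data)
final_loss = mean_squared_error(data, w_hidden, w_out)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
for x, target in data:
    print(x, target, round(forward(x, w_hidden, w_out)[1], 3))
```

Real networks have many more neurons and layers, but the principle of predicting forwards and correcting backwards is the same.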

Applications of Machine Learning in Risk Management

The remaining part of this article discusses some applications of machine learning in risk management in more detail. The subjects considered are credit risk, behavioural modelling, liquidity risk, compliance and some other applications.

Credit Risk Management

For credit risk modelling, the first steps in using big data analytics have already been taken. Credit risk management can be split into two parts: (1) the decision making during the underwriting process with regard to granting a loan and determining the interest rate, and (2) managing the loan portfolio.

In the online retail market, the underwriting process is extremely important, as every obstacle in the ordering process can lead to a client abandoning the purchase. Hence, an online payment provider must decide within a few seconds whether the client is creditworthy enough to buy a product on credit. For banks, many possible data sources are already available within the institution.

The biggest challenge is to combine the structured and unstructured data that are already available at a bank. The data can come from client history and payment transactions, as well as from website behaviour or cookie data. To enrich the data, financial institutions might buy data from external sources, for example credit card providers. This data can then be used to model the provision on the portfolio, but also for early warning systems. There are many applications with regard to early warning systems and credit scoring.

Behavioural Modelling

One of the main applications of behavioural modelling is found at insurance companies, which use machine learning for understanding risk, claims and customer experience, and for monitoring fraud.

Nowadays, insurers can obtain data from many different sources, building a more complete picture of their customers. For example, it is now possible to learn from audits of closed claims, because leakage becomes controllable by the insurer.

Machine learning algorithms can learn from claim audits by using enhanced scoring and process methods throughout the claims lifecycle. In addition, they can be used for automatic validation of insurance policies, improving fraud detection. Within banks, machine learning can be used to model customer behaviour.

Research has already been done on modelling prepayment risk. The better the customer behaviour is modelled, the better the funding can be matched with the cash flows. The same holds for modelling savings accounts or withdrawals from lines of credit.

Liquidity Risk Management

Another application of machine learning is to better understand liquidity risk. Consider, for example, asset management companies that use machine learning for a better estimation of the cost of liquidating fund positions in case of redemptions.

By incorporating internal trade data into the existing market liquidity models, along with these techniques, it is possible to take into account variables such as time to liquidation, transaction cost and volume.

Additionally, forecasting fund flows with machine learning can enable the integration of a high number of factors into the model, such as fund returns, high-yield sector flows and total sector returns.

Compliance and Regulatory Reporting

Using machine learning for particular regulatory and compliance challenges could improve efficiency and profitability within financial institutions. One such application is data aggregation and management for assessing compliance with regulations, as required, for example, for capital and liquidity reporting and stress testing.


More and more regulators ask for data at a higher frequency and at a greater granularity, resulting in a more data-driven financial supervision. To meet these demands, the data needs to be of high quality (i.e. structured, well defined, accurate and complete). Through its ability to identify complex and nonlinear patterns in large data sets, machine learning can be used for organising and analysing large quantities of (structured and unstructured) data.

Another application is found in modelling, scenario analysis and forecasting. For example, better risk models can be obtained by adjusting algorithms based on newly obtained information, thereby improving their accuracy through use. For stress testing and risk management, this could improve the definition of models, the accuracy of statistical analyses and the calculation and simulation of stress scenarios.

Other Applications

Big data can also be artificially created with Monte Carlo simulations. Bigger Monte Carlo simulations may allow more comprehensive ALM studies or lead to better pricing of derivatives. The only problem with a more extensive Monte Carlo study is the running time. Together with the big data and machine learning evolution, the use of the Graphics Processing Unit (GPU) has become more and more popular. Thousands of cores are available for parallel computing in a GPU, meaning more comprehensive Monte Carlo simulations without an increase in running time.

Furthermore, machine learning can help to understand the effect of a firm’s own trading on market prices (‘market impact’). To start with, it can be combined with traditional market impact models. For example, firms can use machine learning to gain more information from historical data or to identify nonlinear relationships in order flow. A bolder application is creating trading robots that teach themselves how to react to market changes. Both applications are already in use.
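As an illustration of derivative pricing with Monte Carlo simulation, the sketch below prices a European call option by simulating terminal asset prices and compares the result with the Black-Scholes formula. The geometric Brownian motion model and all parameter values are assumptions chosen for the example. The code runs on a single CPU core; since the simulated paths are independent of each other, this is the kind of workload that can be spread over the many cores of a GPU.

```python
import math
import random


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form price of a European call, used here as a benchmark."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return s0 * normal_cdf(d1) - k * math.exp(-r * t) * normal_cdf(d2)


def monte_carlo_call(s0, k, r, sigma, t, n_paths=50_000, seed=0):
    """Estimate the call price as the discounted average simulated payoff."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total_payoff = 0.0
    for _ in range(n_paths):
        # Each path is independent, so paths could be simulated in parallel
        terminal_price = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(terminal_price - k, 0.0)
    return math.exp(-r * t) * total_payoff / n_paths


# Illustrative parameters: spot, strike, interest rate, volatility, maturity
params = dict(s0=100.0, k=100.0, r=0.02, sigma=0.2, t=1.0)
mc_price = monte_carlo_call(**params)
bs_price = black_scholes_call(**params)
print(f"Monte Carlo: {mc_price:.3f}  Black-Scholes: {bs_price:.3f}")
```

The simulation error shrinks only with the square root of the number of paths, which is why more comprehensive studies quickly become expensive in running time.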


The applications of machine learning within financial risk management are not limited to the examples mentioned above. The question is how we can assist financial institutions with the implementation of (some of) the machine learning applications in risk management.

Note that implementing models based on machine learning involves more than training the algorithms. In most machine learning implementations, the data gathering, integration and pre-processing take more time than the actual training of the algorithm.

It is an iterative process of training a model, evaluating the results, modifying and repeating, rather than just a single process of data preparation and training.

At the moment, we already have some experience in credit risk modelling. This experience and knowledge can be extended to other machine learning applications. In this way, we could assist financial institutions in transforming data into valuable information for their financial risk management.

Contact us

Please contact Jeroen van der Heide to find out more about Machine Learning.