How Data & Machine Learning-driven credit analysis can help navigate the COVID crisis

Praveen has more than 15 years of experience in the financial services industry, with expertise in team building, credit analytics, pricing, strategic initiatives and corporate development, and client engagement.

The COVID-19 pandemic presents an era-defining challenge not just to the public health sector but to the resilience of the entire global economy. Lenders have a central role to play in supporting the economy during this crisis and in facilitating a rapid and sustained recovery afterwards. Given these new realities, banks and other lending institutions face the difficult task of determining which businesses to support (either through government programs or through additional lending off their own balance sheets) and where to responsibly cut their losses to protect themselves and their stakeholders.

A key challenge, however, is that in commercial lending, banks rely on risk models to inform their decisions. These models have been built up internally over decades and calibrated on historical data, but COVID-19 has created a crisis in which historical correlations no longer hold. As Bill Demchak, CEO of PNC, said in an article earlier this year: “We’re in this economy where everybody bases their models predicting the future on the past and of course we’ve never been in a situation where [we] effectively have been forced to shut down the economy with this much fiscal stimulus.”

His comments have been echoed by others in the industry. Jamie Dimon, CEO of JPMorgan Chase, said: “This is such a dramatic change of events. There are no models that have done GDP down 40%, unemployment growing this rapidly, etc.” Mark Mason, CFO of Citi, noted: “No stress scenario that's been created thus far would've contemplated the amount of fiscal response and monetary response that we've seen in short order, so that's not been modelled.”

The crisis has exposed the risks of relying solely on low-frequency, often delayed data. Backward-looking analysis and the models that use lagging data become stale incredibly quickly in a crisis environment. Consider a model that relies entirely on quarterly data published with a lag of one to two quarters. Late in the second quarter of 2020, some of its key indicators would still reflect conditions from late 2019 or early 2020. The impact of government fiscal stimulus, changes in consumer behaviour and the shifting spread of the virus across different areas would all be missing from such data.
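To make the lag arithmetic concrete, here is a minimal Python sketch that works out the most recent quarter a model could see on a given date. The function names are hypothetical, and it assumes the simplest possible release rule: data for a quarter becomes available a fixed number of quarters after that quarter.

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Return the (year, quarter) pair for a calendar date."""
    return d.year, (d.month - 1) // 3 + 1


def latest_available_quarter(as_of: date, lag_quarters: int) -> str:
    """Most recent quarter visible to a model whose quarterly inputs are
    published `lag_quarters` quarters late (a simplifying assumption)."""
    year, quarter = quarter_of(as_of)
    # Use a running quarter count so the subtraction can cross year boundaries.
    index = year * 4 + (quarter - 1) - lag_quarters
    return f"{index // 4} Q{index % 4 + 1}"


as_of = date(2020, 6, 15)  # late in Q2 2020
for lag in (1, 2):
    print(f"lag={lag}: latest data covers {latest_available_quarter(as_of, lag)}")
```

With a one-quarter lag the newest data is from 2020 Q1, and with a two-quarter lag it is from 2019 Q4, which matches the "late 2019 or early 2020" window described above.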

OakNorth’s ability to onboard new data sources as soon as they become available has been a key enabler of our modelling capabilities throughout the pandemic. We currently use thousands of time series from dozens of different data sources released in the last five months. The analysis we perform with this data ranges from understanding changing mobility and foot traffic in a particular city to monitoring trends in future appointment availability at dental clinics.

High-frequency data enables us to observe the current business environment in near real time rather than waiting weeks or potentially months for data to become available. For example, we monitor a large number of transportation statistics, such as the number of travellers passing through security checkpoints and city-level traffic congestion, on a daily basis. Compare this with a lagging indicator such as official transportation statistics, which are released monthly with a delay of several months. Because transportation has a cascading effect on sectors as diverse as hospitality, insurance and manufacturing, relying on a delayed indicator for it would significantly reduce the accuracy and timeliness of the analysis.
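As an illustration of how a daily series can be turned into an indicator that updates every day, the sketch below computes a trailing 7-day average of a daily count expressed as a percentage of a pre-crisis baseline. The function name, the 7-day window and the baseline approach are assumptions chosen for illustration, not a description of OakNorth's actual pipeline, and the input numbers are synthetic.

```python
from collections import deque
from typing import Iterable, List


def rolling_index(daily_values: Iterable[float], baseline: float, window: int = 7) -> List[float]:
    """Trailing `window`-day mean of a daily series as a percentage of a baseline.

    Averaging over a full week removes day-of-week effects (e.g. weekend dips in
    traffic) while still producing a new reading every day.
    """
    if baseline <= 0 or window <= 0:
        raise ValueError("baseline and window must be positive")
    buf: deque = deque(maxlen=window)  # holds only the most recent `window` values
    out: List[float] = []
    for value in daily_values:
        buf.append(value)
        if len(buf) == window:  # emit a reading once a full window is available
            out.append(100.0 * sum(buf) / window / baseline)
    return out


# Synthetic daily counts for illustration only (not real checkpoint data).
daily_counts = [100, 100, 100, 100, 100, 100, 100, 30]
print(rolling_index(daily_counts, baseline=200))  # [50.0, 45.0]
```

A single sharp drop on the final day moves the index from 50% to 45% of baseline the very next day, whereas a monthly statistic published months later would not register it until long after the event.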

High-frequency indicators are clustered into categories to reduce noise, lessen sensitivity to any single series and increase accuracy. Much of our recent work has focused on integrating newly available sources into our pipeline and validating their accuracy through back-testing within a shorter time frame. As the indicators have stabilized, and as continuous validation has increased our confidence in some of our core indicators, adding new indicators or removing existing ones has become an almost automated process.
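One common way to cluster indicators and validate them is sketched below: standardise each series, average the standardised series within each category, and keep only indicators whose history co-moves with a reference series. This is a generic technique offered under stated assumptions, not the method the article describes; the function names, the equal weighting and the 0.5 correlation threshold are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore(series: List[float]) -> List[float]:
    """Standardise a series so indicators on different scales can be combined."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def category_composites(
    indicators: Dict[str, List[float]], categories: Dict[str, str]
) -> Dict[str, List[float]]:
    """Average the standardised indicators in each category, period by period.

    Assumes all series are aligned on the same dates and have equal length.
    """
    grouped: Dict[str, List[List[float]]] = {}
    for name, series in indicators.items():
        grouped.setdefault(categories[name], []).append(zscore(series))
    return {
        category: [mean(values) for values in zip(*members)]
        for category, members in grouped.items()
    }


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_indicators(
    indicators: Dict[str, List[float]], target: List[float], min_corr: float = 0.5
) -> List[str]:
    """Simple back-test: keep indicators whose history tracks a reference
    series (e.g. an official statistic) at or above a correlation threshold."""
    return [name for name, s in indicators.items() if pearson(s, target) >= min_corr]
```

Averaging several related series dampens the noise of any one source, and re-running `select_indicators` as new history accumulates is one way that adding or dropping indicators could be made close to automatic.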

Beyond newly available data, we also take full advantage of recent advances in data science. Wider availability of data science tooling and faster computation have provided the flexibility needed to handle the modelling challenges presented by the pandemic. Using machine-assisted analytical frameworks, we can process large volumes of data (both stock and flow) to identify patterns that would otherwise be beyond what an analyst could interpret manually. Our platform’s machine learning algorithms recognise these patterns, which are then analysed through hundreds of data science frameworks (applied as relevant to each sector) to make the output formulaic. These frameworks continually evolve to produce granular modelling assumptions for key financial metrics (revenue, operating costs, working capital and CAPEX) that can be applied seamlessly to borrower financials.
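To show what "applying modelling assumptions to borrower financials" can look like mechanically, here is a minimal sketch in which sector-level assumptions are expressed as percentage changes to revenue, operating costs, working capital and CAPEX and then used to recompute a simplified cash flow figure. The class and function names, the cash flow formula (which ignores taxes, interest and other items) and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Financials:
    """Annual figures for a borrower, in one currency and a consistent unit."""
    revenue: float
    operating_costs: float
    working_capital: float
    capex: float


@dataclass(frozen=True)
class SectorAssumptions:
    """Hypothetical scenario adjustments as fractional changes (-0.2 means -20%)."""
    revenue_change: float
    operating_cost_change: float
    working_capital_change: float
    capex_change: float


def apply_assumptions(base: Financials, a: SectorAssumptions) -> Financials:
    """Scale each line item of the base-case financials by its sector assumption."""
    return Financials(
        revenue=base.revenue * (1 + a.revenue_change),
        operating_costs=base.operating_costs * (1 + a.operating_cost_change),
        working_capital=base.working_capital * (1 + a.working_capital_change),
        capex=base.capex * (1 + a.capex_change),
    )


def cash_flow_proxy(current: Financials, previous_working_capital: float) -> float:
    """Simplified free cash flow: revenue - operating costs - capex - increase in
    working capital. Taxes, interest and other items are deliberately ignored."""
    delta_wc = current.working_capital - previous_working_capital
    return current.revenue - current.operating_costs - current.capex - delta_wc


# Illustrative numbers only, not output from any real model.
base = Financials(revenue=1000, operating_costs=800, working_capital=100, capex=50)
stress = SectorAssumptions(revenue_change=-0.2, operating_cost_change=-0.1,
                           working_capital_change=0.1, capex_change=-0.5)
stressed = apply_assumptions(base, stress)
print(round(cash_flow_proxy(base, base.working_capital), 2))
print(round(cash_flow_proxy(stressed, base.working_capital), 2))
```

Under these made-up assumptions the cash flow proxy falls from 150 to roughly 45, showing how a handful of sector-level inputs can flow through to a borrower-level view of debt-servicing capacity.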

While we hope that extraordinary policy responses will bring about a bounce-back in economic activity, such a scenario is not assured, and significant risks to long-term prosperity continue to cloud lenders’ decision-making and strategic visibility. Recovery, when it comes, will vary greatly in both speed and intensity across industries, geographies and business models, with some effects lasting for many years. This is precisely where data and machine learning can provide the insights needed to manage risk tightly while helping lenders identify and seize the next wave of opportunities.