What could machine learning mean for the financial markets?
Machine learning brings a lot of new tools. That said, I would argue that finance has been implicitly using machine-learning principles since the beginning. Chicago Booth’s Eugene F. Fama and Dartmouth’s Kenneth R. French selected three variables out of thousands as important for explaining asset return variations. While they did that using economic insights, they were working in exactly the same spirit. What they effectively did was a form of variable selection that ML now streamlines. Machines are mimicking what humans were already doing.
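To make the idea of automated variable selection concrete, here is a minimal sketch using the lasso, one common ML selection method. It is not how Fama and French chose their factors (they relied on economic insight), and everything in it is an assumption for illustration: the data are simulated, and the function names, the 200 candidate variables, the three "true" drivers, and the penalty `alpha` are all hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t; return exactly zero when |x| <= t."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1
    by cyclic coordinate descent (hypothetical helper, not a library API)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scale
    residual = y.copy()                 # equals y - X @ beta while beta == 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes j's own fit
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            if new_bj != beta[j]:
                residual -= X[:, j] * (new_bj - beta[j])
                beta[j] = new_bj
    return beta

# Simulated data (assumption): 200 candidate predictors, only 3 matter.
rng = np.random.default_rng(0)
n_obs, n_candidates = 500, 200
X = rng.standard_normal((n_obs, n_candidates))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column

true_idx = [3, 57, 120]
true_beta = np.zeros(n_candidates)
true_beta[true_idx] = [0.8, -0.6, 0.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n_obs)
y = y - y.mean()

beta_hat = lasso_coordinate_descent(X, y, alpha=0.2)
selected = np.flatnonzero(beta_hat)        # indices with nonzero coefficients
print("selected variables:", selected.tolist())
```

The penalty sets most coefficients to exactly zero, so the model keeps only a handful of candidates. In this toy setting the signal is strong; real return data are far noisier, and the choice of penalty matters much more.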
The industry has long borrowed insights from academic work, including Harry Markowitz’s modern portfolio theory, the Black-Scholes formula used in options markets, and the Fama-French factor models. But on ML, six years ago, the industry—at least part of it—was leading academic researchers. I had the head of a major hedge fund tell me that he didn’t even read academic finance papers.
Yale’s Bryan T. Kelly and I decided to write a paper introducing ML to academic finance. How do you use hundreds of variables to make better predictions about returns? That is where we started. From there, we laid out why the methods should be adopted. The paper has garnered attention from academics and Wall Street, indicating a growing interest in this field. We recently coauthored a survey that summarizes the latest developments.
In our initial paper, we introduced advanced ML technologies, such as trees and neural nets, useful in predicting stock returns. Since then we have moved on, in other research, to alternative data, explaining the use of image recognition and natural language processing tools adopted from artificial intelligence. Alternative data include news feeds, which represent one of the largest databases in terms of text. With those, you have to use ML to help you unearth information embedded in the text, as language is a highly complex information-encoding scheme. You even need large language models to read between the lines. We do not use ChatGPT in our research—we started working on this in 2019, before ChatGPT was born—but we use the model behind it.
Today, the asset management industry is increasingly boasting about its ML prowess, attracting prominent people from the data-science or ML communities. I haven’t even mentioned China, where the quantitative industry has grown to an incredible size since 2019. Quite a few quant funds there have reached ¥10 billion (almost US$1.5 billion) in assets under management.
But finance is different from computer science, and we need to be cautious about adopting tools that may not be applicable to markets. We are trying to demystify ML; we want to understand what its weaknesses and limitations are as well as how we can improve things. There’s a lack of theoretical guidance.
One concern is the black-box nature of ML models. If a fund loses money, its manager needs to be able to explain to investors what happened. If a trading strategy is based on 1,000 variables rather than three, it’s hard to figure out exactly what went wrong. But if you want better predictions, you might need to live with some drawbacks. There’s a trade-off between performance and interpretability.
Finance is a conservative field, and there are also questions about how much better ML is than simpler models. There are areas where it might not work well, such as for long-term predictions, where we might have to rely more on intuition from economists.
But with alternative data, there’s a lot that one can do. We first analyzed numbers, and then words. Now we’re also looking at context. Previous approaches cannot account for that, but large language models can. I’m happy to have been an early adopter of ML techniques, and I hope the field will continue to thrive. In a data-rich environment, ML has the potential to far surpass what the best minds in economics could do without it.
Dacheng Xiu is professor of econometrics and statistics at Chicago Booth.