A.I. Investing:
Machine Learning Predictions for Individual Stocks and Financial Assets
A Robust Investment Performance Architecture
This machine learning prediction system is the result of five generations of collaboration with Cornell’s Masters of Applied Statistics program to achieve a commercial outcome. We are now in our seventh year of working with Cornell, and we have worked on machine learning predictions for many years, with many failed attempts along the way.
We have had our achievements as well, and we have outlined the schematic of our A.I. forecasts below, taking into account many of our successes while avoiding the errors found along the way. We will avoid using technical jargon because our intended audience is sophisticated investors.
The prediction system is designed as a hierarchical composition of models. This approach is similar to the prediction scheme discussed by futurist and author Ray Kurzweil in his book “How to Create a Mind,” which focuses on constructing general intelligence. Kurzweil draws strong parallels between successful machine learning (or artificial intelligence) and the hierarchical functioning of the human brain, particularly the neocortex and its capacity for abstract and creative thought.
A similar analogy can be found in weather forecasting. Weather predictions rely on tens of thousands of barometric, wind, humidity, and temperature readings from around the globe. By combining these readings with centuries of recorded weather data, meteorologists can now forecast the weather more accurately than ever before. Yet every forecast remains local and seasonal: forecasting the weather in Poughkeepsie differs significantly from forecasting in San Antonio, TX, and predictions for July 25th are distinct from those for December 25th.
Stock market predictions operate similarly. To create a globally effective prediction model, it is essential to build it from successful smaller models.
Stock market predictions may never achieve the same level of accuracy as weather forecasts, so it is important to acknowledge our limitations. Remember Benjamin Martin’s advice in the movie “The Patriot”: “Aim small, miss small.”
If you expect machine learning forecasts to make you rich quickly, you will likely be disappointed. However, if your goal is to consistently add a few points of return to a prudent investment strategy, you are in the right place.
The first thing we did was decide that each stock should have its own prediction model. This means that the model used to predict each stock is trained on the specific dynamics of that stock.
The prediction for every stock is a composite of three separate prediction systems: a prediction for the stock market, a prediction for the sector, and a prediction for the stock’s alpha. Alpha is the return that a stock generates independent of the stock market’s influence.
The logic follows from straightforward investment thinking, building on the Nobel Prize-winning work of William Sharpe and the Capital Asset Pricing Model (CAPM). Each stock prediction follows a basic regression equation:
$$E(R_i) = R_f + \beta_i \left( E(R_m) - R_f \right)$$
Where:
- $E(R_i)$ = Expected Return of Investment
- $E(R_m)$ = Expected Stock Market Return
- $R_f$ = Risk-free rate of return (yes, Risk-free is a fallacy, but it’s not material so bear with me)
- $\beta_i$ = Beta of the investment
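Assuming that risk-free assets offer no effective real return (after inflation), we can simplify the equation to:
$$E(R_i) = \beta_i \, E(R_m)$$
This is called a single-factor model. We are using a two-factor model with both the return on the stock market and the return on the sector, plus the stock’s own alpha. Asset performance is strongly explained by these three components, so predicting them well can yield a tremendous advantage. Accordingly, our process takes the form of:
$$E(R_i) = \alpha_i + \beta_{i,m} \, E(R_m) + \beta_{i,s} \, E(R_s)$$
Here $\alpha_i$ is the stock’s predicted alpha, $E(R_m)$ and $E(R_s)$ are the predicted stock market and sector returns, and $\beta_{i,m}$ and $\beta_{i,s}$ are the stock’s betas to the market and to its sector.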
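The composite described above, a stock-specific alpha plus beta-weighted market and sector predictions, can be illustrated with a minimal Python sketch. The class name, field names, and all numbers are hypothetical and chosen only to show the arithmetic. Whether the sector return enters as a raw return or as a return net of the market is not specified in this article, so the sketch simply assumes the additive form.

```python
from dataclasses import dataclass


@dataclass
class StockModel:
    """Per-stock inputs to the composite forecast (illustrative names)."""
    alpha: float        # predicted stock-specific return from the alpha model
    beta_market: float  # observed sensitivity to the stock market
    beta_sector: float  # observed sensitivity to the stock's sector


def composite_forecast(stock: StockModel, market_return: float, sector_return: float) -> float:
    """Combine the three predictions into one expected return for the stock."""
    return (
        stock.alpha
        + stock.beta_market * market_return
        + stock.beta_sector * sector_return
    )


# Hypothetical inputs, not taken from the research described here.
example = StockModel(alpha=0.02, beta_market=0.8, beta_sector=0.5)
forecast = composite_forecast(example, market_return=0.07, sector_return=0.04)
print(f"Expected return: {forecast:.3f}")
```

Because the market and sector predictions are shared, only the alpha and the two betas differ from stock to stock in this sketch.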
In this model, each stock has its own machine learning model for alpha. It then shares the sector and market predictions with every other stock, applying its own unique observed betas. Alpha is computed primarily from information in the company’s financial reports. We use about 70 data points from the financial statements and then expand those 70 data elements across ratios and multiple time scales to create about 300 data features for every stock. The machine learning algorithms find the most predictive combinations of these features and discard the features with no explanatory power. As an improvement over a simple market forecast, we predict the return contribution from both the stock’s sector beta and its stock market beta, as the two combined offer much more explanatory power. The sector and market predictions are powered by a cadre of economic data and performance data.
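To make the feature expansion concrete, here is a minimal sketch of how a handful of financial statement items could be expanded into ratios and multi-period changes. The item names, the chosen ratios, and the lookback periods are assumptions for illustration; the actual 70 data points and roughly 300 features are not listed in this article.

```python
from typing import Dict, List

# Hypothetical quarterly history for one company, oldest quarter first.
statements: Dict[str, List[float]] = {
    "revenue":    [100.0, 104.0, 110.0, 108.0, 115.0, 121.0, 126.0, 130.0, 138.0],
    "net_income": [10.0, 11.0, 12.5, 11.0, 13.0, 14.5, 15.0, 15.5, 17.0],
    "total_debt": [50.0, 50.0, 48.0, 47.0, 47.0, 45.0, 44.0, 44.0, 42.0],
}

# Illustrative choices: which ratios to form and which lookbacks (in quarters) to use.
RATIOS = [("net_income", "revenue"), ("total_debt", "revenue")]
LOOKBACKS = [1, 4, 8]


def expand_features(data: Dict[str, List[float]]) -> Dict[str, float]:
    """Build features as of the latest quarter: levels, ratios, and changes."""
    series = dict(data)
    # Ratios become new series so they are also expanded across time scales.
    for num, den in RATIOS:
        series[f"{num}_to_{den}"] = [n / d for n, d in zip(data[num], data[den])]

    features: Dict[str, float] = {}
    for name, values in series.items():
        features[name] = values[-1]  # latest level
        for k in LOOKBACKS:
            # Relative change over the last k quarters, when enough history exists.
            if len(values) > k and values[-1 - k] != 0:
                features[f"{name}_chg_{k}q"] = values[-1] / values[-1 - k] - 1.0
    return features


features = expand_features(statements)
print(len(features), "features from", len(statements), "raw items")
```

Three raw items grow to twenty features here, which shows how a few dozen statement items can plausibly reach a few hundred candidate features before the learning algorithm discards the uninformative ones.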
Given our extensive background in portfolio optimization, we know that expected returns for investment candidates are a critical input. For many years, our various software platforms, including Advisors Portfolio Think Tank and Gravity Investments, have built portfolios where expected returns are produced for any data based on a multi-sampling expert system. This means we use historical performance to predict the future by optimally selecting multiple historical periods previously observed to offer a predictive advantage. This technique has served investors well and has been shown to offer a predictive advantage, as measured by a correlation between in-sample and out-of-sample results of approximately 0.25%. More than half of this advantage comes from the momentum signal. Momentum is regularly observed to be the factor offering the greatest return and fits well within a diversified portfolio strategy. For a detailed chart on factor performance, see Visual Capitalist.
Our success with these quantitative forecasting methods has set a high bar that any machine learning prediction must surpass to serve customers better than our existing systems.
Additionally, having built one of the best portfolio backtesting engines in the industry, we have been diligent about addressing biases and implementing proper control processes to ensure the utmost integrity of results.
These two traits have served as both barriers and stepping stones, enabling Portfolio Think Tank to take a long view in designing our stock market prediction system.
Machine Learning Stocks Prediction Results and Analysis
The chart below shows the annual return predictions of the meta-model for 1,926 U.S. stock exchange-listed stocks with sufficient history and data. These 1,926 stocks are a subset of the Russell 3000 Index, for which we had adequate financial statement history during our research in spring 2022.
Our tests span the period from 2014 to 2021, comparing each stock’s predicted return to its actual return for the same period. Each stock’s quarterly predictions were correlated with its actual returns, providing one data point for the histogram below. A correlation of zero indicates random predictions, while a correlation of one indicates perfect accuracy.
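The per-stock scoring described above can be sketched as follows: each stock’s series of predictions is correlated with its realized returns, yielding one number per stock for the histogram. The tickers and return figures are invented for illustration, and the Pearson correlation is an assumption, since the article does not name the correlation measure used.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (predicted, actual) quarterly returns for two made-up tickers.
history: Dict[str, Tuple[List[float], List[float]]] = {
    "AAA": ([0.02, 0.05, -0.01, 0.03], [0.01, 0.06, -0.02, 0.02]),
    "BBB": ([0.04, 0.01, 0.03, -0.02], [-0.01, 0.02, -0.03, 0.05]),
}

# One correlation per stock: these are the data points of the histogram.
correlations = {ticker: pearson(pred, act) for ticker, (pred, act) in history.items()}
share_negative = sum(c < 0 for c in correlations.values()) / len(correlations)

for ticker, c in correlations.items():
    print(f"{ticker}: correlation {c:+.2f}")
print(f"Share of stocks with negative correlation: {share_negative:.0%}")
```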
We take extensive measures to mitigate bias. This includes walk-forward out-of-sample validation, separating models into training, testing, and validation sets, thoroughly reviewing feature data, carefully governing feature selection, and prioritizing bias-minimizing objective functions.
For further interpretation, any stock with a negative correlation indicates a poor prediction. While 10% of stocks fall into this category, only 1.5% have a correlation less than -0.25, classifying them as very poor predictions. In that sense, our directional accuracy stands at 90%: for nine out of ten stocks, the predictions were positively correlated with actual returns.
The challenge in conducting such a test lies not in generating attractive results but in ensuring that the returns produced are reliable. Eliminating biases is difficult, which is a common issue in most machine learning projects.
Below is a year-by-year examination of the test results, followed by an aggregate analysis.
What I appreciate about this method of performance evaluation is its clarity in investment terms. Do the actual returns grow steadily as the predicted returns increase? To create these graphs, we classified all predictions into octiles (eight equal-sized groups). The rightmost octile (bar) represents the top 1/8th of the 1,926 stocks with the highest predicted returns, while the leftmost octile represents the lowest predicted returns. If the actual returns rise from left to right, we know we are adding predictive value.
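The bucketing behind such a chart can be sketched in a few lines: sort stocks by predicted return, split them into equal-sized groups, and average the actual return in each group. The function name and the synthetic data are illustrative assumptions, not the production code.

```python
from typing import List, Tuple


def bucket_means(pairs: List[Tuple[float, float]], n_buckets: int = 8) -> List[float]:
    """Average actual return per bucket of (predicted, actual) pairs.

    Bucket 0 holds the lowest predicted returns and the last bucket the highest.
    """
    if len(pairs) < n_buckets:
        raise ValueError("need at least one stock per bucket")
    ranked = sorted(pairs, key=lambda p: p[0])
    n = len(ranked)
    means = []
    for b in range(n_buckets):
        lo = b * n // n_buckets
        hi = (b + 1) * n // n_buckets
        actuals = [actual for _, actual in ranked[lo:hi]]
        means.append(sum(actuals) / len(actuals))
    return means


# Synthetic example: 16 stocks whose actual returns track predictions with noise.
pairs = [(0.01 * i, 0.005 * i + (0.02 if i % 2 else -0.02)) for i in range(16)]
means = bucket_means(pairs)

for b, m in enumerate(means, start=1):
    print(f"Octile {b}: mean actual return {m:+.4f}")
```

When the model adds predictive value, the printed means tend to rise from the first bucket to the last, which is the left-to-right pattern the bar charts are meant to reveal.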
As you can see, the results are not perfect (never trust perfect AI-based stock predictions!), but they are sufficient to achieve a realistic and consistent performance advantage.
We tested several variations around the time horizon. In these charts, Time Horizon 4 represents four quarters, or a one-year prediction. Our results were consistent across multiple time horizons, which we have configured as an input variable for generating the predictions. “Window size” refers to the amount of historical data used. A window size of 0 means we use all available data preceding the prediction date. We observed that using more data generally improves results, which is common in machine learning applications.
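The interaction between time horizon, window size, and walk-forward validation can be sketched as below. This is a simplified illustration under stated assumptions, not our production pipeline: periods are indexed by quarter, `min_train` is a hypothetical minimum history requirement, and a training quarter is considered usable only if its outcome over the horizon is already known on the prediction date.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_periods: int,
    horizon: int = 4,      # quarters ahead being predicted (4 = one year)
    window_size: int = 0,  # 0 = use all available history
    min_train: int = 8,    # hypothetical minimum number of training quarters
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training quarters, prediction quarter) pairs without lookahead."""
    # The prediction made at quarter t is scored at t + horizon, which must exist.
    for t in range(n_periods - horizon):
        # A training quarter s is usable only if s + horizon <= t.
        usable = list(range(0, t - horizon + 1))
        if window_size > 0:
            usable = usable[-window_size:]
        if len(usable) >= min_train:
            yield usable, t


# 20 quarters of data, one-year horizon, expanding window.
for train, predict_at in walk_forward_splits(20):
    print(f"predict at quarter {predict_at}, train on quarters {train[0]}..{train[-1]}")
```

Passing `window_size=8` instead would restrict each training set to the eight most recent usable quarters, which is the rolling-window alternative to using all preceding data.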
Year-by-Year Decile Performance
Scroll through the years to see each year's predicted versus actual performance.
Aggregate performance 2016-2021
For this period, the S&P 500 produced an annual return of 14.75%, and the S&P 1500 produced an annual return of 14.50%.
Portfolios selected from our top decile over the same period would average a return of 28%.
Alternatively, one could judge the performance of the model by buying the top decile and shorting the bottom decile. This long-short, market-neutral portfolio strategy would have yielded a return of nearly 8%.
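As a rough illustration of the long-short arithmetic, the sketch below computes the spread between the average return of a long group and a short group. The return figures are hypothetical, and the calculation assumes equal dollars on each side and ignores trading costs, borrowing fees, and interest on short proceeds, so it is a simplification of how such a strategy would actually be measured.

```python
from typing import List


def long_short_return(long_returns: List[float], short_returns: List[float]) -> float:
    """Spread between the average long return and the average short return."""
    long_leg = sum(long_returns) / len(long_returns)
    short_leg = sum(short_returns) / len(short_returns)
    # Gains on the long side minus what the shorted stocks earned.
    return long_leg - short_leg


# Hypothetical annual returns for stocks in the top and bottom groups.
top_group = [0.30, 0.22, 0.26]
bottom_group = [0.15, 0.20, 0.19]

spread = long_short_return(top_group, bottom_group)
print(f"Long-short spread: {spread:.2%}")
```

Because both legs move with the market, the spread isolates how well the ranking separates winners from laggards rather than how the market performed overall.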
Results of Market and Sector Models
Scroll through the charts to explore the performance of the sector models. For the sectors, we used a regression of the best-fit sector, irrespective of SIC or S&P classification.
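One way to picture the best-fit sector selection is to regress a stock’s returns on each candidate sector’s returns and keep the sector with the highest R-squared. The sketch below does this with ordinary least squares on made-up data. Selecting by R-squared is an assumption made for illustration, since the exact fit criterion is not stated here.

```python
from typing import Dict, List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Fit y = a + b * x by least squares and return (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("sector returns are constant")
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return a, b, r_squared


def best_fit_sector(stock: List[float], sectors: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the sector whose returns best explain the stock, and the stock's beta to it."""
    fits = {name: ols_fit(returns, stock) for name, returns in sectors.items()}
    best = max(fits, key=lambda name: fits[name][2])
    return best, fits[best][1]


# Hypothetical periodic returns; the stock is built to track "tech" closely.
sectors = {
    "tech":   [0.03, -0.02, 0.05, 0.01, -0.04, 0.06],
    "energy": [0.02, 0.01, 0.03, -0.02, 0.04, 0.00],
}
stock = [0.038, -0.025, 0.061, 0.010, -0.047, 0.071]

sector, beta = best_fit_sector(stock, sectors)
print(f"Best-fit sector: {sector}, sector beta: {beta:.2f}")
```

Under this approach a stock is matched to the sector it actually behaves like, regardless of its official SIC or S&P label.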
The Future
We believe that stock-specific machine learning models, combined within the CAPM prediction architecture, provide a real opportunity to deliver better performance for investors across assets, economic conditions, and time.
AUTHOR
James Damschroder
ACKNOWLEDGEMENTS
Special thanks to Marissa Rubb for her steady work on the project; to Professor Diciccio, our academic advisor; to Xiaolong Yang, Ph.D., Sr. Lecturer, Sr. Associate Director, MPS Program; and to all of the students who have made contributions over the years.
Important Disclosure about A.I., Backtesting and Hypothetical Performance
This portfolio or research is hypothetical. This is a historical simulation of the portfolio performance an investor would have obtained had they invested in the same selections at the beginning of the simulation. This report provides information on how the portfolio holdings would have changed and performed over a certain period.
We have strived to reduce or eliminate potential biases in the process to provide the most accurate assessment of the performance prospects of the strategy. However, it may not be possible for any historical simulation to completely ensure it is free of all biases.
For a more complete understanding of biases and risks when backtesting portfolio strategies, please see “The Gold Standard for Portfolio Backtesting” and “The Seven Deadly Sins of Portfolio Backtesting.”
Backtested strategies also run the risk of cherry-picking, which occurs when the author of the backtest creates many variations and presents the most favorable one. This research was not produced in whole or in part by cherry-picking.
This simulation is based on an account with tax-exempt or tax-deferred growth. Taxable accounts will have to pay the appropriate taxes for dividends, interest, and capital gains, which will decrease the performance depicted.
This simulation is not based on actual trading accounts or account composites, which may or may not exist for this strategy and may be materially different, including worse than the performance illustrated above. Past performance is not necessarily indicative of future performance. Performance results, including risk, return, and diversification measures, are not guaranteed to persist in the future.
This historical performance simulation has been adjusted to reflect estimated management fees.
The suitability of this portfolio strategy requires that you have thoughtfully and accurately completed your investor objectives from your accounts’ Investment Policy Statement. Diversification strategies alone cannot assure a successful investment outcome. Strategies offering greater diversification cannot guarantee any reduction in the loss of capital.
Your ability to follow this investment strategy is a risk. Investors often dispose of successful strategies at inopportune times, thus turning potentially profitable strategies into losses.
Portfolio data is taken from sources believed to be accurate; however, there is no warranty or guarantee as to the accuracy or completeness of the data and statistical calculations thereupon. Our performance results are not audited or otherwise approved by any regulatory agency. We regularly perform quality and accuracy tests on our calculations and algorithmic procedures. Portfolio ThinkTank does not furnish investment advice without an investment advisory agreement.
The period selected for analysis may significantly impact the relative attractiveness of the strategy versus another portfolio or benchmark. The author of the strategy controls the default period used to analyze performance, and users may select any desired period from the menu. In general, longer periods, greater diversification, and lower concentrations of holdings result in more credible and persistent performance.