This blog article is the first of a series intended to document the inner workings of the Simba backtesting spreadsheet. The objective is two-fold:
- Provide a better understanding of how the spreadsheet works, to facilitate peer reviews and future maintenance.
- Describe various techniques that have been used to compute various investment metrics in a compact manner, which could be useful in other contexts (e.g. your own spreadsheet).
The intent of this introductory article is to elaborate on how the spreadsheet is constructed, and provide an overview of its layered structure.
The Simba backtesting spreadsheet
The Simba spreadsheet is described on this wiki page, which starts with the following paragraphs:
Simba’s backtesting spreadsheet describes a spreadsheet originally developed by forum member Simba for the purpose of acting as a reference for historical returns, and analyzing a portfolio based on such historical data. The spreadsheet is no longer maintained by Simba, but other forum members continue to support it and to expand functionality, as a Bogleheads community project.
Backtesting is a term used in oceanography, meteorology and the financial industry to refer to testing a predictive model using existing historic data. It is also often used to analyze the past for research purposes. It is notably useful to quantitatively assess the impact asset allocation had (in known history) on possible investment strategies.
The wiki page proceeds to provide a brief overview of the individual worksheets (spreadsheet tabs), and instructions on how to download the latest version of the spreadsheet itself.
This article assumes that the reader is fairly familiar with the use of the Simba spreadsheet, including the instructions on the README tab, and the portfolio analysis tools (e.g. charts and stats) available in the Analyze_Portfolio, Compare_Portfolios and Lazy_Portfolios tabs.
To follow this article in detail, you should open a separate window using version 17a of the Simba spreadsheet, which can be downloaded by clicking on this link. The cell references are of course subject to change in future updates, even if the general structure will probably stay very similar.
The Data_TR_USD tab
The general intent of this worksheet is to provide the historical returns of reference on an annual basis for a large number of individual asset classes, each embodied by a given real-life fund, and to compute various metrics on a per individual asset class basis.
This worksheet is fully autonomous and can be copied in a separate spreadsheet to provide reference data for other types of analysis and tools than what is available in the Simba spreadsheet.
The raw data starts from row 193, coming from various sources documented in the Data_Sources tab. The returns (total return, i.e. price change plus dividends) typically correspond to two types of years:
- years where the annual return from a real-life fund (e.g. from Vanguard or iShares) is known,
- years where a best guess is used to simulate what the equivalent real-life fund might have returned (e.g. using index data, or a synthetic model).
In the first case, data sources have taken into account the fund’s expense ratio (ER), and the return numbers reflect that. In the second case, this isn’t the case (e.g. index data do not factor in any expense ratio).
To make return numbers more consistent, rows 8 and below perform an adjustment, using the ER value from row 2 to adjust raw returns for the years where the real-life fund didn’t exist or didn’t provide a full annual return. Furthermore, there is a global switch in cell A2 allowing to make all returns of reference become inflation-adjusted (while they are nominal by default), and the corresponding logic is also implemented in the cells from row 8 and below. Note that the formula does not perform any arithmetic operation, it adjusts the raw numbers in a geometric manner.
For example, cell B8 includes “(B193–B$3)/(1+B$3%)” as part of the formula. It may not be obvious at the first glance, but this is strictly equivalent to 100*((1+B193%)/(1+B$3%)-1), which is a more natural way to geometrically subtract the expense ratio (B$3) from the raw return (B193). Do a little bit of algebra to convince yourself! This technique is actually used twice in the exact formula, once for the ER, once for the (optional) inflation adjustment.
If you scroll down the Data_TR_USD tab, you’ll find more formulas. Rows 165 and below provide CAGR, standard deviation and more for a few time periods. Scrolling further down to row 352 and below, large tables compute correlation and rolling returns metrics. And at the very bottom, you’ll find a mysterious ‘under the hood’ table. We’ll come back to it in a separate article (hint: this is for unbalanced portfolio computations, where we need each asset class to grow on its own).
This worksheet, and notably the outcome of the formulas in the upper rows (after the ER adjustment), is the basis for the rest of the computations performed in the other worksheets.
The Portfolio_Math tab
This is where most of the heavy lifting happens, crunching a lot of numbers for each portfolio being studied in the other tabs, in a very unified manner. Let’s take a look at some simple features to get an idea of the general approach.
Check cell M5 and the column below. This references the first portfolio defined in Analyze_Portfolio. In the same way, cell AB5 and below reference the first portfolio defined in Compare_Portfolios, cell AC5 and below the second portfolio from Compare_Portfolios, etc.
Now scroll down to cell M97. The intent is straightforward, compute the nominal return of the portfolio defined in cell M5/etc which occurred in 1871. The formula looks a bit complicated due to the need to transpose the data being used, but the core of the formula is simple, sum the returns of each component of the portfolio for that year, apportioned to the weight of the component in the portfolio.
Then scroll down to cell M254. The intent is to compute the real (inflation-adjusted) return of the portfolio, for the year 1871. This relies on the previous (nominal) computation, while (geometrically) subtracting the inflation for this year. To foresee the possibility of separate models for inflation and to further unify the math in this worksheet, the inflation column (G) is actually defined as a portfolio itself! Therefore cell G97 and below compute the (nominal!) inflation for each year, and the formula in cell M254 and below leverages this outcome.
Let’s keep scrolling down, cell M412 and below. The intent here is to compute the portfolio trajectory, based on an initial investment and the portfolio’s annual returns. The key parameters (initial investment, starting year, ending year) are imported from Analyze_Portfolio in cells M84 to M86. Based on that, and the portfolio’s annual nominal returns that we just discussed, it is straightforward to incrementally compute the portfolio’s nominal balance in cell M412 and below, year over year.
A similar logic applies to cell M570 and below, computing the portfolio’s balance in real (inflation-adjusted) terms.
This article being just an introduction, more details about Portfolio_Math will be provided in follow-up articles.
The Analyze_Portfolio tab
Let’s see how the computations performed in Portfolio_Math are leveraged in Analyze_Portfolio, through a couple of simple examples.
The first portfolio being analyzed is defined in column E, starting from cell E5. The initial investment and time period of interest are defined in cells E84 to E86. As we’ve seen, those cells are referenced in Portfolio_Math, and you may notice that care was taken to align the top rows of the two worksheets in the same manner.
Now let’s look at cells E95 and E96. Those are array formulas computing the aggregate (geometric) return of the portfolio based on its (nominal/real) returns during the time period of interest – which is isolated by the OFFSET() part of the formula. This is a simple example of how the base computations performed in Portfolio_Math become useful. Then look at cells F95 and F96, which compute the standard deviation of corresponding annual returns (error correction was added with IFERROR() because the STDEV() function isn’t robust enough and doesn’t deal well with some extreme cases where there is no deviation at all).
Slightly more complicated is the formula for correlation with US returns in cell J95. This requires the use of a benchmark (the US market). The formula refers to column E in Portfolio_Math, which works in a similar way as the Inflation column we previously discussed. The benchmark is defined as a (single asset) portfolio in the top rows of column E, and corresponding returns are computed below in a logic fully consistent with what we’ve seen. This enables in turn the PEARSON() formula in Analyze_Portfolio (cell J95) to do its job of computing the correlation metric. In the same way, cell K95 references a benchmark portfolio with International returns.
Finally, the same approach applies to charts. Scroll down to row 105 in Analyze_Portfolio. Check the data sources used to feed the Portfolio Balance chart, and you will quickly figure out that this comes straight from Portfolio_Math, i.e. the portfolio balance rows that we previously discussed.
The Compare_Portfolios tab
To finish our quick tour of the layered structure of the Simba spreadsheet, let’s move to the Compare_Portfolios worksheet, which allows comparing five portfolios side to side.
You can recognize the same type of portfolio definition structure in the top rows, down to row 90 or so.
The following rows (94 and below) are different from Analyze_Portfolio, being positioned to easily compare numerous metrics derived from the five portfolios’ respective performances. But the logic of the formulas is very similar, leveraging again on the base math performed in the Portfolio_Math tab. Browse a bit, check some of the simple formulas (e.g. portfolio CAGR, average, standard deviation), this should easily make sense by now.
There is a good deal of complicated logic in this worksheet, and we’ll come back to it in follow-up articles.
Note that the structure of the Lazy_Portfolios tab is very similar to the Compare_Portfolios tab, simply extending the set of portfolios to a larger number (25), and calculating the very exact same set of metrics. The charts used to illustrate this broader comparison are different though.
The general structure of the spreadsheet is made of three layers:
- base math at the asset class level (Data_TR_USD),
- base math at the portfolio level (Portfolio_Math),
- portfolio definition and investigation through metrics and charts (Analyze_Portfolio, Compare_Portfolios and Lazy_Portfolios)
Although the presentation of the results may differ, depending on the type of analysis and comparison, the underlying math has been unified as much as possible.
There are numerous subtleties related to one metric or another, one chart or another, but this layered structure is proving more flexible and much easier to maintain than earlier incarnations of the Simba spreadsheet.