Stock Market Analysis
Dane Van Domelen
vandomed@gmail.com
2020-04-26
You can install and load stocks from GitHub via the following code:
devtools::install_github("vandomed/stocks")
library("stocks")
The stocks package has a variety of functions for analyzing
investments and investment strategies. I use it for a lot of my articles
on Seeking Alpha.
The package relies heavily on Yahoo! Finance for historical prices and on the
quantmod package for downloading that data into R.
There are functions for calculating performance metrics, visualizing the
performance of funds and multi-fund portfolios, and backtesting trading
strategies. The main functions are:
Function | Purpose |
---|---|
load_prices | Download Historical Prices |
load_gains | Download Historical Gains |
plot_growth | Plot Investment Growth |
calc_metrics | Calculate Performance Metrics |
calc_metrics_overtime | Calculate Performance Metrics over Time |
calc_metrics_2funds | Calculate Performance Metrics for Two-Fund Portfolios |
calc_metrics_3funds | Calculate Performance Metrics for Three-Fund Portfolios |
plot_metrics | Plot One Performance Metric (Sorted Bar Plot) or One vs. Another (Scatterplot) |
plot_metrics_overtime | Plot One Performance Metric vs. Time or One vs. Another over Time |
plot_metrics_2funds | Plot One Performance Metric vs. Another for Two-Fund Portfolios |
plot_metrics_3funds | Plot One Performance Metric vs. Another for Three-Fund Portfolios |
Stocks and bonds are obviously the primary building blocks for a
retirement portfolio, and I think the ETFs SPY and TLT pair together
nicely in a very effective two-fund strategy. Let’s look at the
performance of these funds separately and together.
We can use load_gains to download historical daily gains for SPY and
TLT over their mutual lifetimes:
library("stocks")
gains <- load_gains(c("SPY", "TLT"), to = "2018-12-31")
head(gains)
#> Date SPY TLT
#> 2395 2002-07-31 0.00242 0.01239
#> 2396 2002-08-01 -0.02611 0.00569
#> 2397 2002-08-02 -0.02241 0.01024
#> 2398 2002-08-05 -0.03480 0.00441
#> 2399 2002-08-06 0.03366 -0.00855
#> 2400 2002-08-07 0.01744 0.00240
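Each value in this output is a proportional daily gain (0.01239 means +1.239%). As a minimal sketch of that relationship, assuming gains are derived from consecutive closing prices (the prices and the helper name prices_to_gains below are made up for illustration and are not part of the package):

```r
# Hypothetical closing prices for four consecutive trading days
prices <- c(100, 102, 99.96, 104.958)

# Daily gain = today's price / yesterday's price - 1
prices_to_gains <- function(p) {
  p[-1] / p[-length(p)] - 1
}

gains_example <- prices_to_gains(prices)
round(gains_example, 4)

# Compounding the gains from the first price recovers the price series
rebuilt <- prices[1] * cumprod(c(1, 1 + gains_example))
```

Working with gains rather than prices is what makes it easy to blend funds into portfolios later on.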
We can call (or pipe into) calc_metrics to calculate some performance
metrics. calc_metrics returns a normal data frame, but I’ll call
knitr::kable to print it as a neat-looking table:
metrics <- calc_metrics(gains)
knitr::kable(metrics)
Fund | CAGR (%) | Max drawdown (%) | Mean (%) | SD (%) | Sharpe ratio | Annualized alpha (%) | Beta | Correlation |
---|---|---|---|---|---|---|---|---|
SPY | 8.49 | 55.2 | 0.039 | 1.168 | 0.034 | 0.0 | 1.000 | 1.000 |
TLT | 6.31 | 26.6 | 0.028 | 0.844 | 0.033 | 10.4 | -0.292 | -0.404 |
We see here that SPY has achieved stronger growth (8.5% vs. 6.3%), but
with a much worse max drawdown (55.2% vs. 26.6%). TLT’s Sharpe ratio (a
measure of risk-adjusted returns) is about the same as SPY’s (0.033 vs.
0.034).
Without getting too far ahead of myself, TLT’s positive alpha (10.4%
annualized) and negative beta (-0.292) are precisely why it pairs so
well with SPY.
This isn’t unique to TLT; all bond funds should generate alpha
(otherwise, don’t invest!), and they’re often negatively correlated
with equities.
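For readers who want to see what sits behind these columns, here is a rough base-R sketch of the usual textbook definitions. It is not the package's source code: the function names, the 252-trading-day year, and the zero risk-free rate are my assumptions, and the input vectors are made up. The last few lines check one thing against the table: compounding TLT's implied daily alpha over 252 days, using the rounded table values, lands close to the reported 10.4%.

```r
# Hypothetical daily gains (as proportions) for a benchmark and a fund
bench <- c(0.010, -0.020, 0.015, 0.005, -0.010, 0.020)
fund  <- c(-0.004, 0.010, -0.002, 0.001, 0.006, -0.005)

# Compound annualized growth rate, assuming 252 trading days per year
cagr <- function(g, days_per_year = 252) {
  prod(1 + g)^(days_per_year / length(g)) - 1
}

# Maximum drawdown: largest peak-to-trough decline of the growth curve
max_drawdown <- function(g) {
  balance <- cumprod(c(1, 1 + g))
  max(1 - balance / cummax(balance))
}

# Sharpe ratio with a zero risk-free rate: mean daily gain / SD of daily gains
sharpe <- function(g) mean(g) / sd(g)

# Beta is the least-squares slope of fund gains on benchmark gains;
# alpha is the intercept (the mean gain not explained by the benchmark)
beta_hat  <- cov(fund, bench) / var(bench)
alpha_hat <- mean(fund) - beta_hat * mean(bench)

# Check against the rounded table values for TLT (daily figures in percent)
alpha_daily  <- (0.028 - (-0.292) * 0.039) / 100
alpha_annual <- (1 + alpha_daily)^252 - 1
round(100 * alpha_annual, 1)
```

The Sharpe ratios in the table also look consistent with daily mean divided by daily SD (for TLT, 0.028 / 0.844 is roughly 0.033).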
For a visual comparison of the returns and volatility of these two
ETFs, we can plot mean vs. SD using plot_metrics.
plot_metrics(metrics, mean ~ sd)
No surprise, the S&P 500 ETF had more growth, but also higher
volatility.
(Side note: You could achieve the same plot by specifying gains rather
than metrics, or by simply specifying the tickers input.)
Negative correlation works wonders for a two-fund portfolio, so let’s
look at how consistently TLT achieves negative correlation with SPY,
using calc_metrics_overtime and plot_metrics_overtime. For
illustrative purposes, I’ll include the full 3-step process: load
historical gains, calculate the correlation over time, and generate the
plot.
c("SPY", "TLT") %>%
load_gains(to = "2018-12-31") %>%
calc_metrics_overtime("r") %>%
plot_metrics_overtime(r ~ .)
While the tendency is certainly for negative correlation, there’s a lot
of variability, and in some years the correlation was actually slightly
positive.
As you can see, the default behavior is to calculate the requested
metric on a per-year basis. You can also request per-month calculations
or rolling windows of a certain width (see ?calc_metrics_overtime).
And the Pearson correlation is just one of many metrics you can plot
(see ?calc_metrics for the full list).
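To make the per-year calculation concrete, here is a small base-R sketch of the idea: split the daily gains by calendar year and compute the Pearson correlation within each year. The dates and gains are made up, and this only illustrates the concept; it is not how calc_metrics_overtime is implemented.

```r
# Hypothetical dates and daily gains spanning two calendar years
dates <- as.Date(c("2017-12-27", "2017-12-28", "2017-12-29",
                   "2018-01-02", "2018-01-03", "2018-01-04"))
spy <- c(0.01, -0.02, 0.03, 0.01, 0.02, 0.03)
tlt <- c(-0.01, 0.02, -0.03, 0.02, 0.04, 0.06)

# Group row indices by year, then correlate the two funds within each group
year <- format(dates, "%Y")
r_by_year <- sapply(split(seq_along(dates), year),
                    function(i) cor(spy[i], tlt[i]))
r_by_year
```

In this toy data TLT exactly mirrors SPY in 2017 and moves in proportion to it in 2018, so the two yearly correlations sit at opposite extremes. Real data, as in the plot above, falls somewhere in between.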
Everyone loves piping these days, but for typical use cases I would
actually recommend skipping directly to plot_metrics_overtime. If you
specify tickers, it will download the data it needs on the fly. This
code is much shorter and produces the same figure as above:
plot_metrics_overtime(formula = r ~ ., tickers = "TLT")
A 50% SPY, 50% TLT portfolio should generate much better risk-adjusted
returns than SPY (and perhaps TLT) alone, but a 50% bond allocation is
pretty high, so raw returns will probably be lower.
To look at this, we can add a column to gains and then call
calc_metrics, requesting a few particular metrics:
gains$`50-50` <- gains$SPY * 0.5 + gains$TLT * 0.5
calc_metrics(gains, c("cagr", "mdd", "sharpe", "sortino")) %>%
knitr::kable()
Fund | CAGR (%) | Max drawdown (%) | Sharpe ratio | Sortino ratio |
---|---|---|---|---|
SPY | 8.49 | 55.2 | 0.034 | 0.042 |
TLT | 6.31 | 26.6 | 0.033 | 0.050 |
50-50 | 8.37 | 23.0 | 0.059 | 0.082 |
Indeed, while the 50-50 portfolio achieved slightly lower raw returns
than SPY alone (8.37% vs. 8.49% CAGR), its max drawdown was far better,
and its Sharpe and Sortino ratios indicated much better risk-adjusted
growth compared to the individual ETFs.
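One detail worth spelling out: averaging the daily gains 50/50 implicitly assumes the portfolio is rebalanced back to 50/50 every trading day. The sketch below uses two days of made-up gains to contrast that with simply buying both funds and holding them; all variable names are illustrative.

```r
# Hypothetical daily gains for two funds over two days
spy <- c(0.10, -0.05)
tlt <- c(-0.02, 0.04)

# Rebalanced daily: the portfolio gain each day is the weighted average
rebalanced <- 0.5 * spy + 0.5 * tlt
growth_rebalanced <- prod(1 + rebalanced)

# Buy and hold: each half of the initial balance compounds on its own
growth_hold <- 0.5 * prod(1 + spy) + 0.5 * prod(1 + tlt)

c(rebalanced = growth_rebalanced, hold = growth_hold)
```

The two approaches usually give similar but not identical results, so the metrics for the 50-50 column describe a rebalanced strategy, not a set-and-forget one.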
The optimal SPY/TLT allocation will likely depend on what metric you
want to maximize. In terms of raw growth, roughly 75% SPY is optimal,
but the curve is pretty flat: the CAGR is roughly the same from 60-100%
SPY.
plot_metrics_2funds(gains = gains,
formula = cagr ~ allocation,
tickers = c("SPY", "TLT"),
from = "2010-01-01")
In terms of risk-adjusted growth, the Sharpe ratio curve is somewhat
more interesting. The maximum Sharpe ratio occurs around 40% SPY, and
the Sharpe ratio gets much worse as you approach 60% SPY and higher.
plot_metrics_2funds(gains = gains,
formula = sharpe ~ allocation,
tickers = c("SPY", "TLT"),
from = "2010-01-01")
We can gain additional insight by plotting two metrics against each
other, across all possible allocations. A common strategy is to plot the
mean vs. standard deviation as a function of the allocation:
plot_metrics_2funds(gains = gains,
formula = mean ~ sd,
tickers = c("SPY", "TLT"),
from = "2010-01-01")
This plot yields an interesting finding: starting at 100% TLT,
increasing the allocation to SPY simultaneously reduces volatility and
increases returns. In other words, you’d be crazy not to ride the
curve up and to the left, adding at least a 30% SPY allocation.
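The shape of that curve follows from the standard two-asset formulas for portfolio mean and variance. As a rough check, the sketch below plugs in the rounded full-period statistics from the metrics table earlier (daily mean and SD in percent, and the SPY/TLT correlation). Those numbers cover 2002-2018 rather than the 2010-onward window used in the plots, so treat the output as illustrative only.

```r
# Rounded full-period daily statistics from the metrics table (percent)
mean_spy <- 0.039; sd_spy <- 1.168
mean_tlt <- 0.028; sd_tlt <- 0.844
r <- -0.404  # correlation between SPY and TLT daily gains

# SPY allocation from 0% to 100% in 10% steps; the remainder goes to TLT
w <- seq(0, 1, by = 0.1)
port_mean <- w * mean_spy + (1 - w) * mean_tlt
port_sd <- sqrt(w^2 * sd_spy^2 + (1 - w)^2 * sd_tlt^2 +
                2 * w * (1 - w) * r * sd_spy * sd_tlt)

# Closed-form minimum-variance SPY allocation for two assets
w_minvar <- (sd_tlt^2 - r * sd_spy * sd_tlt) /
  (sd_spy^2 + sd_tlt^2 - 2 * r * sd_spy * sd_tlt)

data.frame(spy_allocation = w, mean = port_mean, sd = port_sd)
round(w_minvar, 3)
```

With these inputs the minimum-volatility mix comes out a little under 40% SPY, and every allocation between 0% SPY and that point has both a higher mean and a lower SD than 100% TLT.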
A big caveat is that this is all based on historical data. There’s no
guarantee that 30% SPY, 70% TLT will have lower volatility or greater
returns than TLT going forward.
I think three-fund portfolios are the sweet spot in terms of balancing
complexity and performance. With two funds, you’re relying on a single
source of alpha generation; with more than three funds, it’s hard to
visualize, and thus hard to understand whether the constituent funds
actually complement each other.
I won’t go into full detail about it here, but three asset classes that
I think work really well together are large-cap stocks, long-term bonds,
and junk bonds. To visualize such a strategy, implemented via Vanguard
mutual funds:
plot_metrics_3funds(formula = mean ~ sd,
tickers = c("VFINX", "VBLTX", "VWEHX"),
from = "2010-01-01")
100% VFINX maximizes expected returns, but also volatility. If you
wanted to take on no more than half of the S&P’s volatility while
maximizing returns, you could add an allocation to VBLTX (move from 100%
VFINX to the left along the upper black curve). If you’re very
conservative and want to target something like 0.4% volatility, a VWEHX
allocation eventually becomes helpful (get off the black curve before it
veers downward and to the right).
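Conceptually, a plot like this comes from evaluating mean and SD over a grid of three-fund allocations. Here is a minimal base-R sketch of that idea, using a coarse 25% step and made-up gains for three hypothetical funds a, b, and c. It is meant to show the mechanics, not to reproduce what plot_metrics_3funds does internally.

```r
# Hypothetical daily gains for three funds (rows = days, columns = funds)
g <- cbind(a = c(0.010, -0.020, 0.015, 0.005),
           b = c(-0.004, 0.008, -0.003, 0.001),
           c = c(0.003, -0.006, 0.004, 0.002))

# All allocations (w1, w2, w3) in 25% steps that sum to 100%
step <- 0.25
grid <- expand.grid(w1 = seq(0, 1, by = step), w2 = seq(0, 1, by = step))
grid <- grid[grid$w1 + grid$w2 <= 1 + 1e-9, ]
grid$w3 <- 1 - grid$w1 - grid$w2

# Daily portfolio gains for every allocation (days x allocations matrix)
weights <- as.matrix(grid[, c("w1", "w2", "w3")])
port <- g %*% t(weights)

# Mean and SD of daily gains for each allocation
grid$mean <- colMeans(port)
grid$sd <- apply(port, 2, sd)
head(grid)
```

Plotting grid$mean against grid$sd, with one curve per value of w3, would give a picture in the same spirit as the one above.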
Mean vs. SD is the standard way of visualizing portfolios, but Sharpe
ratio vs. SD is more useful for understanding how risk-adjusted
performance varies with allocation. If we plot Sharpe ratio vs. SD, the
benefit of adding exposure to bonds becomes clearer:
plot_metrics_3funds(formula = sharpe ~ sd,
tickers = c("VFINX", "VBLTX", "VWEHX"),
from = "2010-01-01")
Groovy! By the way, if you want to see individual data points on the
plot (i.e. what allocation each data point corresponds to) you can just
set plotly = TRUE when you call plot_metrics_3funds or any of the
other plotting functions.
You can find me on Twitter at @DaneVanDomelen, and of course
feel free to make feature requests and collaborate directly on GitHub.
Version | Updates |
---|---|
1.0 | Original |
1.2-1.4 | Added functions, bug fixes, etc. |
2.0 | Switched to ggplot, added piping support, simplified functions for calculating metrics |