Chapter 5 – Building an Optimized Portfolio with R

Applying the mean-variance optimization approach (Markowitz’s Modern Portfolio Theory) to your portfolio can be challenging.  Often, people turn to Excel to set up the covariance table and run Solver to find the maximum Sharpe ratio (see the series here for instructions on using Excel).  However, Excel is a poor choice as it takes a long time to download the data, setup the tables, and properly configure Solver.  Fortunately, the R platform is a great tool for the job–and it is free.

In this article, we will show how you can get R configured quickly.  Using the code below, you will be able to generate an optimal portfolio and map the efficient frontier in a matter of minutes–as compared with the lengthy (and error-prone) process using Excel.

This is by far a better, more powerful, and quicker approach.  There is no excuse for not using the code below to test your own portfolio to identify potential opportunities for increasing your expected return while minimizing your portfolio’s variance (risk).

You can access the code for all instructions below here.

Introduction

A quick look at an efficient frontier reminds the reader that this is a parabolic curve and therefore finding the optimal portfolio is a task that involves finding the solution that minimizes a quadratic equation.

Since our objective is to provide an easy-to-use-guide to building an optimized portfolio, we will avoid delving too deep into the mathmatics involved in solving quadratic formulas.  However, we will try to introduce basic concepts so that the reader can experiment with the tools–and has a base of knowledge on which to evaluate the results.

The steps involved in using R to build an optimized portfolio using the Markowitz Modern Portfolio Theory are outlined below.

  1. Install R and the necessary packages
  2. Download the return data
  3. Solve for the Efficient Frontier
  4. Map the efficient frontier

1. Install R and the necessary packages

If you have not already downloaded and installed R, now is the time to try it out.  R is free and available for all major operating systems.  Installing it takes less than 5 minutes in most cases and you can copy and paste the code from this article (available here) to get up and running quickly.  

Begin by downloading the software for your operating system (Windows, MacOS, and Linux) from the Comprehensive R Archive Network site.

One of the most helpful aspects of R is that contributors have written many add-on packages, which extend R’s functionality and facilitates analysis by reducing the amount of code that needs to be written–especially for routine tasks.  After you have downloaded and installed R, you will need to install several additional packages.  These packages are written by contributors and you’ll find thousands of such add-ons to tackle just about every type of analytical problem you can imagine.  For this article, you’ll need to download the following:

  • stockPortfolio – we use this to download stock data
  • quadprog – we use this to solve for the optimal portfolio

After you have downloaded the packages, you will need to load them into your workspace (where R stores your environment) by issuing the following commands.


library(stockPortfolio)
library(quadprog)

That is it–you are ready to begin.

2. Download the return data

Although you can write your own R code to fetch and format stock returns, it is much easier to load and use the stockPortfolio package.

Begin by creating a list of the stocks you wish to include in your analysis.  For this article, we have chosen to use the same six ETFs that represent a cross-section of equity and debt securities (same as we used in the Excel based series).  Issue the following to assign the securities to the stocks object.

stocks <- c(
 "SPY",
 "EFA",
 "IWM",
 "VWO",
 "LQD",
 "HYG")

Loading the return data in R using the stockPortfolio package is extremely easy and can be done with one step.  (This can involve a series of steps in Excel–and each step is an opportunity to introduce errors.)

returns <- getReturns(names(stocks), freq="week")

You can see a summary of the returns by issuing the following command.  (We also suggest peeking at the structure of this object using the str() command.)

Summary of Returns

Watch the short video below for a demonstration on how to retrieve return data using R.

3. Solve for the Efficient Frontier

Unlike Excel, creating the efficient frontier is fast in R.  After the initial code (demonstrated below) is set up, it takes only seconds to add securities, impose constraints, and compare data sets.  Setting up the tables in Excel is time-consuming and cumbersome–and impractical for a portfolio with more than a few securities.

Additionally, as we’ll demonstrate in another article, R allows for much greater control and sophistication in calculating covariance.

We are going to use a function (solve.QP) from the quadprog package we installed and loaded earlier.  In order to effectively use this function (which can seem complex and a little overwhelming to those not steeped in quadratic theory), we need to understand the various arguments.

  • Dmat:  The covariance table (matrix).  We will calculate this based on the data frame of returns.
  • dvec:  This is a vector of the average returns for each security.  To find the minimium portfolio variance, we set these all to zero.  To find points along the efficient frontier, we use a for loop to allow these returns to vary.
  • Amat:  This is the matrix of constraints.  This can be a bit complicated–hang in there as we explain a little behind creating a matrix that imposes constraints.  For those not steeped in algebraic matrix math, it may be easiest to learn by examing the examples.  In the example, we use Amat to impose three distinct constraints: the portfolio weights must sum to 1, whether we allow any weights to be negative (implying short-selling and leverage), and whether there is a limit to any individual weight (to avoid high concentrations in just one security).
  • bvec:  Think of this as the legend to the Amat.  This is a vector of values that is matched up against the Amat matrix to enforce our constraints.  Again, if you are not familiar with matrix math, it may be easiest to learn by looking at the example and playing around with the code.
  • meq: This simply tells the solve.QP function which columns in the Amat matrix to treat as equality constraints.  In our example, we only have one (the weights must sum to 1).  The others are all inequality constraints (for example, weight must be > 0).  We simply assign 1 to meq.
  • factorized:  We do not use this argument, you can ignore it.

We have created the function below that takes a data frame of security returns and several arguments and returns a data frame containing points along the efficient frontier.  You can simply cut and paste this code into your R console.  Or you can access the full code here.


eff.frontier <- function (returns, short="no", max.allocation=NULL, risk.premium.up=.5, risk.increment=.005){
# return argument should be a m x n matrix with one column per security
# short argument is whether short-selling is allowed; default is no (short selling prohibited)
# max.allocation is the maximum % allowed for any one security (reduces concentration)
# risk.premium.up is the upper limit of the risk premium modeled (see for loop below)
# risk.increment is the increment (by) value used in the for loop

covariance <- cov(returns)
print(covariance)
n <- ncol(covariance)

# Create initial Amat and bvec assuming only equality constraint (short-selling is allowed, no allocation constraints)
Amat <- matrix (1, nrow=n)
bvec <- 1
meq <- 1

# Then modify the Amat and bvec if short-selling is prohibited
if(short=="no"){
Amat <- cbind(1, diag(n))
bvec <- c(bvec, rep(0, n))
}

# And modify Amat and bvec if a max allocation (concentration) is specified
if(!is.null(max.allocation)){
if(max.allocation > 1 | max.allocation <0){
stop("max.allocation must be greater than 0 and less than 1")
}
if(max.allocation * n < 1){
stop("Need to set max.allocation higher; not enough assets to add to 1")
}
Amat <- cbind(Amat, -diag(n))
bvec <- c(bvec, rep(-max.allocation, n))
}

# Calculate the number of loops based on how high to vary the risk premium and by what increment
loops <- risk.premium.up / risk.increment + 1
loop <- 1

# Initialize a matrix to contain allocation and statistics
# This is not necessary, but speeds up processing and uses less memory
eff <- matrix(nrow=loops, ncol=n+3)
# Now I need to give the matrix column names
colnames(eff) <- c(colnames(returns), "Std.Dev", "Exp.Return", "sharpe")

# Loop through the quadratic program solver
for (i in seq(from=0, to=risk.premium.up, by=risk.increment)){
dvec <- colMeans(returns) * i # This moves the solution up along the efficient frontier
sol <- solve.QP(covariance, dvec=dvec, Amat=Amat, bvec=bvec, meq=meq)
eff[loop,"Std.Dev"] <- sqrt(sum(sol$solution *colSums((covariance * sol$solution))))
eff[loop,"Exp.Return"] <- as.numeric(sol$solution %*% colMeans(returns))
eff[loop,"sharpe"] <- eff[loop,"Exp.Return"] / eff[loop,"Std.Dev"]
eff[loop,1:n] <- sol$solution
loop <- loop+1
}

return(as.data.frame(eff))
}

The hard work is done!  After you have this code loaded into R you can map the efficient frontier.

4. Map the Efficient Frontier

First, we run the function we just created against our returns and based on our portfolio constraints.  For this example, we will assume that short-selling is allowed and that we impose a maximum allocation of 45% for any one security.

eff <- eff.frontier(returns=returns$R, short="yes", max.allocation=.45, risk.premium.up=.5, risk.increment=.001)

Take a quick peek at what the function returned by using the head() function.

Head eff

Notice that the first six columns are the weights (or allocation percentages) of each security.  The last three columns are statistics about that particular portfolio.  We’ll use standard deviation and expected return columns to create the efficient frontier.

Use the following code to create the efficient frontier.


eff.optimal.point <- eff[eff$sharpe==max(eff$sharpe),]

# Color Scheme
 ealred  <- "#7D110C"
 ealtan  <- "#CDC4B6"
 eallighttan <- "#F7F6F0"
 ealdark  <- "#423C30"
ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) + geom_point(alpha=.1, color=ealdark) +
 geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe), color=ealred, size=5) +
 annotate(geom="text", x=eff.optimal.point$Std.Dev, y=eff.optimal.point$Exp.Return,
 label=paste("Risk: ", round(eff.optimal.point$Std.Dev*100, digits=3),"\nReturn: ",
 round(eff.optimal.point$Exp.Return*100, digits=4),"%\nSharpe: ",
 round(eff.optimal.point$sharpe*100, digits=2), "%", sep=""), hjust=0, vjust=1.2) +
 ggtitle("Efficient Frontier\nand Optimal Portfolio") + labs(x="Risk (standard deviation of portfolio variance)", y="Return") +
 theme(panel.background=element_rect(fill=eallighttan), text=element_text(color=ealdark),
 plot.title=element_text(size=24, color=ealred))

After copying the preceding code into your R console, you will generate the following chart.

Efficient Frontier

Please watch the following video for a demonstration of using this code to quickly plot the efficient frontier.

If you appreciated this article, please take a few seconds to ‘like’ us below.  Thank you!