Arithmetic, Harmonic, and Geometric Means with R

One of the most popular posts on EconomistAtLarge.com is an article explaining the differences between arithmetic, harmonic, and geometric means. Here, we build on that article by demonstrating how the various means (or averages) differ and by providing R functions that you can use quickly and easily.

What Do We Mean by “Average”?

People casually refer to “averages” in everyday language. Journalists rely on our understanding of the concept when they describe weather, stock prices, economic indicators, and so on. Yet most people never think carefully about what they mean when they use the word, and this often leads to mistakes.

When most people say the average of something is x, what they usually mean is that the arithmetic mean is x. That is the right measure for many problems. The issue is that most people never ask whether it is the right average, given the characteristics of their observations.

The Issue with Outliers

Let’s illustrate a problem with the arithmetic mean by computing all three averages on two sets of data. Chart 1 shows a set of 25 observations with no outliers. The three averages (harmonic, geometric, and arithmetic) are all fairly close, ranging from 4.54 to 4.71. Since there are no outliers and we have no reason to believe that the observations are interdependent (as they would be if, for example, they were successive returns of a stock), it is safe to use the arithmetic mean as the average.

Chart 1: Mean comparison with a normally distributed sample (no outliers)

Now let’s consider what happens when the population contains a large outlier. We use the same observations as in Chart 1, but replace one of them with a much larger value (the R script at the end of this post sets it to m^2.5 with m = 5, about 55.9, or roughly 11 times the mean of 5). See how the arithmetic mean overstates the central tendency of the population? If you look closely at Chart 2, you’ll notice that the arithmetic mean is actually above all but 2 of the observations. This means that if you used the average to forecast (for example, next period’s sales), you would overestimate by a significant margin.

Chart 2: Mean comparison with a sample containing one large outlier
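To see the same effect without random data, the sketch below computes all three means with base R on a small, made-up set of five observations, then replaces one value with the same kind of outlier used in the charts (5^2.5, about 55.9). The data and the helper name `three_means` are hypothetical, chosen only for illustration. The geometric and harmonic means are computed on the log and reciprocal scales, so the sketch requires strictly positive values.

```r
# Hypothetical helper: arithmetic, geometric, and harmonic means of a
# strictly positive numeric vector
three_means <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  c(arithmetic = mean(x),
    geometric  = exp(mean(log(x))),  # nth root of the product, computed via logs
    harmonic   = 1 / mean(1 / x))    # reciprocal of the mean of the reciprocals
}

# Made-up observations with no outliers
x <- c(4, 5, 6, 5, 5)

# Same data with the last value replaced by a large outlier (5^2.5, about 55.9)
x_outlier <- x
x_outlier[5] <- 5^2.5

means <- rbind(no_outlier = three_means(x), outlier = three_means(x_outlier))
print(round(means, 2))

# How far each mean moved when the outlier was introduced
print(round(means["outlier", ] - means["no_outlier", ], 2))
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean. The outlier moves the arithmetic mean the most because it enters the sum at full size, while the log and reciprocal transformations dampen its influence on the other two.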

The Issue with Investment Returns

Another issue arises when evaluating investment returns.  A common mistake is to use the arithmetic average of the returns over time.  Consider the following investment profile.

Period    1      2      3      4       5
Return    20%    5%     10%    -20%    5%

The mistake is to simply add these returns and divide by 5.  Using the arithmetic mean results in an average return of 4%.

However, this is wrong!

Let’s start by looking at what $100 would be worth based on these returns.

Period               1         2         3         4         5
Beginning Balance    100.00    120.00    126.00    138.60    110.88
Ending Balance       120.00    126.00    138.60    110.88    116.42
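The balance table can be reproduced in R with `cumprod()`, which multiplies the growth factors (1 + return) period by period. A minimal sketch, using the five returns from the example and a starting balance of $100:

```r
# Period returns from the example, as decimals
returns <- c(0.20, 0.05, 0.10, -0.20, 0.05)
start <- 100

# Ending balance of each period: start times the running product of (1 + r)
ending <- start * cumprod(1 + returns)
# Beginning balance of each period is the previous period's ending balance
beginning <- c(start, head(ending, -1))

balances <- data.frame(period = 1:5, return = returns,
                       beginning = round(beginning, 2), ending = round(ending, 2))
print(balances)
```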

If you were to use the arithmetic mean (which many people mistakenly do), you would arrive at the wrong ending balance.  Consider the following formula based on the wrong average.

Incorrect Result based on Using Arithmetic Mean

First, let’s walk through the incorrect calculation.  Most people would begin by finding the simple (arithmetic) average of the returns using a formula like the one below.

$$(0.20 + 0.05 + 0.10 - 0.20 + 0.05) / 5 = 0.04$$

Next, to calculate the value of the investment after five periods, one would plug the average into the following formula.

$$100 \times (1 + 0.04)^{5} \approx 121.67$$

Using the arithmetic mean, the investment would be worth $121.67 at the end of the five periods.
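The incorrect calculation takes only a few lines of R. This sketch simply mirrors the arithmetic-mean approach described above so that its result can be compared with the actual balance; it is not a method we recommend for returns.

```r
returns <- c(0.20, 0.05, 0.10, -0.20, 0.05)

# Simple (arithmetic) average of the period returns
avg_arith <- mean(returns)

# Compounding $100 at that average for five periods overstates the result
naive_balance <- 100 * (1 + avg_arith)^length(returns)
print(c(average = avg_arith, balance = round(naive_balance, 2)))
```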

Correct Result using Geometric Mean

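Using either R (the code is contained below) or Excel, the geometric mean return is about 3.09% (0.03088 to five decimal places). To get it, convert each return to a growth factor (1 + r), take the fifth root of their product, and subtract 1.

$$(1.20 \times 1.05 \times 1.10 \times 0.80 \times 1.05)^{1/5} - 1 = 1.16424^{1/5} - 1 \approx 0.0309$$

Next, calculate the final balance of the investment using the geometric mean instead of the arithmetic mean.

$$100 \times (1 + 0.03088)^{5} \approx 116.42$$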

You can compare the result ($116.42) with the table above and confirm that it is correct. Using the arithmetic mean overstated the ending balance by $5.25, or about 4.5%.
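The geometric mean return can be computed directly in base R: convert each return to a growth factor (1 + r), take the nth root of their product, and subtract 1. The sketch below does that and checks that compounding at this rate reproduces the actual ending balance. This matches the approach suggested by the script's `geometricMean()` error message, which asks for returns already converted to growth factors.

```r
returns <- c(0.20, 0.05, 0.10, -0.20, 0.05)
growth <- 1 + returns              # 20% becomes 1.20, -20% becomes 0.80

# Geometric mean of the growth factors, minus 1, gives the average return
avg_geom <- prod(growth)^(1 / length(growth)) - 1

# Compounding at the geometric mean reproduces the actual ending balance
geom_balance <- 100 * (1 + avg_geom)^length(returns)
actual_balance <- 100 * prod(growth)

print(round(c(geometric_mean = avg_geom, balance = geom_balance), 4))
stopifnot(isTRUE(all.equal(geom_balance, actual_balance)))
```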

When to Use Harmonic or Geometric Instead of Arithmetic

Determining which mean to use is straightforward and depends on the characteristics of your data. The decision tree below is a useful visual guide. However, we recommend starting by visually inspecting your data: using a tool such as R (preferred) or Excel, load the data and create a bar or line graph. Note whether there are trends or outliers, and then follow the tree.

Mean Decision Tree
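As a first pass at the visual inspection recommended above, the sketch below draws a bar chart with base R’s `barplot()` and flags candidate outliers with `boxplot.stats()`, which applies the same rule `boxplot()` uses to draw outlying points (more than 1.5 box lengths beyond the box). That rule, the made-up data, and the helper name `inspect_data` are assumptions for illustration; they are not part of the decision tree, and a flagged point still calls for judgment before you choose a mean.

```r
# Hypothetical helper: plot the observations and report simple diagnostics
inspect_data <- function(x) {
  barplot(x, names.arg = seq_along(x), col = "grey",
          xlab = "Observation", main = "Inspect before choosing a mean")
  list(
    # Points more than 1.5 box lengths beyond the box, as boxplot() would draw them
    outliers = boxplot.stats(x)$out,
    # Harmonic and geometric means (as computed here) need positive data
    all_positive = all(x > 0)
  )
}

# Made-up data with one large value
x <- c(4, 5, 6, 5, 5, 5^2.5)
checks <- inspect_data(x)
print(checks)
```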

The following R code calculates the harmonic and geometric means of sample data and generates the charts above. R is a free platform that is easy to install and use, and it gives business professionals a powerful, flexible tool for business analytics. We highly recommend trying it out for yourself; you may find that R is a better alternative than Excel for many tasks.

# Economist at Large
# www.economistatlarge.com
# Last edited May 9, 2013
# Differences between Arithmetic, Harmonic, and Geometric Means

library(ggplot2)
library(reshape2)
# Function to calculate the harmonic mean of a positive numeric vector
harmonicMean <- function(array){
 if(!is.numeric(array)){
 stop("Passed argument must be a numeric vector. Consider using sapply for data frames.")
 }
 # A zero would cause division by zero, so require strictly positive values
 if(any(array <= 0)){
 stop("All values must be greater than zero.")
 }
 length(array) / sum(1 / array)
}

# Function to calculate the geometric mean of a positive numeric vector
geometricMean <- function(array){
 if(!is.numeric(array)){
 stop("Passed argument must be a numeric vector. Consider using sapply for data frames.")
 }
 if(any(array <= 0)){
 stop("All values must be greater than zero. If you are applying this ",
 "function to rates, convert to +1 format. For example, ",
 "5% becomes 1.05 and -20% becomes 0.8.")
 }
 # Averaging the logs equals prod(array)^(1/length(array)) but avoids the
 # overflow or underflow that prod() can hit on long vectors
 exp(mean(log(array)))
}

# Function to capture the three means based on the sample
fetchMeans <- function(sample){
 # Expects a data frame with n rows and two columns: value and obs
 arithmetic <- mean(sample$value)
 harmonic <- harmonicMean(sample$value)
 geometric <- geometricMean(sample$value)
 results <- data.frame(arithmetic, harmonic, geometric)

 return(results)
}

##### Graphs #####
# Color Scheme
ealred <- "#7D110C"
ealtan <- "#CDC4B6"
eallighttan <- "#F7F6F0"
ealdark <- "#423C30"
ealorange <- "#BB681C"
ealgreen <- "#3e4525"
ealblue <- "#25516d"

# Function that plots the three means for comparison, called below
plot.means <- function(sample) {
 # First calculate the various means and then flatten to a data frame that
 # can be plotted with ggplot2
 results <- fetchMeans(sample)
 results.melted <- melt(results, variable.name="Type", value.name="Mean")

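 g <- ggplot(sample, aes(x=obs, y=value)) + geom_bar(stat="identity", alpha=1, fill=ealtan) +
 # show.legend replaces the removed show_guide argument, and linewidth
 # replaces size for lines in ggplot2 3.4.0 and later
 geom_hline(data=results.melted, aes(yintercept=Mean, color=Type), show.legend=TRUE, linewidth=1) +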
 scale_color_manual(name="Type of Mean",
 values=c(ealred, ealorange, ealblue),
 breaks=c("arithmetic", "harmonic", "geometric"),
 labels=c(paste("Arithmetic: ", round(results$arithmetic, digits=2)),
 paste("Harmonic: ", round(results$harmonic, digits=2)),
 paste("Geometric: ", round(results$geometric, digits=2)))) +
 scale_x_discrete(breaks=NULL) +
 labs(x="Observations", y=NULL) +
 theme(panel.background=element_rect(fill=eallighttan))
 return(g)
}
#### Comparison with Normally Distributed Sample ####

# First generate a random sample of n numbers drawn from a normal distribution
# with mean m (and standard deviation 1). With no outliers, you should expect the
# arithmetic, harmonic, and geometric means to be close to one another.
n <- 25
m <- 5
sample <- data.frame("value"=rnorm(n=n, mean=m))
sample$obs <- rownames(sample)

# Next plot the three means, compared with the sample population
g <- plot.means(sample)
g <- g + ggtitle("Mean Comparison with\nNormally Distributed Sample")
g
# ggsave("test.png")

#### Comparison based on Sample with an Outlier ####
# Replace one observation with a large outlier (m^2.5, about 55.9 when m = 5)
sample.outliers <- sample
sample.outliers[n-2, "value"] <- m^2.5

g.outlier <- plot.means(sample.outliers)
g.outlier <- g.outlier + ggtitle("Mean Comparison using\nSample with Outliers")
g.outlier