Plotting Financial Data With Python: Efficient Frontier (N assets)

We already have the efficient frontier script that we created in the previous post, but it has one major limitation: it cannot handle more than 2 assets. Two assets are enough to see diversification in action, but it's not practical to have a portfolio that consists of only 2 assets. In this post we're going to extend the previous script to support an arbitrary number of assets.

This is the fifth part of the “Plotting Financial Data With Python” series, and it's best read in chronological order:

  1. Part 1 - History
  2. Part 2 - Variance
  3. Part 3 - Comparing Returns
  4. Part 4 - Efficient Frontier (2 Assets)
  5. Part 5 - Efficient Frontier (N assets) (you are here)

Why Diversify?

Diversification helps to reduce portfolio volatility, but to what extent? It depends on the correlations between the assets, but we can safely say that 2 assets are not enough. The benefit of adding one more asset is largest when the portfolio holds only a few assets, and it shrinks as the portfolio grows. Here is a picture that helps to visualize how the number of assets affects portfolio risk:

Portfolio standard deviation versus number of securities

As you can see, having 2 assets does not give us all of the benefits of diversification. There are many opinions on what number of assets is “right”, but almost everyone agrees that 2 is far too low.
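The shape of that curve can be reproduced with a simplified model. Assume, purely for illustration, an equally weighted portfolio of N assets that all share the same standard deviation σ and the same pairwise correlation ρ. Plugging w_i = 1/N into the portfolio variance formula from the Variance section gives σ_p² = σ²/N + (1 - 1/N)·ρ·σ², so risk drops quickly at first and then levels off at σ·√ρ. The sketch below uses made-up values (σ = 30%, ρ = 0.2); the function name is hypothetical:

```python
import math


def equal_weight_portfolio_std(n_assets: int, sigma: float, rho: float) -> float:
    """Standard deviation of an equally weighted portfolio of n_assets
    identical assets (same sigma, same pairwise correlation rho).
    Simplifying assumption for illustration only."""
    # Own-variance terms: n * (1/n)^2 * sigma^2 = sigma^2 / n
    own = sigma ** 2 / n_assets
    # Cross terms: n * (n - 1) ordered pairs, each (1/n)^2 * sigma^2 * rho
    cross = (n_assets - 1) / n_assets * rho * sigma ** 2
    return math.sqrt(own + cross)


if __name__ == '__main__':
    sigma, rho = 0.3, 0.2  # illustrative values, not estimated from real data
    for n in (1, 2, 5, 10, 30, 100, 200):
        print('%3i assets: %.2f%%' % (n, equal_weight_portfolio_std(n, sigma, rho) * 100))
    # As n grows, the own-variance term vanishes and only rho * sigma^2 remains
    print('Floor (n -> infinity): %.2f%%' % (sigma * math.sqrt(rho) * 100))
```

Under these assumptions the gain from going from 2 to 10 assets is much larger than the gain from going from 30 to 200, which is the same pattern the picture shows.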

Goal

We already calculated the efficient frontier for a portfolio that consists of the IBM and DIS stocks. Let's add one more stock to it. You can pick any stock or index, but I will go with Coca-Cola (KO).

So, how do we calculate our risks and rewards?

Expected Return

Here is how we can calculate the expected return on a portfolio:

\( E(R_p) = \sum_{i=1}^N w_i E(R_i) \)

Where:

\( E(R_p) \) = expected return on the portfolio

\( N \) = number of assets in the portfolio

\( w_i \) = weight of asset i in the portfolio (the weights sum to 1)

\( E(R_i) \) = expected return on asset i

All of this is pretty simple: we just need to find the weighted average of the expected returns of every asset in the portfolio.
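As a quick illustration, here is a minimal sketch of that weighted average with NumPy. The function name and the three mean returns are hypothetical, made-up values, not data from this post:

```python
import numpy as np


def portfolio_expected_return(weights, expected_returns):
    """E(R_p) = sum_i w_i * E(R_i): the weighted average of asset returns."""
    weights = np.asarray(weights, dtype=float)
    expected_returns = np.asarray(expected_returns, dtype=float)
    if weights.shape != expected_returns.shape:
        raise ValueError('weights and expected_returns must have the same length')
    # Dot product = sum of element-wise products w_i * E(R_i)
    return float(np.dot(weights, expected_returns))


# Hypothetical monthly mean returns for three assets (made-up numbers)
print(portfolio_expected_return([0.5, 0.3, 0.2], [0.010, 0.008, 0.006]))
```

By hand: 0.5·1.0% + 0.3·0.8% + 0.2·0.6% = 0.86% per month.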

Variance

Variance is a bit trickier to calculate because we have to include the correlation between each pair of assets:

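\( \sigma_p^2 = \sum_{i=1}^N w_i^2 \sigma_i^2 + \sum_{i=1}^N \sum_{j \neq i}^N w_i w_j \sigma_i \sigma_j \rho_{ij} \)

Where:

\( \sigma_p^2 \) = portfolio variance

\( w_i \) = weight of asset i in the portfolio

\( \sigma_i \) = standard deviation of asset i

\( \rho_{ij} \) = correlation of returns between assets i and j

Because the inner sum runs over every j ≠ i, each pair of assets is counted twice, once as (i, j) and once as (j, i). That is why the 2-asset version of this formula has a factor of 2 in front of the correlation term.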
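To make the double sum concrete, the sketch below computes it literally and checks it against the equivalent matrix form \( w^T \Sigma w \), where \( \Sigma_{ij} = \sigma_i \sigma_j \rho_{ij} \) is the covariance matrix. The two forms agree because \( \rho_{ii} = 1 \), so the diagonal of the matrix product reproduces the first sum. Function names and inputs are hypothetical, chosen only for illustration:

```python
import numpy as np


def portfolio_variance(weights, deviations, correlations):
    """Portfolio variance written exactly as the double sum above."""
    n = len(weights)
    # First sum: w_i^2 * sigma_i^2
    variance = sum(weights[i] ** 2 * deviations[i] ** 2 for i in range(n))
    # Second sum: every ordered pair (i, j) with j != i
    for i in range(n):
        for j in range(n):
            if j != i:
                variance += weights[i] * weights[j] * deviations[i] * deviations[j] * correlations[i][j]
    return variance


def portfolio_variance_matrix(weights, deviations, correlations):
    """Same quantity in matrix form: w^T * Sigma * w with Sigma_ij = sigma_i * sigma_j * rho_ij."""
    w = np.asarray(weights, dtype=float)
    s = np.asarray(deviations, dtype=float)
    covariance = np.outer(s, s) * np.asarray(correlations, dtype=float)
    return float(w @ covariance @ w)


# Illustrative (made-up) inputs for three assets
weights = [0.5, 0.3, 0.2]
deviations = [0.06, 0.05, 0.04]
correlations = [[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 1.0]]

print(portfolio_variance(weights, deviations, correlations))
print(portfolio_variance_matrix(weights, deviations, correlations))
```

Both calls should print the same value (up to floating-point rounding), about 0.001555 for these inputs.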

Standard Deviation

The standard deviation of a portfolio is just the square root of its variance:

\( \sigma_p = \sqrt{\sigma_p^2} \)

This is the number we will use as the measure of portfolio risk.
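A small worked example shows why correlation matters. Take two hypothetical assets, each with a standard deviation of 20%, held 50/50. If their returns are uncorrelated (\( \rho_{12} = 0 \)), the factor of 2 comes from counting the pair as both (1, 2) and (2, 1):

\( \sigma_p^2 = 0.5^2 \cdot 0.2^2 + 0.5^2 \cdot 0.2^2 + 2 \cdot 0.5 \cdot 0.5 \cdot 0.2 \cdot 0.2 \cdot 0 = 0.02, \quad \sigma_p = \sqrt{0.02} \approx 14.14\% \)

If instead they are perfectly correlated (\( \rho_{12} = 1 \)):

\( \sigma_p^2 = 0.01 + 0.01 + 2 \cdot 0.5 \cdot 0.5 \cdot 0.2 \cdot 0.2 \cdot 1 = 0.04, \quad \sigma_p = 20\% \)

With perfect correlation, mixing the two assets does not reduce risk at all; with zero correlation, the same 50/50 mix cuts it from 20% to about 14%.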

Implementation

Let’s create a new file and call it frontier.py:

import sys

import numpy as np
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Helper module from the earlier posts in this series
import alpha_vantage


def show_frontier(symbols, interval='MONTHLY'):
    print('Symbols: %s' % symbols)

    # Fetch the return history of every symbol
    returns_history = dict()

    for symbol in symbols:
        history = alpha_vantage.get_stock_returns_history(symbol, interval)
        print('Fetched %i records for symbol %s' % (len(history), symbol))
        returns_history[symbol] = history

    # Keep only the last min_length records of each history so all series have equal length
    min_length = min(len(history) for history in returns_history.values())
    print('Min history length = %i' % min_length)

    for symbol in symbols:
        returns_history[symbol] = returns_history[symbol][-min_length:]
        print('History for symbol %s has %i records' % (symbol, len(returns_history[symbol])))

    # Per-asset statistics (np.std uses ddof=0, the same as np.sqrt(np.var(...)))
    mean_returns = [np.mean(returns_history[symbol]) for symbol in symbols]
    standard_deviations = [np.std(returns_history[symbol]) for symbol in symbols]

    # Correlation matrix of all assets; it doesn't depend on the weights, so compute it once
    correlations = np.corrcoef([returns_history[symbol] for symbol in symbols])

    portfolio_returns = []
    portfolio_deviations = []

    for _ in range(1_000):
        # Random long-only weights that sum to 1
        randoms = np.random.random_sample(len(symbols))
        weights = randoms / randoms.sum()

        # E(R_p) = sum of w_i * E(R_i)
        expected_return = sum(weights[i] * mean_returns[i] for i in range(len(symbols)))

        # First sum: w_i^2 * sigma_i^2
        variance = sum(weights[i]**2 * standard_deviations[i]**2 for i in range(len(symbols)))

        # Second sum: w_i * w_j * sigma_i * sigma_j * rho_ij over every pair with j != i
        for i in range(len(symbols)):
            for j in range(len(symbols)):
                if i != j:
                    variance += (weights[i] * weights[j]
                                 * standard_deviations[i] * standard_deviations[j]
                                 * correlations[i][j])

        portfolio_returns.append(expected_return)
        portfolio_deviations.append(np.sqrt(variance))

    # A single scatter call for all portfolios is much faster than one call per point
    plt.scatter(portfolio_deviations, portfolio_returns, color='blue')

    x_padding = np.average(portfolio_deviations) / 25
    plt.xlim(min(portfolio_deviations) - x_padding, max(portfolio_deviations) + x_padding)

    # abs() keeps the padding positive even if the average return is negative
    y_padding = abs(np.average(portfolio_returns)) / 25
    plt.ylim(min(portfolio_returns) - y_padding, max(portfolio_returns) + y_padding)

    # Show both axes as percentages (returns are fractions, e.g. 0.01 = 1%)
    plt.gca().xaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=2))
    plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=2))

    plt.title('Efficient Frontier %s' % symbols)

    plt.xlabel('Risk')
    plt.ylabel('Return')

    plt.show()


if __name__ == '__main__':
    # A frontier needs at least two assets
    if len(sys.argv) < 3:
        sys.exit('Usage: python frontier.py SYMBOL1 SYMBOL2 [SYMBOL3 ...]')
    show_frontier(sys.argv[1:])
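If you want to experiment with the math without calling the Alpha Vantage API, here is a sketch of the same random-portfolio approach, vectorized with NumPy and fed with synthetic returns. The function name `random_portfolios` and the fake data are hypothetical; the sketch computes all 1,000 portfolios at once using the matrix form \( w^T \Sigma w \) instead of the explicit double loop:

```python
import numpy as np


def random_portfolios(returns, portfolio_count=1_000, seed=None):
    """Sample random long-only portfolios; return (standard deviations, expected returns).

    returns: 2-D array, one row per asset, one column per period.
    """
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)

    mean_returns = returns.mean(axis=1)
    # ddof=0 matches the np.var/np.std defaults used in frontier.py
    covariance = np.cov(returns, ddof=0)

    # Same weighting scheme as frontier.py: uniform randoms scaled to sum to 1
    randoms = rng.random((portfolio_count, returns.shape[0]))
    weights = randoms / randoms.sum(axis=1, keepdims=True)

    expected = weights @ mean_returns
    # Row-wise w^T * Sigma * w for every portfolio at once
    variances = np.einsum('ij,jk,ik->i', weights, covariance, weights)
    return np.sqrt(variances), expected


if __name__ == '__main__':
    # Synthetic monthly returns for 3 hypothetical assets (not real market data)
    rng = np.random.default_rng(42)
    fake_returns = rng.normal(loc=[[0.010], [0.008], [0.006]],
                              scale=[[0.06], [0.05], [0.04]],
                              size=(3, 120))
    deviations, expected = random_portfolios(fake_returns, seed=1)
    print('Risk range:   %.2f%% - %.2f%%' % (deviations.min() * 100, deviations.max() * 100))
    print('Return range: %.2f%% - %.2f%%' % (expected.min() * 100, expected.max() * 100))
```

The resulting arrays can be passed straight to `plt.scatter(deviations, expected)`. Since every portfolio is a weighted average of the assets, its expected return always lies between the lowest and highest asset mean, and its risk never exceeds that of the riskiest asset.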

Testing

Now, let's run our new script to see the efficient frontier:

python frontier.py IBM DIS KO

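You should see a plot similar to the one below (the weights are random, so every run looks slightly different). Each dot is one randomly weighted portfolio, and the upper-left edge of the cloud approximates the efficient frontier: the portfolios with the highest return for a given level of risk.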

Efficient Frontier

Conclusion

Now we are able to plot the efficient frontier for an arbitrary number of assets. Please note that nothing is “for sure” in the world of investing, and this model has a lot of limitations, although it remains a standard starting point for portfolio construction. Our expected returns are based purely on past performance, which might not be an accurate guide to the future.

Another thing to consider is the limit of diversification. The benefit of each additional asset diminishes as the portfolio grows. There is a huge difference between a 2-asset and a 10-asset portfolio, but there might be no gain in holding 200 assets, especially once you take into account the transaction costs of rebalancing your portfolio.