Igor Bubelov

Plotting Financial Data in Python

Programming · Finance · Feb 23, 2020

Economics is one of my so-called “practical hobbies”. It’s practical because it has a measurable positive impact on my life, and it’s a hobby because it’s not what I do professionally, so I advise you to take my words with a grain of salt.

I need to plot some financial data from time to time, and it’s really hard to find a good and reasonably priced service that has all of the data and customization options I often need. Fortunately, it’s not that hard to draw customized charts yourself, especially with the help of great tools such as Python and Matplotlib. This is a step-by-step guide which can help you get started.

Source Code

You can get the full source code here:

https://github.com/bubelov/market-plots

Picking The Right Tools

There are 3 choices that have to be made before we start coding:

1. Choose a data source
2. Choose a programming language
3. Choose a visualization library

Choosing a Data Source

There are many financial data sources, but most of them aren’t free. There is nothing wrong with paying for high-quality data, but let’s start with a free API called Alpha Vantage. I use it in this article, but you can pick any other data source and it shouldn’t affect most of the examples.

Programming Language

I chose Python because it’s frequently used for such tasks which means there should be plenty of mature tools available. It’s also a really nice language for writing small utilities or integrating different parts of a big system. It can fail miserably in projects of substantial complexity but there is nothing to worry about if you just need to plot some data.

Plotting Library

There are plenty of libraries for plotting data. The Python wiki keeps a list of them on its Plotting page. I chose Matplotlib since it’s widely adopted and it has everything that I need.

Getting Started

This post assumes that you have Python 3 installed. If you have an older version of Python, some of the code will need adjustments.

Let’s create our project folder and give it a sensible name, such as market-plots:

mkdir market-plots


It’s good practice to isolate our little project from the rest of the system so we won’t mess with the global package registry.

Here is how we can create a local environment scoped to this project:

cd market-plots
python3 -m venv venv


This command will create a folder called venv which will contain our project-scoped dependencies.

Let’s activate our virtual environment:

. venv/bin/activate


You can always exit your virtual environment by executing this command:

deactivate


Now, let’s create a text file where we’ll list all of the needed dependencies. By convention, it should be called requirements.txt:

printf '%s\n' matplotlib >> requirements.txt
printf '%s\n' requests >> requirements.txt
printf '%s\n' python-dotenv >> requirements.txt


So, we ended up with a file named requirements.txt which has the following contents:

matplotlib
requests
python-dotenv


We can install all of those dependencies by executing the following command:

pip install -r requirements.txt


Now we have everything ready so we can start coding.

Getting an Alpha Vantage API Key

You can obtain a free API key from the Alpha Vantage website and add it to your project by executing the following command:

echo ALPHA_VANTAGE_KEY=YOUR_API_KEY > .env
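
A missing or placeholder key tends to show up later as a confusing API error, so it may be worth failing early. Here is a minimal sketch, assuming the key is available in the ALPHA_VANTAGE_KEY environment variable once python-dotenv has loaded the .env file. The helper name require_api_key is hypothetical and not part of the original project:

```python
import os


def require_api_key(name: str = 'ALPHA_VANTAGE_KEY') -> str:
    """Return the API key from the environment or fail with a clear message.

    Hypothetical helper: it is not part of the original project.
    """
    key = os.environ.get(name, '').strip()

    # Treat the placeholder from the command above as a missing key too
    if not key or key == 'YOUR_API_KEY':
        raise RuntimeError(f'{name} is not set, add it to your .env file')

    return key
```

Calling this once at start-up would stop the script before any request is sent.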


Plotting Asset Price History

There are many reasons why people are interested in asset price history. Some of those reasons are rational and some are pretty bogus. Luckily for us, price history is very easy to plot and we can verify the correctness of the result just by looking at it, which makes it a good warm-up exercise to get us comfortable with our new tools.

API Wrapper

Let’s create a new file and call it alpha_vantage.py. It will wrap the TIME_SERIES_ family of functions. You can check the Alpha Vantage API documentation for more details.

from dotenv import load_dotenv
from os.path import join, dirname
# python-dateutil is installed as a dependency of matplotlib
from dateutil import parser
from datetime import datetime
from enum import Enum
from typing import List
import os
import urllib.request as url_request
import json
from dataclasses import dataclass
import ssl

# WARNING: this turns off TLS certificate verification for every HTTPS
# request made by this process. Remove it unless your local certificate
# store is broken and you understand the risk.
ssl._create_default_https_context = ssl._create_unverified_context

# Load ALPHA_VANTAGE_KEY from the .env file that sits next to this module
dotenv_path = join(dirname(__file__), '.env')
load_dotenv(dotenv_path)

API_KEY = os.getenv('ALPHA_VANTAGE_KEY')
REQUEST_TIMEOUT_SECONDS = 20


class Interval(Enum):
    DAILY = 'DAILY'
    WEEKLY = 'WEEKLY'
    MONTHLY = 'MONTHLY'


@dataclass
class AssetPrice:
    date: datetime
    price: float


def get_stock_price_history(symbol: str,
                            interval: Interval,
                            adjusted=False) -> List[AssetPrice]:
    url = url_for_function('TIME_SERIES_%s' % interval.value)

    if adjusted:
        url += '_ADJUSTED'

    url += '&apikey=%s' % API_KEY
    url += '&symbol=%s' % symbol
    url += '&outputsize=full'

    response = url_request.urlopen(url, timeout=REQUEST_TIMEOUT_SECONDS)
    data = json.load(response)
    # The first key holds metadata, the second one holds the time series
    prices_json = data[list(data.keys())[1]]

    field_name = '5. adjusted close' if adjusted else '4. close'

    prices: List[AssetPrice] = []

    # ISO dates sort chronologically, so this goes from oldest to newest
    for k, v in sorted(prices_json.items()):
        prices.append(AssetPrice(date=parser.parse(k),
                                 price=float(v[field_name])))

    return prices


def url_for_function(function: str):
    return f'https://www.alphavantage.co/query?function={function}'


The logic inside this file is quite straightforward. Let’s examine all of the arguments to understand how to use this function properly:

• symbol - it’s just a stock trading symbol such as GOOG, TSLA and so on
• interval - sampling interval, you can use DAILY, WEEKLY or MONTHLY intervals
• adjusted - whether to use an absolute price or adjust for stock splits and dividends
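
To make the request and response handling more concrete, here is an offline sketch. It builds the same kind of query string with urllib.parse.urlencode and pulls closing prices out of a hand-written payload that imitates the shape of a monthly response. The payload and its numbers are made up for illustration, and build_url and parse_prices are hypothetical names rather than functions from the module above:

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = 'https://www.alphavantage.co/query'


def build_url(function: str, symbol: str, api_key: str) -> str:
    """Assemble a query URL, letting urlencode escape the parameters."""
    params = {
        'function': function,
        'symbol': symbol,
        'outputsize': 'full',
        'apikey': api_key,
    }
    return f'{BASE_URL}?{urlencode(params)}'


def parse_prices(data: dict, field_name: str = '4. close') -> list:
    """Return (date, price) tuples sorted from oldest to newest."""
    # The first key holds metadata, the second one holds the time series
    series = data[list(data.keys())[1]]
    return [(datetime.strptime(day, '%Y-%m-%d'), float(values[field_name]))
            for day, values in sorted(series.items())]


# Made-up payload that only imitates the structure of a real response
sample = {
    'Meta Data': {'2. Symbol': 'DEMO'},
    'Monthly Time Series': {
        '2020-02-28': {'1. open': '11.0', '4. close': '12.5'},
        '2020-01-31': {'1. open': '10.0', '4. close': '11.0'},
    },
}

url = build_url('TIME_SERIES_MONTHLY', 'DEMO', 'YOUR_API_KEY')
prices = parse_prices(sample)
```

Sorting the date strings before parsing works because ISO dates sort chronologically, which is why the oldest record comes first.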

Price History

Now that we have a financial API wrapper, let’s use it to plot stock price history. We need to add a new file called history.py which should contain the following code:

import sys
import pathlib
import matplotlib.pyplot as plt

import alpha_vantage
from alpha_vantage import Interval
# Styling helper that isn't listed in this article (see the full source code)
import plot_style


def show_history(symbol: str):
    data = alpha_vantage.get_stock_price_history(
        symbol,
        Interval.MONTHLY,
        adjusted=False
    )

    plot_style.line()
    plt.title(f'{symbol.upper()} Price History')
    plt.plot(
        [i.date for i in data],
        [i.price for i in data])

    # Save the chart to img/history/<symbol>.png
    pathlib.Path('img/history').mkdir(parents=True, exist_ok=True)
    plt.savefig(f'img/history/{symbol.lower()}.png')
    plt.close()


show_history(sys.argv[1])


Let’s test this script by giving it a real query. We can look up the S&P 500 index history by passing its symbol (SPX) as an argument to our new script:

python history.py spx


You should see a chart similar to this one:

Conclusion

We’ve created a simple wrapper that allows us to query stock price history and used it to plot the data and save the chart as an image. Next, we’ll use this data to show the risk of different stocks.

Plotting Risk as Variance

Variance is an important indicator if you want to know the level of risk associated with holding a given security. It’s important to understand that past variance might not be a good predictor of future variance but most of the time it works and we don’t have other options anyway. Let’s create a script for displaying returns distribution, variance and standard deviation of any given security.

Finding Variance

It’s super easy to find the variance if you have the returns data. Here is how to do that:

1. Find the mean return (the mean of all data points).
2. For each data point, subtract its value from the mean return and square the result (note that the result can’t be negative since the square of a real number is never negative).
3. Sum all of the results and divide this sum by the number of data points.

That’s all, now you have the variance. If you also want to find the standard deviation, just take the square root of the variance.
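
The three steps above can be sketched in plain Python. The returns below are made up for illustration and the function name is an arbitrary choice. Note that dividing by the number of data points gives the population variance, which is also what statistics.pvariance computes:

```python
import math
import statistics


def variance(returns: list) -> float:
    """Population variance, following the three steps literally."""
    # Step 1: the mean of all data points
    mean_return = sum(returns) / len(returns)
    # Step 2: squared differences between the mean and each data point
    squared_diffs = [(mean_return - r) ** 2 for r in returns]
    # Step 3: the average of the squared differences
    return sum(squared_diffs) / len(returns)


# Made-up monthly returns, for illustration only
sample_returns = [0.02, -0.01, 0.03, 0.00]

sample_variance = variance(sample_returns)
sample_deviation = math.sqrt(sample_variance)

# Cross-check against the standard library
reference_variance = statistics.pvariance(sample_returns)
```

If you prefer the sample variance, which divides by the number of data points minus one, statistics.variance does that instead.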

Obtaining The Data

Let’s extend our alpha_vantage.py module to add one more function:

def get_stock_returns_history(symbol: str,
                              interval: Interval) -> List[float]:
    price_history = get_stock_price_history(symbol, interval, adjusted=True)

    returns: List[float] = []
    prev_price = None

    for item in price_history:
        # The first record has no previous price to compare with
        if prev_price is not None:
            returns.append((item.price - prev_price) / prev_price)

        prev_price = item.price

    return returns


This data is based on the price history data but now we’re not interested in absolute numbers so we have to calculate relative changes and return them as a simple array. For instance, with a MONTHLY interval this array would contain the month to month changes in the price of a given security.
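
As a quick illustration of what “relative changes” means here, the following sketch converts a short list of made-up prices into returns. The function name to_returns is hypothetical:

```python
def to_returns(prices: list) -> list:
    """Convert consecutive prices into relative changes."""
    return [(current - previous) / previous
            for previous, current in zip(prices, prices[1:])]


# Made-up month-end prices, for illustration only
sample_prices = [100.0, 110.0, 99.0]
sample_returns = to_returns(sample_prices)
```

Three prices produce two returns: a gain of 10% followed by a loss of 10%. The second move is smaller in absolute terms (11 instead of 10) because it’s measured against a higher starting price.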

Plotting The Data

Let’s create a new file and call it variance.py. It should contain the following code:

import sys
import pathlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

import alpha_vantage
from alpha_vantage import Interval
import plot_style


def show_variance(symbol, interval=Interval.MONTHLY):
    returns = alpha_vantage.get_stock_returns_history(symbol, interval)
    variance = np.var(returns)
    standard_deviation = np.sqrt(variance)
    mean_return = np.mean(returns)

    plot_style.hist()

    n, _, patches = plt.hist(returns, density=True, bins=25)

    # Rescale the bars so that their heights are relative frequencies
    # which add up to 1
    for item in patches:
        item.set_height(item.get_height() / sum(n))

    max_y = max(n) / sum(n)
    plt.ylim(0, max_y + max_y / 10)

    # Show both axes as percentages (0.05 becomes 5%)
    plt.gca().xaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
    plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))

    title_line_1 = f'{symbol.upper()} {interval.value.lower().capitalize()} Return Distribution'
    title_line_2 = 'Standard Deviation = %.2f%% Mean Return = %.2f%%' % (
        standard_deviation * 100, mean_return * 100)
    plt.title(f'{title_line_1}\n{title_line_2}')
    plt.xlabel(f'{interval.value.lower().capitalize()} Return')
    plt.ylabel('Probability')

    pathlib.Path('img/variance').mkdir(parents=True, exist_ok=True)
    plt.savefig(f'img/variance/{symbol}.png')
    plt.close()


show_variance(sys.argv[1])


First of all, this script reads the symbol from its first command line argument and falls back to the default interval (MONTHLY), since no interval is passed to show_variance. Next, it fetches the returns history and calculates the variance based on that data. The final step is to plot the returns distribution as a histogram so we can see the relative frequency of each range of returns.
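
To show what the relative frequencies on the vertical axis stand for, here is a small sketch in plain Python. It splits the returns into equal-width bins and reports the share of data points in each bin, which is the quantity the rescaled bars are meant to display. The data is made up and the function name is hypothetical:

```python
def relative_frequencies(values: list, bins: int) -> list:
    """Share of values falling into each of `bins` equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins

    if width == 0:
        raise ValueError('all values are identical, nothing to bin')

    counts = [0] * bins

    for value in values:
        # The maximum value belongs to the last bin, not to a new one
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1

    return [count / len(values) for count in counts]


# Made-up returns, for illustration only
sample_returns = [-0.04, -0.03, -0.01, 0.005, 0.01, 0.015, 0.03, 0.04]
frequencies = relative_frequencies(sample_returns, bins=4)
```

The frequencies always add up to 1, so each bar can be read as the fraction of periods in which the return landed in that range.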

Testing

Let’s test our new module by asking it to draw a couple of charts:

python variance.py tsla


You should see a chart similar to this one:

Now let’s check the variance of the S&P 500 (SPX) index:

python variance.py spx

It’s easy to see why TSLA is riskier to hold than SPX, but that does not mean it’s a bad choice. Why does anyone want to hold such an unpredictable stock? You can look at the mean returns or plot the price history to see the answer.

Conclusion

Now that we can get a hint of the risk of holding any particular security, it would be nice to have a way of comparing the returns. It might be a good idea to hold a risky security if it gives exceptional returns and you are not worried about price volatility in the short term.

Comparing Asset Returns

Standard Deviation

The standard deviation of a portfolio is just the square root of its variance:

$$σ_p = (σ_p^2)^{1 \over 2}$$

That gives us a hint about the portfolio riskiness.
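
The portfolio variance itself depends on the weights, the individual deviations and the pairwise correlations of the assets:

$$σ_p^2 = \sum_i w_i^2 σ_i^2 + \sum_i \sum_{j \neq i} w_i w_j σ_i σ_j ρ_{ij}$$

Here is a small sanity check of that formula in plain Python, a sketch with made-up returns and hypothetical helper names. It compares the formula with the variance of the blended return series computed directly:

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equally long series."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])


def portfolio_variance(weights, series):
    """Sum of w_i * w_j * sigma_i * sigma_j * rho_ij over all pairs (i, j)."""
    deviations = [math.sqrt(covariance(s, s)) for s in series]
    total = 0.0

    for i, s_i in enumerate(series):
        for j, s_j in enumerate(series):
            # For i == j the correlation is 1, which gives w_i^2 * sigma_i^2
            rho = covariance(s_i, s_j) / (deviations[i] * deviations[j])
            total += weights[i] * weights[j] * deviations[i] * deviations[j] * rho

    return total


# Made-up returns of two assets, for illustration only
asset_a = [0.02, -0.01, 0.03, 0.00]
asset_b = [0.01, 0.02, -0.02, 0.01]
weights = [0.6, 0.4]

# Variance of the blended return series, computed directly
blended = [weights[0] * a + weights[1] * b for a, b in zip(asset_a, asset_b)]
direct = covariance(blended, blended)
from_formula = portfolio_variance(weights, [asset_a, asset_b])
```

Both numbers agree up to rounding. In this made-up case the two series are negatively correlated, so the portfolio deviation comes out lower than the deviation of either asset on its own.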

Implementation

Let’s create a new file and call it frontier.py:

import matplotlib.pyplot as plt
import sys
import pathlib
import numpy as np
from matplotlib.ticker import PercentFormatter

import alpha_vantage
from alpha_vantage import Interval
import plot_style


def show_frontier(symbols, interval=Interval.MONTHLY):
    returns_history = dict()

    min_length = None

    for symbol in symbols:
        history = alpha_vantage.get_stock_returns_history(symbol, interval)

        if min_length is None or len(history) < min_length:
            min_length = len(history)

        returns_history[symbol] = history

    # Trim every history to the most recent min_length records so that
    # all of the series cover the same period
    for symbol in symbols:
        returns_history[symbol] = returns_history[symbol][-min_length:]

    mean_returns = dict()
    variances = dict()
    standard_deviations = dict()

    for symbol in symbols:
        history = returns_history[symbol]
        mean_returns[symbol] = np.mean(history)
        variances[symbol] = np.var(history)
        standard_deviations[symbol] = np.sqrt(variances[symbol])

    plot_style.scatter()

    portfolio_returns = []
    portfolio_deviations = []

    # Simulate 1,000 portfolios with random weights
    for _ in range(1_000):
        # Random weights, normalized so that they add up to 1
        randoms = np.random.random_sample((len(symbols),))
        weights = [random / sum(randoms) for random in randoms]

        expected_return = sum([weights[i] * mean_returns[symbol]
                               for i, symbol in enumerate(symbols)])

        # Each asset's own contribution to the portfolio variance
        weights_times_deviations = [
            weights[i]**2 * standard_deviations[s]**2 for i, s in enumerate(symbols)]
        variance = sum(weights_times_deviations)

        # Contribution of every ordered pair of different assets
        for i in range(len(symbols)):
            for j in range(len(symbols)):
                if i != j:
                    symbol1 = symbols[i]
                    symbol2 = symbols[j]

                    weight1 = weights[i]
                    weight2 = weights[j]

                    deviation1 = standard_deviations[symbol1]
                    deviation2 = standard_deviations[symbol2]

                    correlation = np.corrcoef(
                        returns_history[symbol1], returns_history[symbol2])[0][1]

                    additional_variance = weight1 * weight2 \
                        * deviation1 * deviation2 \
                        * correlation

                    variance += additional_variance

        standard_deviation = np.sqrt(variance)

        plt.scatter(standard_deviation, expected_return, color='#007bff')

        portfolio_returns.append(expected_return)
        portfolio_deviations.append(standard_deviation)

    x_padding = np.average(portfolio_deviations) / 25
    plt.xlim(min(portfolio_deviations) - x_padding,
             max(portfolio_deviations) + x_padding)

    y_padding = np.average(portfolio_returns) / 25
    plt.ylim(min(portfolio_returns) - y_padding,
             max(portfolio_returns) + y_padding)

    # Show both axes as percentages with two decimal places
    plt.gca().xaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=2))
    plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=2))

    plt.title(f'Efficient Frontier {list(s.upper() for s in symbols)}')

    plt.xlabel(f'Risk ({interval.value.lower().capitalize()})')
    plt.ylabel(f'Return ({interval.value.lower().capitalize()})')

    pathlib.Path('img/frontier').mkdir(parents=True, exist_ok=True)
    plt.savefig('img/frontier/frontier.png')
    plt.close()


show_frontier(sys.argv[1:])

Testing

Now, let’s run our new script in order to see the efficient frontier:

python frontier.py ibm dis ko


You should see the following image:

Conclusion

Now we are able to plot the efficient frontier based on an arbitrary number of assets. Please note that nothing is “for sure” in the world of investing and this model has a lot of limitations, although it’s probably one of the best models that are currently available. Our expected return is based purely on the past performance which might not be an accurate assumption about the future.

Another thing to consider is the limit of diversification. The benefits of having more assets tend to wear off with each new asset added to your portfolio. There is a huge difference between the 2-asset and 10-asset portfolios but there might be no gain in having 200 assets, especially if you take into account all of the transaction costs of rebalancing your portfolio.
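
A rough way to see the limit of diversification is to look at an equally weighted portfolio under two simplifying assumptions that are mine, not taken from real data: every asset has the same standard deviation σ and every pair of assets has the same correlation ρ. The portfolio variance formula then collapses to σ²/n + (1 − 1/n)ρσ², which the following sketch evaluates for a few portfolio sizes:

```python
import math


def equal_weight_deviation(n: int, sigma: float, rho: float) -> float:
    """Standard deviation of n equally weighted assets.

    Simplifying assumptions: every asset has the same standard deviation
    `sigma` and every pair of assets has the same correlation `rho`.
    """
    variance = sigma ** 2 / n + (1 - 1 / n) * rho * sigma ** 2
    return math.sqrt(variance)


# Made-up inputs: 6% monthly deviation and a pairwise correlation of 0.3
deviations = {n: equal_weight_deviation(n, sigma=0.06, rho=0.3)
              for n in (1, 2, 10, 200)}

for n, deviation in deviations.items():
    print(f'{n:>3} assets: {deviation:.2%}')
```

With these inputs most of the risk reduction happens within the first ten assets. No matter how many assets you add, the deviation can’t drop below σ multiplied by the square root of ρ, because the correlated part of the risk doesn’t diversify away.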