Data Analysis, Programming

Portfolio Management using Python — Portfolio Optimization

A guide to knowing about portfolio optimization and implementing it through the Python language.

Jayashree domala

Published in

Towards AI

8 min readFeb 27, 2021

What is portfolio optimization?

Portfolio optimization is the process of choosing the best portfolio among the set of all portfolios.

How is portfolio optimization done?

The naive way is to select a group of random allocations and figure out which one has the best Sharpe Ratio. This is known as the Monte Carlo Simulation where randomly a weight is assigned to each security in the portfolio and then the mean daily return and standard deviation of daily return is calculated. This helps in calculating the Sharpe Ratio for randomly selected allocations.

To know more about Sharpe Ratio, check out my previous article:

Portfolio management using Python: Part 1 — portfolio allocation

A guide to knowing about portfolio allocation and implementing it through the Python language.

jayashree8.medium.com

But the naive way is time taking so an optimization algorithm is used which works on the concept of the minimizer. The higher the Sharpe Ratio, the higher is the risk-adjusted return and the better the portfolio selection. So the intuition is to maximize the Sharpe Ratio meaning that the optimizer should minimize the negative Sharpe Ratio. The optimization algorithm will allocate optimal weights to the portfolio on the basis of the Sharpe Ratio.

How to implement using Python?

→ Import packages

The basic packages like pandas, numpy and matplotlib are imported. The quandl package is imported for obtaining data.

>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> import quandl
>>> %matplotlib inline

→ Data

The start and end date are decided during which the data will be extracted and worked on.

>>> start = pd.to_datetime('2014-01-01')
>>> end = pd.to_datetime('2016-01-01')

Then the data of 3 companies: Facebook, Apple and Twitter are extracted. The 11th column which is adjusted close is obtained since the analysis will be done on it. Then the 3 stocks are combined into a single dataframe.

>>> fb = quandl.get('WIKI/FB.11',start_date=start, end_date=end)
>>> twtr = quandl.get('WIKI/TWTR.11',start_date=start, end_date=end)
>>> aapl = quandl.get('WIKI/AAPL.11',start_date=start, end_date=end)>>> df = pd.concat([fb,twtr,aapl],axis=1)
>>> df.columns = ['Facebook','Twitter','Apple']>>> df.head()

→ Statistics

1) Mean daily return

Using the percent change function, the mean daily return of the stocks can be found.

>>> df.pct_change(1).mean()Facebook    0.001480
Twitter    -0.001520
Apple       0.000762
dtype: float64

2) Correlation between stocks

Using the mean daily return, the correlation between the stocks can be determined.

>>> df.pct_change(1).corr()

3) Arithmetic return

The percentage change method gives the arithmetic return.

>>> arith_returns = df.pct_change(1)
>>> arith_returns.head()

4) Logarithmic return

The log function gives the logarithmic return.

>>> log_returns = np.log(df/df.shift(1))
>>> log_returns.head()

To understand the difference between arithmetic and logarithmic returns and why log-returns are used, check the article below:

Arithmetic vs. Logarithmic Rates of Return

Say you hold a stock as it increases from $100 to $105. Usually, this is reported as a return of 5%. The formula for…

chandlerfang.com

5) Covariance

The covariance function gives the covariance.

>>> log_returns.cov()

Optimization method — Randomization

→ Allocation of random weights to the stocks

The weights are randomly allocated to the stocks and made sure that they all add up to 1. To make sure they add up to 1, the weights are rebalanced by dividing it by the sum of weights.

>>> print('The stocks are: ',df.columns)
>>> np.random.seed(200)
>>> weights = np.array(np.random.random(3))
>>> weights = weights/np.sum(weights)
>>> print('Random weights: ',weights)The stocks are:  Index(['Facebook', 'Twitter', 'Apple'], dtype='object')
Random weights:  [0.53580931 0.12809422 0.33609646]

→ Finding expected return

The expected return is found in terms of log. The returns are multiplied with the weights and then multiplied with the number of working days which is 252.

>>> expected_return = np.sum((log_returns.mean()* weights) * 252)
>>> expected_return0.16004629405548906

→ Finding expected variance

Expected variance indicates volatility. The covariance for 252 days is multiplied with the weights and then the dot product is taken with the weights. Then finally the square root is taken.

>>> expected_vol = np.sqrt(np.dot(weights.T,np.dot(log_returns.cov()*252,weights)))
>>> expected_vol0.24233721586475374

→ Finding Sharpe Ratio

The Sharpe ratio is the division of the expected returns and expected volatility.

>>> sharpe_r = expected_return/expected_vol
>>> sharpe_r0.6604280464491656

The entire process of allocating random weights to finding the Sharpe ratio was done just once. But ideally, we need to repeat the process multiple times with different combinations of weights to find the optimal allocations. So the entire process is run multiple times using a ‘for’ loop.

>>> np.random.seed(200)# Initalization of variables
>>> portfolio_number = 7000
>>> weights_total = np.zeros((portfolio_number,len(df.columns)))
>>> returns = np.zeros(portfolio_number)
>>> volatility = np.zeros(portfolio_number)
>>> sharpe = np.zeros(portfolio_number)>>> for i in range(portfolio_number):
        # Random weights
        weights = np.array(np.random.random(3))
        weights = weights/np.sum(weights)
        # Append weight
        weights_total[i,:] = weights        # Expected return
        returns[i] = np.sum((log_returns.mean()* weights) * 252)        # Expected volume
        volatility[i] = np.sqrt(np.dot(weights.T,
        np.dot(log_returns.cov()*252,weights)))        # Sharpe ratio
        sharpe[i] = returns[i]/volatility[i]

The maximum value in the Sharpe ratio array is found along with its index. The maximum value of the Sharpe ratio indicates the best combination of the allocations.

>>> max_sharpe = sharpe.max()
>>> max_sharpe
1.091492698172807>>> max_sharpe_index = sharpe.argmax()
>>> max_sharpe_index
343

The weights, return and volatility are found at the index where the Sharpe ratio is maximum.

>>> max_sharpe_weights = weights_total[343,:]
>>> max_sharpe_weights
array([6.71707148e-01, 2.83931652e-04, 3.28008920e-01])>>> max_sharpe_return = returns[max_sharpe_index]
>>> max_sharpe_return
0.2714466221609141>>> max_sharpe_vol = volatility[max_sharpe_index]
>>> max_sharpe_vol
0.24869302617903377

→ Plot

The return is plotted against volatility on the basis of the Sharpe ratio obtained which outputs a bullet-shaped plot. Along with this, the point with the highest Sharpe ratio is plotted with a red dot.

>>> plt.figure(figsize=(12,8))
>>> plt.scatter(volatility,returns,c=sharpe)
>>> plt.colorbar(label='Sharpe Ratio')
>>> plt.xlabel('Volatility')
>>> plt.ylabel('Return')>>> plt.scatter(max_sharpe_vol,max_sharpe_return,c='red',s=50)

Optimization method — Mathematical

A function is defined which takes in the weights and returns the returns, volatility and Sharpe ratio.

>>> def stats(weights):
        weights = np.array(weights)
        expected_return = np.sum((log_returns.mean()* 
        weights) * 252)
        expected_vol = np.sqrt(np.dot(weights.T,
        np.dot(log_returns.cov()*252,weights)))
        sharpe_r = expected_return/expected_vol
        return np.array([expected_return,expected_vol,sharpe_r])

The python SciPy module will be used to create the mathematical optimization function.

>>> from scipy.optimize import minimize

The optimization idea is to minimize the negative Sharpe ratio. So a function is created which negates the Sharpe ratio obtained using the function created above.

>>> def sr_negate(weights):
        neg_sr = stats(weights)[2] * -1
        return neg_sr

Next, a condition has to be checked that the sum of the weights equals 1. So for this also a function is created. The weights are summed and then subtracted with 1. If the value returned is 0 then the sum is 1.

>>> def weight_check(weights):
        weights_sum = np.sum(weights)
        return weights_sum - 1

The next thing to do is define constraints, bounds and the initial guess. The constraints take in a tuple of dictionary where the first element describes the type to be an equation (since the ‘weight_check’ function returns a value by solving an equation) and the second element describes the function as ‘weight_check’. Since the weights can be in the range of 0 to 1, the bounds are set accordingly and the initial guess is also mentioned.

>>> constraints = ({'type':'eq','fun':weight_check})
>>> bounds = ((0,1),(0,1),(0,1))
>>> initial_guess = [0.3,0.3,0.4]

Now all these variables are passed through the minimization function of scipy. It takes the function which has to be minimized, followed by the initial guess, then the solver which is used to carry out the minimization. The solver used will be the Sequential Least-Squares Programming (SLSQP) one as it is used for the most basic tasks. Then the bounds, constraints are also mentioned.

>>> results = minimize(sr_negate,initial_guess,method='SLSQP',
    bounds=bounds,constraints=constraints)
>>> resultsfun: -1.0927303413720688
     jac: array([ 1.26287341e-04,  2.89727578e+00, -2.90006399e-04])
 message: 'Optimization terminated successfully'
    nfev: 20
     nit: 5
    njev: 5
  status: 0
 success: True
       x: array([6.96661817e-01, 3.01299784e-15, 3.03338183e-01])

Now the value ‘x’ contains the optimized allocation of weights which is extracted and then passed through the ‘stats’ function which returns the return, volatility and Sharpe ratio.

>>> wt = results.x
>>> wt
array([6.96661817e-01, 3.01299784e-15, 3.03338183e-01])>>> stats(wt)
array([0.27569923, 0.25230308, 1.09273034])

→ Plot efficient frontier

The efficient frontier is the set of optimal portfolios that have the highest expected return for the lowest risk for a given level of expected return.

The frontier level (return level) is set which takes in values from -0.6 to 0.4 for 200 points since the returns on the previously obtained bullet-shaped graph have the same range.

>>> frontier_return = np.linspace(-0.6,0.4,200)

Then next for the volatility has to be minimized a function is defined for it which will be used later while defining the constraints.

>>> def min_vol(weights):
        vol = stats(weights)[1]
        return vol

Now next thing to do is iterate through the frontier that is the returns and for each return calculate the minimum volatility by calling the scipy minimize function. The constraints are defined where the weights are checked and the difference of return and expected return is found. As before the inital guess, method and bounds remain the same. The minimize function returns a function value which is the minimum volatility and it will be saved in an array.

>>> frontier_volatility = []>>> for exp_return in frontier_return:
        constraints = ({'type':'eq','fun':weight_check},{'type':'eq','fun':lambda x: stats(x)[0]-exp_return})
        result = minimize(min_vol,initial_guess,method='SLSQP',
        bounds=bounds,constraints=constraints)
        frontier_volatility.append(result['fun'])

Now using the frontier returns and volatility array, the efficient frontier can be drawn along with the initial guesses obtained using the bullet-shaped graph before.

>>> plt.figure(figsize=(12,8))
>>> plt.scatter(volatility,returns,c=sharpe)
>>> plt.colorbar(label='Sharpe Ratio')
>>> plt.xlabel('Volatility')
>>> plt.ylabel('Return')>>> plt.plot(frontier_volatility,frontier_return,'r--',linewidth=3)

The efficient frontier tells us that for each desired level of volatility what is the best possible return that can be obtained.

Refer to the notebook here.

Reach out to me: LinkedIn
Check out my other work: GitHub