title: "Case Study: Stock Portfolio Allocation, Part 2" subtitle: PSTAT 234 (Fall 2025) format: clean-revealjs: bibliographystyle: "chicago-author-date" fig-width: 30 code-copy: hover code-line-numbers: true author:

  • name: Sang-Yun Oh affiliations: "University of California, Santa Barbara" date: "February 27, 2025" date-format: long bibliography: [../static/references.bib] execute: echo: true code-fold: true editor: render-on-save: true

In [1]:
#| echo: false

import pandas as pd
import numpy as np
import seaborn as sns
sns.set_theme(style="whitegrid")

import cvxpy as cvx
import matplotlib.pyplot as plt

Portfolio Allocation: Dow Jones component stocks¶

Stock data can have irregularities such as missing data due stocks being added and removed from the index. Some examples are

  • Alcoa Corp. (AA) was removed in 2013
  • Apple (AAPL) was added in 2015
  • E.I. du Pont de Nemours & Company (DD) was removed and replaced with Dow du Pont (DWDP) as a continuation in 2017

Stock Symbols¶

For simplicity the stocks we will use are based on the most recent DJIA constituent companies

In [2]:
#| echo: false
dowjones_components = [
    'AAPL','AXP','BA','BAC','CAT',
    'CSCO','CVX','DD','DIS','GE',
    'HD','HPQ','IBM','INTC','JNJ',
    'JPM','KO','MCD','MMM',
    'MRK','MSFT','PFE','PG','T',
    'TRV','VZ','WMT','XOM'
]

symbols = pd.read_csv('data/tickers.csv').set_index('code')
symbols['name'] # full company names
Out[2]:
code
AAPL                              Apple Inc (AAPL)
AXP                     American Express Co. (AXP)
BA                             The Boeing Co. (BA)
BAC                    Bank of America Corp. (BAC)
CAT                         Caterpillar Inc. (CAT)
CSCO                     Cisco Systems Inc. (CSCO)
CVX                      Chevron Corporation (CVX)
DD            E.I. du Pont de Nemours and Co. (DD)
DIS                  Disney (Walt) Co. (The) (DIS)
GE                        General Electric Co (GE)
HD                             Home Depot Inc (HD)
HPQ                       Hewlett-Packard Co (HPQ)
IBM     International Business Machines Corp (IBM)
INTC                             Intel Corp (INTC)
JNJ                        Johnson & Johnson (JNJ)
JPM                          JP Morgan Chase (JPM)
KO                         Coca-Cola Co (The) (KO)
MCD                          McDonald's Corp (MCD)
MMM                                    3M Co (MMM)
MRK                          Merck & Co. Inc (MRK)
MSFT                  Microsoft Corporation (MSFT)
PFE                               Pfizer Inc (PFE)
PG                       Procter & Gamble Co. (PG)
T                                     AT&T Inc (T)
TRV            Travelers Companies Inc (The) (TRV)
UTX                United Technologies Corp. (UTX)
VZ                 Verizon Communications Inc (VZ)
WMT                          Wal-Mart Stores (WMT)
XOM                        Exxon Mobil Corp. (XOM)
Name: name, dtype: object

Load/Import Data from Yahoo Finance¶

In [3]:
#| code-fold: true
import yfinance as yf

store = pd.HDFStore('data/yfinance-data-store.h5')

if 'rawdata' in store:
    rawdata = store['rawdata']
    print('Data loaded from store')
else:
    print('Downloading data from Yahoo Finance')
    print('set auto_adjust=False to get adjusted closing price as separate column')
    rawdata = yf.download(dowjones_components, start="2000-01-01", end="2025-11-01", auto_adjust=True)
    store['rawdata'] = rawdata
Data loaded from store

Preview Data¶

In [4]:
rawdata.head()
Out[4]:
Price Close ... Volume
Ticker AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.839280 32.323586 25.940268 12.490799 12.574042 35.148087 15.875281 11.853917 22.736227 129.774673 ... 2599386 6265782 53228400 12873345 4275000 7668476 336400 4663843 25109700 13458200
2000-01-04 0.768521 31.103098 25.899942 11.749414 12.412425 33.176201 15.875281 11.529919 24.068050 124.583733 ... 3245705 7894689 54119000 14208974 4270800 9497846 494400 5005878 20235300 14510800
2000-01-05 0.779767 30.313122 27.513647 11.878352 12.703340 33.074562 16.160040 11.904197 25.066923 124.367424 ... 4424482 7963018 64059600 12981591 5098400 12035160 736000 6368681 21056100 17485000
2000-01-06 0.712287 30.930702 27.796047 12.893729 13.349821 32.525688 16.848200 12.239370 24.068050 126.030174 ... 7147057 4989004 54976600 11115273 6524200 9471366 660400 4705763 19633500 19461600
2000-01-07 0.746027 31.381020 28.602884 12.555267 13.786201 34.436577 17.144821 12.513094 23.687531 130.910217 ... 4905035 10871218 62013600 17962163 9832000 7792534 594700 5043907 23930700 16603800

5 rows × 140 columns

In [5]:
rawdata.tail()
Out[5]:
Price Close ... Volume
Ticker AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2025-10-27 268.549652 361.670013 223.000000 53.020000 527.070007 71.389999 153.564911 34.050209 112.339996 312.839996 ... 2922300 7226600 18734700 34974900 7775300 86220400 1108100 27487200 16375900 10975400
2025-10-28 268.739471 361.029999 223.330002 52.869999 524.469971 72.620003 152.427628 34.267780 111.650002 309.790009 ... 2448100 8345900 29986700 50152400 5819300 77662500 1341300 30719200 13323000 9535200
2025-10-29 269.438812 358.220001 213.580002 52.580002 585.489990 71.330002 153.386917 34.376568 110.239998 314.279999 ... 3163400 10921900 36023000 65438400 7405500 102715000 1262700 53730200 11986400 12570600
2025-10-30 271.137146 358.880005 200.080002 53.029999 583.150024 72.910004 151.824356 34.087868 111.839996 310.750000 ... 2143700 16678800 41023100 157478000 6760700 82167800 1473500 47371000 14065100 16172900
2025-10-31 270.108154 360.730011 201.020004 53.450001 577.260010 73.110001 155.977966 34.163181 112.620003 308.950012 ... 2096600 12385000 34006400 132697400 7930300 89936800 1900900 52181800 20237000 20213400

5 rows × 140 columns

Hierarchical Indexing¶

  • Column indexing is hierarchical
  • Close, high, low, open, and volume are given for each stock symbol
In [6]:
## Hierarchical Index
rawdata.columns
Out[6]:
MultiIndex([( 'Close', 'AAPL'),
            ( 'Close',  'AXP'),
            ( 'Close',   'BA'),
            ( 'Close',  'BAC'),
            ( 'Close',  'CAT'),
            ( 'Close', 'CSCO'),
            ( 'Close',  'CVX'),
            ( 'Close',   'DD'),
            ( 'Close',  'DIS'),
            ( 'Close',   'GE'),
            ...
            ('Volume',  'MMM'),
            ('Volume',  'MRK'),
            ('Volume', 'MSFT'),
            ('Volume',  'PFE'),
            ('Volume',   'PG'),
            ('Volume',    'T'),
            ('Volume',  'TRV'),
            ('Volume',   'VZ'),
            ('Volume',  'WMT'),
            ('Volume',  'XOM')],
           names=['Price', 'Ticker'], length=140)
  • Column levels can be named:
In [7]:
rawdata.columns = rawdata.columns.set_names(['Value', 'Symbol'])
rawdata.head()
Out[7]:
Value Close ... Volume
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.839280 32.323586 25.940268 12.490799 12.574042 35.148087 15.875281 11.853917 22.736227 129.774673 ... 2599386 6265782 53228400 12873345 4275000 7668476 336400 4663843 25109700 13458200
2000-01-04 0.768521 31.103098 25.899942 11.749414 12.412425 33.176201 15.875281 11.529919 24.068050 124.583733 ... 3245705 7894689 54119000 14208974 4270800 9497846 494400 5005878 20235300 14510800
2000-01-05 0.779767 30.313122 27.513647 11.878352 12.703340 33.074562 16.160040 11.904197 25.066923 124.367424 ... 4424482 7963018 64059600 12981591 5098400 12035160 736000 6368681 21056100 17485000
2000-01-06 0.712287 30.930702 27.796047 12.893729 13.349821 32.525688 16.848200 12.239370 24.068050 126.030174 ... 7147057 4989004 54976600 11115273 6524200 9471366 660400 4705763 19633500 19461600
2000-01-07 0.746027 31.381020 28.602884 12.555267 13.786201 34.436577 17.144821 12.513094 23.687531 130.910217 ... 4905035 10871218 62013600 17962163 9832000 7792534 594700 5043907 23930700 16603800

5 rows × 140 columns

In [8]:
rawdata.columns
Out[8]:
MultiIndex([( 'Close', 'AAPL'),
            ( 'Close',  'AXP'),
            ( 'Close',   'BA'),
            ( 'Close',  'BAC'),
            ( 'Close',  'CAT'),
            ( 'Close', 'CSCO'),
            ( 'Close',  'CVX'),
            ( 'Close',   'DD'),
            ( 'Close',  'DIS'),
            ( 'Close',   'GE'),
            ...
            ('Volume',  'MMM'),
            ('Volume',  'MRK'),
            ('Volume', 'MSFT'),
            ('Volume',  'PFE'),
            ('Volume',   'PG'),
            ('Volume',    'T'),
            ('Volume',  'TRV'),
            ('Volume',   'VZ'),
            ('Volume',  'WMT'),
            ('Volume',  'XOM')],
           names=['Value', 'Symbol'], length=140)
In [9]:
rawdata.columns.levels
Out[9]:
FrozenList([['Close', 'High', 'Low', 'Open', 'Volume'], ['AAPL', 'AXP', 'BA', 'BAC', 'CAT', 'CSCO', 'CVX', 'DD', 'DIS', 'GE', 'HD', 'HPQ', 'IBM', 'INTC', 'JNJ', 'JPM', 'KO', 'MCD', 'MMM', 'MRK', 'MSFT', 'PFE', 'PG', 'T', 'TRV', 'VZ', 'WMT', 'XOM']])

Subset First Level 1¶

  • Subsetting first level of hierarchical indexing: Close
In [10]:
rawdata['Close'].head()
Out[10]:
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.839280 32.323586 25.940268 12.490799 12.574042 35.148087 15.875281 11.853917 22.736227 129.774673 ... 19.343803 25.427525 35.668091 11.650893 27.024214 6.324178 17.456139 15.576514 14.239633 17.255524
2000-01-04 0.768521 31.103098 25.899942 11.749414 12.412425 33.176201 15.875281 11.529919 24.068050 124.583733 ... 18.575180 24.534512 34.463211 11.216841 26.504219 5.954147 17.224714 15.072998 13.706803 16.925014
2000-01-05 0.779767 30.313122 27.513647 11.878352 12.703340 33.074562 16.160040 11.904197 25.066923 124.367424 ... 19.113220 25.498028 34.826580 11.399590 25.999968 6.046654 17.092470 15.576514 13.427073 17.847692
2000-01-06 0.712287 30.930702 27.796047 12.893729 13.349821 32.525688 16.848200 12.239370 24.068050 126.030174 ... 20.650473 25.709532 33.659988 11.810803 27.197546 5.929497 17.423077 15.497396 13.573600 18.770374
2000-01-07 0.746027 31.381020 28.602884 12.555267 13.786201 34.436577 17.144821 12.513094 23.687531 130.910217 ... 21.060410 28.177097 34.099834 12.610372 29.372086 5.980319 18.117361 15.382957 14.599282 18.715279

5 rows × 28 columns

Subset First Level 2¶

  • Subsetting first level of hierarchical indexing: Volume
In [11]:
rawdata.loc[:, 'High':'Low'].head()
Out[11]:
Value High ... Low
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.843498 33.813864 26.908489 12.958197 12.671014 35.859586 16.302419 12.043848 22.783793 132.964966 ... 19.279752 25.004515 34.271967 11.559513 26.646032 6.273719 17.257773 15.495303 13.959902 17.159124
2000-01-04 0.829439 32.040944 26.545424 12.361867 12.864961 34.802485 15.970201 11.753367 24.258312 128.044396 ... 18.575180 24.346507 34.348461 10.965547 26.157553 5.853229 17.026349 14.878089 13.680161 16.856158
2000-01-05 0.828971 31.522557 27.957416 11.975055 12.978094 33.989347 16.432933 11.904197 25.209621 127.179215 ... 18.677664 24.464008 33.468707 11.262521 25.842392 5.979376 17.059409 15.202938 13.253907 17.145353
2000-01-06 0.802260 31.368160 28.038103 12.893729 13.511441 33.135545 16.931254 12.647163 25.209619 127.125146 ... 19.330995 25.451026 33.162739 11.399596 26.488456 5.844790 16.827981 15.268532 13.360470 17.668665
2000-01-07 0.757274 31.599747 28.965967 12.797025 14.254899 34.477234 17.251606 12.596887 24.448575 131.396868 ... 20.483939 26.179553 32.837586 11.810801 27.528451 5.878671 17.621448 15.121398 13.746770 18.508708

5 rows × 56 columns

Subset Second Level 1¶

  • Subsetting second level is slightly harder: AAPL
In [12]:
idx = pd.IndexSlice
rawdata.loc[:, idx[:, 'AAPL']].head()
Out[12]:
Value Close High Low Open Volume
Symbol AAPL AAPL AAPL AAPL AAPL
Date
2000-01-03 0.839280 0.843498 0.762428 0.786328 535796800
2000-01-04 0.768521 0.829439 0.758680 0.811633 512377600
2000-01-05 0.779767 0.828971 0.772269 0.777892 778321600
2000-01-06 0.712287 0.802260 0.712287 0.795700 767972800
2000-01-07 0.746027 0.757274 0.716036 0.723534 460734400

Subset Second Level 2¶

  • Subsetting Open, High, Low, and Close (OHLC) for AAPL:
In [13]:
aapl = rawdata.loc[:, idx['Close':'Open', 'AAPL']]
aapl.head()
Out[13]:
Value Close High Low Open
Symbol AAPL AAPL AAPL AAPL
Date
2000-01-03 0.839280 0.843498 0.762428 0.786328
2000-01-04 0.768521 0.829439 0.758680 0.811633
2000-01-05 0.779767 0.828971 0.772269 0.777892
2000-01-06 0.712287 0.802260 0.712287 0.795700
2000-01-07 0.746027 0.757274 0.716036 0.723534

Subset Second Level 3¶

  • Drop redudant index level, AAPL:
In [14]:
aapl = rawdata.loc[:, idx['Close':'Open', 'AAPL']].droplevel('Symbol', axis=1).tail(60)
aapl.head()
Out[14]:
Value Close High Low Open
Date
2025-08-08 228.868134 230.514661 218.789348 220.366030
2025-08-11 226.959976 229.337676 224.542322 227.699265
2025-08-12 229.427582 230.576477 226.850094 227.789171
2025-08-13 233.104034 234.772415 230.206834 230.846229
2025-08-14 232.554565 234.892296 230.626442 233.833325

Candlestick Chart¶

In [15]:
mmm = rawdata.loc[:, idx['Close':'Open', 'MMM']].droplevel('Symbol', axis=1).tail(60).reset_index()
In [16]:
fig, ax = plt.subplots(figsize=(12, 6))

# width of the candlestick body in days
body_width = 0.6  # adjust (e.g., 0.5 - 0.8) to change bar thickness

for _, r in mmm.iterrows():
  o = r['Open']; c = r['Close']; h = r['High']; l = r['Low']
  color = 'green' if c >= o else 'red'
  x = r['Date']
  ax.vlines(x, l, h, color=color, linewidth=1)
  lower = min(o, c)
  height = abs(c - o)
  ax.add_patch(
    plt.Rectangle(
      (x - pd.Timedelta(days=body_width / 2), lower),
      pd.Timedelta(days=body_width),
      height if height != 0 else 0.0001,  # ensure visible even if open == close
      facecolor=color,
      edgecolor='black',
      linewidth=0.5
    )
  )

ax.set_title('MMM Candlestick (Last 60 Trading Days)')
ax.set_ylabel('Price')
ax.set_xlabel('Date')
ax.xaxis.set_major_locator(plt.MaxNLocator(12))
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
No description has been provided for this image

Closing prices¶

To get closing prices of a few stocks, following command can be used:

In [17]:
first10 = rawdata.loc[:, idx['Close', ['MMM', 'MRK', 'PFE', 'PG', 'JNJ']]].tail(60)
first10.head()
Out[17]:
Value Close
Symbol MMM MRK PFE PG JNJ
Date
2025-08-08 151.942062 79.900742 24.154673 152.443970 172.067017
2025-08-11 153.934250 79.247192 24.154673 153.903763 172.553452
2025-08-12 156.451752 79.514557 24.223461 154.013000 171.521011
2025-08-13 158.780930 81.900978 24.724636 154.330780 173.149063
2025-08-14 155.262375 81.950493 24.675501 152.672379 173.446884
  • Stacking transforms data into long-format:
    Note: unstack does the opposite
In [18]:
healthcare = first10.stack('Symbol', future_stack=True)
healthcare
Out[18]:
Value Close
Date Symbol
2025-08-08 MMM 151.942062
MRK 79.900742
PFE 24.154673
PG 152.443970
JNJ 172.067017
... ... ...
2025-10-31 MMM 165.787628
MRK 85.980003
PFE 24.223461
PG 150.369995
JNJ 188.869995

300 rows × 1 columns

Visualization¶

In [19]:
plt.figure(figsize=(12, 6))
sns.lineplot(data=healthcare, x='Date', y='Close', hue='Symbol')
plt.xticks(rotation=-90)
plt.tight_layout()
plt.show()
No description has been provided for this image

Data Cleaning¶

Count missing values¶

  • Inspect missing values:
In [20]:
rawdata['Close'].isna().sum(axis=0)
Out[20]:
Symbol
AAPL    0
AXP     0
BA      0
BAC     0
CAT     0
CSCO    0
CVX     0
DD      0
DIS     0
GE      0
HD      0
HPQ     0
IBM     0
INTC    0
JNJ     0
JPM     0
KO      0
MCD     0
MMM     0
MRK     0
MSFT    0
PFE     0
PG      0
T       0
TRV     0
VZ      0
WMT     0
XOM     0
dtype: int64
In [21]:
anymissing = rawdata['Close'].isna().any(axis=1)
rawdata.loc[anymissing]
Out[21]:
Value Close ... Volume
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date

0 rows × 140 columns

Data Cleaning¶

  • Double check: did we remove all missing values?
In [22]:
(rawdata.isna().sum()>0).any()
Out[22]:
False
  • Reset symbols to what is in data variable
In [23]:
symbols = symbols.loc[rawdata.columns.levels[1].to_list()]
symbols
Out[23]:
name
code
AAPL Apple Inc (AAPL)
AXP American Express Co. (AXP)
BA The Boeing Co. (BA)
BAC Bank of America Corp. (BAC)
CAT Caterpillar Inc. (CAT)
CSCO Cisco Systems Inc. (CSCO)
CVX Chevron Corporation (CVX)
DD E.I. du Pont de Nemours and Co. (DD)
DIS Disney (Walt) Co. (The) (DIS)
GE General Electric Co (GE)
HD Home Depot Inc (HD)
HPQ Hewlett-Packard Co (HPQ)
IBM International Business Machines Corp (IBM)
INTC Intel Corp (INTC)
JNJ Johnson & Johnson (JNJ)
JPM JP Morgan Chase (JPM)
KO Coca-Cola Co (The) (KO)
MCD McDonald's Corp (MCD)
MMM 3M Co (MMM)
MRK Merck & Co. Inc (MRK)
MSFT Microsoft Corporation (MSFT)
PFE Pfizer Inc (PFE)
PG Procter & Gamble Co. (PG)
T AT&T Inc (T)
TRV Travelers Companies Inc (The) (TRV)
VZ Verizon Communications Inc (VZ)
WMT Wal-Mart Stores (WMT)
XOM Exxon Mobil Corp. (XOM)
In [24]:
rawdata.head()
Out[24]:
Value Close ... Volume
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.839280 32.323586 25.940268 12.490799 12.574042 35.148087 15.875281 11.853917 22.736227 129.774673 ... 2599386 6265782 53228400 12873345 4275000 7668476 336400 4663843 25109700 13458200
2000-01-04 0.768521 31.103098 25.899942 11.749414 12.412425 33.176201 15.875281 11.529919 24.068050 124.583733 ... 3245705 7894689 54119000 14208974 4270800 9497846 494400 5005878 20235300 14510800
2000-01-05 0.779767 30.313122 27.513647 11.878352 12.703340 33.074562 16.160040 11.904197 25.066923 124.367424 ... 4424482 7963018 64059600 12981591 5098400 12035160 736000 6368681 21056100 17485000
2000-01-06 0.712287 30.930702 27.796047 12.893729 13.349821 32.525688 16.848200 12.239370 24.068050 126.030174 ... 7147057 4989004 54976600 11115273 6524200 9471366 660400 4705763 19633500 19461600
2000-01-07 0.746027 31.381020 28.602884 12.555267 13.786201 34.436577 17.144821 12.513094 23.687531 130.910217 ... 4905035 10871218 62013600 17962163 9832000 7792534 594700 5043907 23930700 16603800

5 rows × 140 columns

Closing Prices¶

  • Closing prices accounts for splits, etc
In [25]:
closedata= rawdata['Close']
closedata.head()
Out[25]:
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 0.839280 32.323586 25.940268 12.490799 12.574042 35.148087 15.875281 11.853917 22.736227 129.774673 ... 19.343803 25.427525 35.668091 11.650893 27.024214 6.324178 17.456139 15.576514 14.239633 17.255524
2000-01-04 0.768521 31.103098 25.899942 11.749414 12.412425 33.176201 15.875281 11.529919 24.068050 124.583733 ... 18.575180 24.534512 34.463211 11.216841 26.504219 5.954147 17.224714 15.072998 13.706803 16.925014
2000-01-05 0.779767 30.313122 27.513647 11.878352 12.703340 33.074562 16.160040 11.904197 25.066923 124.367424 ... 19.113220 25.498028 34.826580 11.399590 25.999968 6.046654 17.092470 15.576514 13.427073 17.847692
2000-01-06 0.712287 30.930702 27.796047 12.893729 13.349821 32.525688 16.848200 12.239370 24.068050 126.030174 ... 20.650473 25.709532 33.659988 11.810803 27.197546 5.929497 17.423077 15.497396 13.573600 18.770374
2000-01-07 0.746027 31.381020 28.602884 12.555267 13.786201 34.436577 17.144821 12.513094 23.687531 130.910217 ... 21.060410 28.177097 34.099834 12.610372 29.372086 5.980319 18.117361 15.382957 14.599282 18.715279

5 rows × 28 columns

Data Preprocessing¶

Log returns from stock prices¶

  • Data is price per share

  • We need daily returns from the prices

  • Given the prices $P_t$ and $P_{t-1}$ for time $t$, the return is, $$ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1 $$

  • Approximation $\log(1+x)\approx x$ is good when $x$ is small

  • Since daily returns of stocks are small, calculate as returns, $$ r_t = \log(1 + R_t) = \log\left(\frac{P_t}{P_{t-1}}\right) = \log(P_t) - \log(P_{t-1})$$ So, in order to compute the log-returns, compute the difference of log prices:

In [26]:
logret = np.log(closedata).diff()
logret.head()
Out[26]:
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Date
2000-01-03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2000-01-04 -0.088077 -0.038490 -0.001556 -0.061189 -0.012937 -0.057737 0.000000 -0.027713 0.056926 -0.040822 ... -0.040546 -0.035751 -0.034364 -0.037967 -0.019429 -0.060292 -0.013346 -0.032859 -0.038137 -0.019340
2000-01-05 0.014527 -0.025727 0.060441 0.010914 0.023167 -0.003068 0.017778 0.031946 0.040664 -0.001738 ... 0.028554 0.038520 0.010488 0.016161 -0.019209 0.015417 -0.007707 0.032859 -0.020619 0.053082
2000-01-06 -0.090514 0.020169 0.010212 0.082023 0.049638 -0.016734 0.041702 0.027767 -0.040664 0.013281 ... 0.077358 0.008261 -0.034071 0.035437 0.045031 -0.019566 0.019158 -0.005092 0.010854 0.050406
2000-01-07 0.046281 0.014454 0.028614 -0.026601 0.032165 0.057089 0.017452 0.022118 -0.015936 0.037990 ... 0.019657 0.091648 0.012983 0.065505 0.076918 0.008535 0.039075 -0.007412 0.072846 -0.002940

5 rows × 28 columns

  • First time period is NaN since there is no data corresponding to $-1$.

  • Note that $100\cdot r_t$% represent daily percentage returns.

Estimate expected returns¶

Estimate the daily expected returns by computing the means:

In [27]:
mu = logret[1:].mean()
mu
Out[27]:
Symbol
AAPL    0.000889
AXP     0.000371
BA      0.000315
BAC     0.000224
CAT     0.000589
CSCO    0.000113
CVX     0.000352
DD      0.000163
DIS     0.000246
GE      0.000134
HD      0.000354
HPQ     0.000129
IBM     0.000257
INTC    0.000074
JNJ     0.000320
JPM     0.000401
KO      0.000248
MCD     0.000407
MMM     0.000331
MRK     0.000188
MSFT    0.000412
PFE     0.000113
PG      0.000264
T       0.000210
TRV     0.000421
VZ      0.000144
WMT     0.000302
XOM     0.000290
dtype: float64

Estimate covariance matrix (volatility structure)¶

Estimate the covarince matrix of returns:

In [28]:
cov_sample = logret.cov()
cov_sample
Out[28]:
Symbol AAPL AXP BA BAC CAT CSCO CVX DD DIS GE ... MMM MRK MSFT PFE PG T TRV VZ WMT XOM
Symbol
AAPL 0.000632 0.000203 0.000163 0.000211 0.000179 0.000260 0.000116 0.000164 0.000167 0.000177 ... 0.000122 0.000084 0.000222 0.000085 0.000065 0.000103 0.000124 0.000089 0.000095 0.000109
AXP 0.000203 0.000498 0.000251 0.000408 0.000246 0.000226 0.000191 0.000259 0.000239 0.000275 ... 0.000178 0.000124 0.000195 0.000134 0.000095 0.000151 0.000220 0.000127 0.000113 0.000174
BA 0.000163 0.000251 0.000495 0.000249 0.000206 0.000168 0.000175 0.000226 0.000195 0.000232 ... 0.000149 0.000098 0.000157 0.000110 0.000078 0.000116 0.000160 0.000089 0.000089 0.000164
BAC 0.000211 0.000408 0.000249 0.000743 0.000276 0.000234 0.000210 0.000296 0.000235 0.000309 ... 0.000195 0.000130 0.000191 0.000148 0.000102 0.000164 0.000247 0.000134 0.000110 0.000188
CAT 0.000179 0.000246 0.000206 0.000276 0.000407 0.000182 0.000185 0.000259 0.000178 0.000221 ... 0.000177 0.000100 0.000156 0.000110 0.000077 0.000115 0.000157 0.000097 0.000092 0.000173
CSCO 0.000260 0.000226 0.000168 0.000234 0.000182 0.000516 0.000125 0.000183 0.000191 0.000199 ... 0.000139 0.000089 0.000237 0.000104 0.000074 0.000121 0.000145 0.000112 0.000106 0.000119
CVX 0.000116 0.000191 0.000175 0.000210 0.000185 0.000125 0.000300 0.000184 0.000144 0.000164 ... 0.000121 0.000103 0.000121 0.000102 0.000065 0.000106 0.000145 0.000089 0.000064 0.000240
DD 0.000164 0.000259 0.000226 0.000296 0.000259 0.000183 0.000184 0.000478 0.000193 0.000230 ... 0.000186 0.000117 0.000157 0.000125 0.000097 0.000126 0.000171 0.000104 0.000100 0.000171
DIS 0.000167 0.000239 0.000195 0.000235 0.000178 0.000191 0.000144 0.000193 0.000370 0.000194 ... 0.000136 0.000094 0.000162 0.000100 0.000073 0.000120 0.000141 0.000107 0.000092 0.000136
GE 0.000177 0.000275 0.000232 0.000309 0.000221 0.000199 0.000164 0.000230 0.000194 0.000436 ... 0.000168 0.000106 0.000159 0.000115 0.000083 0.000126 0.000172 0.000108 0.000097 0.000156
HD 0.000159 0.000214 0.000170 0.000221 0.000169 0.000170 0.000119 0.000182 0.000168 0.000173 ... 0.000136 0.000093 0.000152 0.000100 0.000080 0.000107 0.000145 0.000097 0.000140 0.000109
HPQ 0.000245 0.000228 0.000198 0.000233 0.000204 0.000268 0.000151 0.000208 0.000195 0.000201 ... 0.000143 0.000094 0.000204 0.000099 0.000060 0.000113 0.000141 0.000102 0.000084 0.000137
IBM 0.000159 0.000168 0.000136 0.000174 0.000145 0.000190 0.000111 0.000151 0.000136 0.000158 ... 0.000110 0.000077 0.000144 0.000083 0.000062 0.000104 0.000116 0.000089 0.000077 0.000109
INTC 0.000286 0.000240 0.000201 0.000241 0.000200 0.000316 0.000139 0.000204 0.000192 0.000206 ... 0.000148 0.000098 0.000260 0.000106 0.000078 0.000122 0.000151 0.000111 0.000103 0.000133
JNJ 0.000064 0.000095 0.000082 0.000098 0.000080 0.000073 0.000079 0.000088 0.000075 0.000082 ... 0.000080 0.000102 0.000071 0.000101 0.000074 0.000073 0.000079 0.000067 0.000060 0.000078
JPM 0.000205 0.000365 0.000231 0.000498 0.000247 0.000239 0.000188 0.000266 0.000225 0.000283 ... 0.000173 0.000126 0.000195 0.000134 0.000091 0.000159 0.000233 0.000138 0.000113 0.000171
KO 0.000072 0.000107 0.000096 0.000102 0.000086 0.000078 0.000082 0.000097 0.000084 0.000089 ... 0.000081 0.000075 0.000078 0.000074 0.000084 0.000080 0.000089 0.000074 0.000065 0.000083
MCD 0.000092 0.000123 0.000113 0.000122 0.000094 0.000099 0.000086 0.000107 0.000099 0.000102 ... 0.000080 0.000073 0.000085 0.000068 0.000069 0.000072 0.000103 0.000065 0.000070 0.000078
MMM 0.000122 0.000178 0.000149 0.000195 0.000177 0.000139 0.000121 0.000186 0.000136 0.000168 ... 0.000241 0.000085 0.000114 0.000099 0.000080 0.000093 0.000128 0.000080 0.000082 0.000118
MRK 0.000084 0.000124 0.000098 0.000130 0.000100 0.000089 0.000103 0.000117 0.000094 0.000106 ... 0.000085 0.000279 0.000088 0.000141 0.000081 0.000089 0.000105 0.000087 0.000070 0.000098
MSFT 0.000222 0.000195 0.000157 0.000191 0.000156 0.000237 0.000121 0.000157 0.000162 0.000159 ... 0.000114 0.000088 0.000358 0.000094 0.000068 0.000100 0.000128 0.000094 0.000094 0.000110
PFE 0.000085 0.000134 0.000110 0.000148 0.000110 0.000104 0.000102 0.000125 0.000100 0.000115 ... 0.000099 0.000141 0.000094 0.000252 0.000077 0.000091 0.000105 0.000085 0.000075 0.000101
PG 0.000065 0.000095 0.000078 0.000102 0.000077 0.000074 0.000065 0.000097 0.000073 0.000083 ... 0.000080 0.000081 0.000068 0.000077 0.000177 0.000078 0.000083 0.000071 0.000074 0.000066
T 0.000103 0.000151 0.000116 0.000164 0.000115 0.000121 0.000106 0.000126 0.000120 0.000126 ... 0.000093 0.000089 0.000100 0.000091 0.000078 0.000260 0.000112 0.000168 0.000079 0.000110
TRV 0.000124 0.000220 0.000160 0.000247 0.000157 0.000145 0.000145 0.000171 0.000141 0.000172 ... 0.000128 0.000105 0.000128 0.000105 0.000083 0.000112 0.000321 0.000100 0.000089 0.000132
VZ 0.000089 0.000127 0.000089 0.000134 0.000097 0.000112 0.000089 0.000104 0.000107 0.000108 ... 0.000080 0.000087 0.000094 0.000085 0.000071 0.000168 0.000100 0.000224 0.000072 0.000091
WMT 0.000095 0.000113 0.000089 0.000110 0.000092 0.000106 0.000064 0.000100 0.000092 0.000097 ... 0.000082 0.000070 0.000094 0.000075 0.000074 0.000079 0.000089 0.000072 0.000217 0.000067
XOM 0.000109 0.000174 0.000164 0.000188 0.000173 0.000119 0.000240 0.000171 0.000136 0.000156 ... 0.000118 0.000098 0.000110 0.000101 0.000066 0.000110 0.000132 0.000091 0.000067 0.000273

28 rows × 28 columns

Visualize volatility structure¶

  • Heatmap visualization of covariance and correlation
In [29]:
import matplotlib.pyplot as plt 
import seaborn as sns
fig, ax = plt.subplots(1, 2, figsize=(20, 8))
sns.heatmap(logret.cov(),  square=True, ax=ax[0]).set_title('Covariance Matrix')
sns.heatmap(logret.corr(), square=True, ax=ax[1]).set_title('Correlation Matrix');
# fig.show();
No description has been provided for this image
  • Reordering variables group similar stocks together
In [30]:
clmap = sns.clustermap(logret.corr(), square=True, figsize=(8, 8));
ordering = clmap.dendrogram_col.reordered_ind # save the hierarchical clustering generated variable ordering
/opt/conda/lib/python3.11/site-packages/seaborn/matrix.py:1124: UserWarning: ``square=True`` ignored in clustermap
  warnings.warn(msg)
No description has been provided for this image
In [31]:
fig, ax = plt.subplots(1, 2, figsize=(20, 8))
sns.heatmap(logret.cov().iloc[ordering, ordering],  square=True, ax=ax[0]).set_title('Covariance Matrix')
sns.heatmap(logret.corr().iloc[ordering, ordering], square=True, ax=ax[1]).set_title('Correlation Matrix');
# fig.show()
No description has been provided for this image

Calculate Portfolio Allocation¶

Minimum Variance Portfolio¶

  • $\Sigma$ matrix is all that is specified
  • $\Sigma$ characterizes the market
  • Minimize variance of portfolio $R_p$
In [32]:
s, _ = cov_sample.shape # calculated from data

w = cvx.Variable(s) # variables
risk = cvx.quad_form(w, cov_sample.values)  # objective function
prob = cvx.Problem(cvx.Minimize(risk), # optimization problem 
               [cvx.sum(w) == 1])      # fully invested portfolio constraint
prob.solve()
Out[32]:
8.182720882145511e-05
In [33]:
w.value
Out[33]:
array([ 0.01652722, -0.06677811, -0.00556869, -0.01420911,  0.00303467,
       -0.01916917,  0.03920442, -0.03985089,  0.01652121, -0.00538882,
       -0.02358231,  0.01211534,  0.09040282, -0.01447738,  0.2142491 ,
       -0.02886968,  0.1425609 ,  0.14683368,  0.06653689,  0.01455767,
        0.02968867,  0.02761711,  0.12084072,  0.00462494,  0.01039099,
        0.09993229,  0.13561062,  0.0266449 ])
  • Positive weights indicate long positions

  • Negative weights indicate short positions)

  • Must sum to 1 (constraint)

  • Other portfolio optimization variations:
    http://nbviewer.jupyter.org/github/cvxgrp/cvx_short_course/blob/master/applications/portfolio_optimization.ipynb

  • cvxpy package makes it easy to solve many types of problems easily

MVP with target return¶

  • Portfolio allocation problem with target expected return constraint:

$$ \min_{x\in\mathbb{R}^s}\ \ x^\intercal \Sigma x\\ \text{subject to }\mu^\intercal x\geq \mu^* \text{, and } \mathbf{1}^\intercal x = 1,$$

  • Expected returns, $\mu$, is estimated from data (also $\Sigma$)
  • Investor specifies target return $\mu^*$
In [45]:
s,_ = cov_sample.shape

w = cvx.Variable(s)
risk = cvx.quad_form(w, cov_sample.values)
prob = cvx.Problem(cvx.Minimize(risk), 
               [
                   cvx.sum(w) == 1,
                   mu.values@w >= 0.001,
                   # w >= 0
               ]) 
prob.solve()
Out[45]:
0.00025582983343653144
In [46]:
w.value
Out[46]:
array([ 0.37727003,  0.0103586 ,  0.00531868, -0.19475594,  0.38319326,
       -0.1993208 ,  0.15270708, -0.22931477, -0.04446237, -0.22544009,
       -0.0039972 , -0.1268713 ,  0.07198615, -0.20353497,  0.43703468,
        0.20627437,  0.00312622,  0.32205393,  0.14227969, -0.07175241,
        0.18423752, -0.21650405,  0.10500898,  0.0260248 ,  0.16675933,
       -0.11354043,  0.12224684, -0.08638584])
  • Higher target return achieved with more short positions
  • Constraint: $\mu^\intercal w=0.001$
  • Constraint: $\mathbf{1}^\intercal w=1$.
In [47]:
w.value.sum()
Out[47]:
0.9999999999999997
In [48]:
np.dot(mu.values,w.value)
Out[48]:
0.0010000000000000002

Backtesting: portfolio returns¶

  • Investing 1 dollar in a stock with 3% return over one time period makes me 3 cents:
    $$\$1 \cdot (1 + 0.03) = 1.03$$

  • Investing a portfolio of worth $A_0$ dollars returns, $$ A = A_0 (1+r_t^T w_t), $$ where elements of $r_t$ are returns at time $t$ and $w$ is portfolio allocation

  • Take allocation w.value and compute portfolio returns using historical returns: logret.

In [ ]:
earned = np.dot(logret.bfill().values, w.value)
  • Compute portfolio growth
In [ ]:
earnedcp = (1+earned[1:]).cumprod()
ecp = pd.DataFrame(data=earnedcp.T, index=logret.index.values[1:], columns=['Portfolio Value'])
ecp.plot(figsize=(10, 8));
No description has been provided for this image
In [ ]:
np.log10(ecp).plot(figsize=(10, 8));
No description has been provided for this image
In [ ]:
ecp
Out[ ]:
Portfolio Value
2000-01-04 0.981337
2000-01-05 0.984165
2000-01-06 0.988135
2000-01-07 1.028528
2000-01-10 0.974107
... ...
2025-10-27 283.112617
2025-10-28 276.528236
2025-10-29 286.172518
2025-10-30 290.575876
2025-10-31 285.749925

6497 rows × 1 columns

  • 1 dollar invested in 2000 would be 285 dollars today?
  • Unrealistic! Why?
  • Having knowledge of 20 years worth of data is cheating!

Summarize results¶

In [ ]:
retinfo = logret.mean(axis=0).agg(['min', 'mean', 'max'])
stdinfo = logret.std(axis=0).agg(['min', 'mean', 'max'])

print('portfolio average returns:', earned.mean())
print('          average stddev :', earned.std())
print('')
print("component stocks average: minimum: %f\n                          average: %f\n                          maximum: %f" % tuple(retinfo))
print("component stocks stddev : minimum: %f\n                          average: %f\n                          maximum: %f" % tuple(stdinfo))
portfolio average returns: 0.0009969739339166317
          average stddev : 0.015994079712789433

component stocks average: minimum: 0.000074
                          average: 0.000295
                          maximum: 0.000889
component stocks stddev : minimum: 0.012085
                          average: 0.018796
                          maximum: 0.027256

Annual return would be:

In [ ]:
(1+earned.mean())**365 - 1
Out[ ]:
0.43866299874773973

Again, unrealistic

Visualization of Relationships¶

  • Correlations are stricly pairwise quantities (what about other stocks?)
  • Inverse covariance
In [ ]:
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaled_df = scaler.fit_transform(logret.dropna().values)
scaled_df = pd.DataFrame(scaled_df, columns=logret.columns, index=logret.index[1:])

Correlation Matrix: Year 2020¶

In [ ]:
clmap = sns.clustermap(scaled_df.loc['2020':'2020'].corr(), square=True, figsize=(8, 8));
ordering = clmap.dendrogram_col.reordered_ind # save the hierarchical clustering generated variable ordering
/opt/conda/lib/python3.11/site-packages/seaborn/matrix.py:1124: UserWarning: ``square=True`` ignored in clustermap
  warnings.warn(msg)
No description has been provided for this image

Correlation Matrix: 2019 vs. 2020¶

In [ ]:
fig, ax = plt.subplots(1, 2, figsize=(20, 8))

sns.heatmap(scaled_df.loc['2019':'2019'].corr().iloc[ordering, ordering],  square=True, ax=ax[0]).set_title('Correlation Matrix: 2019')
sns.heatmap(scaled_df.loc['2020':'2020'].corr().iloc[ordering, ordering],  square=True, ax=ax[1]).set_title('Correlation Matrix: 2020');
# fig.show()
No description has been provided for this image

Reality check¶

Obviously, this cannot be realistic. What aspects were unrealistic?

  1. We didn't take time into consideration: e.g. estimation of $\Sigma$ and selecting w.
  2. We are investing with knowledge of the future returns!
  3. We did not take into account transaction costs (we will have to)
  4. We did not take into account shorting requires borrowing of money
  5. Is investing in stocks better than leaving our money in a savings account? What is the interest rate?

Sharpe ratio tries to quantify the added benefit, i.e., excess returns, of investing in the volatile market by accounting for the volatility: $$\text{Sharpe ratio}=\frac{E\left[R_{p}-R_{f}\right]}{\sigma_{p}}$$ where $R_f$ is the risk-free rate.

Backtesting¶

In [ ]:
# Rolling backtest with 300-day lookback and 50-day rebalance horizon

def estimate_moments(returns_df, cov_fn=None):
  """
  Estimate mean vector and covariance matrix from a returns window.
  - returns_df: DataFrame of daily returns (rows: dates, cols: assets)
  - cov_fn: optional function taking returns_df and returning a covariance matrix
  """
  window = returns_df.dropna()
  mu_hat = window.mean()
  if cov_fn is None:
    cov_hat = window.cov()
  else:
    cov_hat = cov_fn(window)
    # Convert to DataFrame with consistent indexing if needed
    if not isinstance(cov_hat, pd.DataFrame):
      cov_hat = pd.DataFrame(cov_hat, index=window.columns, columns=window.columns)
  return mu_hat, cov_hat


def compute_min_var_weights(mu_vec, cov_mat, target_return=None, long_only=False, ridge=1e-6, solver_opts=None):
  """
  Solve for portfolio weights minimizing variance:
    min w' Σ w
    s.t. 1' w = 1
       μ' w >= target_return (optional)
       w >= 0 (if long_only)
  - mu_vec: pd.Series indexed by asset
  - cov_mat: pd.DataFrame with same index/columns
  - ridge: small diagonal loading for numerical stability
  """
  cols = mu_vec.index
  Sigma = cov_mat.copy()
  # numerical stabilizer
  Sigma.values[range(Sigma.shape[0]), range(Sigma.shape[0])] += ridge

  n = len(cols)
  w_var = cvx.Variable(n)
  constraints = [cvx.sum(w_var) == 1]
  if long_only:
    constraints.append(w_var >= 0)
  if target_return is not None:
    constraints.append(mu_vec.values @ w_var >= float(target_return))

  objective = cvx.Minimize(cvx.quad_form(w_var, Sigma.values))
  prob_local = cvx.Problem(objective, constraints)
  prob_local.solve(**(solver_opts or {}))
  w_val = w_var.value
  if w_val is None or np.any(np.isnan(w_val)):
    raise RuntimeError("Solver returned invalid weights")
  weights = pd.Series(w_val, index=cols, name='weight')
  return weights


def compute_portfolio_returns(returns_df, weights):
  """
  Compute daily portfolio returns given returns matrix and weights.
  - returns_df: DataFrame (dates x assets)
  - weights: pd.Series indexed by assets
  """
  # align columns
  common = [c for c in returns_df.columns if c in weights.index]
  if len(common) == 0:
    return pd.Series([], dtype=float)
  w = weights.loc[common]
  R = returns_df[common]
  # dot product per day
  return R.dot(w)


def rolling_backtest(returns,
           lookback=300,
           horizon=50,
           rebalance=50,
           cov_fn=None,
           target_return=None,
           long_only=True,
           ridge=1e-6,
           solver_opts=None):
  """
  Rolling backtest:
  - Use past `lookback` days to estimate mu, Sigma
  - Invest for next `horizon` days with fixed weights
  - Rebalance every `rebalance` days
  Returns:
    dict with:
    'returns': pd.Series of daily portfolio returns over backtest
    'weights': pd.DataFrame of weights per rebalance date (rows=dates, cols=assets)
  """
  dates = returns.index
  n = len(dates)
  out_returns_parts = []
  weights_records = []

  start = lookback
  while start < n:
    end_invest = min(start + horizon, n)

    window = returns.iloc[start - lookback:start]
    mu_hat, cov_hat = estimate_moments(window, cov_fn=cov_fn)
    w_hat = compute_min_var_weights(mu_hat, cov_hat,
                    target_return=target_return,
                    long_only=long_only,
                    ridge=ridge,
                    solver_opts=solver_opts)

    period_rets = returns.iloc[start:end_invest]
    port_rets = compute_portfolio_returns(period_rets, w_hat)
    out_returns_parts.append(port_rets)

    # store weights at the rebalance date
    w_row = w_hat.reindex(returns.columns).fillna(0.0)
    w_row.name = dates[start]
    weights_records.append(w_row)

    # advance to next rebalance
    start += rebalance

  out_returns = pd.concat(out_returns_parts).sort_index()
  weights_df = pd.DataFrame(weights_records)
  return {'returns': out_returns, 'weights': weights_df}
In [ ]:
# Run the requested backtest on the existing `logret` DataFrame
bt = rolling_backtest(
  logret,
  lookback=300,
  horizon=50,
  rebalance=50,
  cov_fn=None,          # default: sample covariance
  target_return=None,   # pure minimum-variance
  long_only=False,       # set False to allow shorting
  ridge=1e-6,
)

bt_returns = bt['returns']
bt_weights = bt['weights']

# Example outputs: cumulative growth from log returns
bt_wealth = np.exp(bt_returns.cumsum())
bt_results = {
  'final_wealth': float(bt_wealth.iloc[-1]) if len(bt_wealth) else np.nan,
  'mean_daily_return': float(bt_returns.mean()) if len(bt_returns) else np.nan,
  'std_daily_return': float(bt_returns.std()) if len(bt_returns) else np.nan,
}

bt_results
Out[ ]:
{'final_wealth': 5.631200904168396,
 'mean_daily_return': 0.0002788516818128177,
 'std_daily_return': 0.009221033517696169}
In [ ]:
# perform backtest using ledoit-wolf shrinkage covariance estimator
from sklearn.covariance import LedoitWolf
def ledoit_wolf_cov(returns_df):
  lw = LedoitWolf().fit(returns_df.dropna().values)
  cov_est = lw.covariance_
  return pd.DataFrame(cov_est, index=returns_df.columns, columns=returns_df.columns)
bt_lw = rolling_backtest(
  logret,
  lookback=300,
  horizon=50,
  rebalance=50,
  cov_fn=ledoit_wolf_cov,
  target_return=None,
  long_only=False,
  ridge=1e-6,
)
bt_lw_returns = bt_lw['returns']
bt_lw_wealth = np.exp(bt_lw_returns.cumsum())
bt_lw_results = {
  'final_wealth': float(bt_lw_wealth.iloc[-1]) if len(bt_lw_wealth) else np.nan,
  'mean_daily_return': float(bt_lw_returns.mean()) if len(bt_lw_returns) else np.nan,
  'std_daily_return': float(bt_lw_returns.std()) if len(bt_lw_returns) else np.nan,
}

bt_lw_results
Out[ ]:
{'final_wealth': 5.843559184966783,
 'mean_daily_return': 0.000284824146608879,
 'std_daily_return': 0.009095051514796566}
In [ ]:
# perform backtest using graphical lasso covariance estimator
from sklearn.covariance import GraphicalLasso
def graphical_lasso_cov(returns_df, alpha=0.01):
  gl = GraphicalLasso(alpha=alpha).fit(returns_df.dropna().values)
  cov_est = gl.covariance_
  return pd.DataFrame(cov_est, index=returns_df.columns, columns=returns_df.columns)

bt_gl = rolling_backtest(
  logret,
  lookback=300,
  horizon=50,
  rebalance=50,
  cov_fn=graphical_lasso_cov,
  target_return=None,
  long_only=False,
  ridge=1e-6,
)
bt_gl_returns = bt_gl['returns']
bt_gl_wealth = np.exp(bt_gl_returns.cumsum())
bt_gl_results = {
  'final_wealth': float(bt_gl_wealth.iloc[-1]) if len(bt_gl_wealth) else np.nan,
  'mean_daily_return': float(bt_gl_returns.mean()) if len(bt_gl_returns) else np.nan,
  'std_daily_return': float(bt_gl_returns.std()) if len(bt_gl_returns) else np.nan,
}

bt_gl_results
Out[ ]:
{'final_wealth': 7.13234951776576,
 'mean_daily_return': 0.0003169797847334469,
 'std_daily_return': 0.010463294745776675}
In [ ]:
# Plot the growth of the three portfolios
plt.figure(figsize=(12, 6))
plt.plot(bt_wealth.index, bt_wealth.values, label='Sample Covariance', color='blue')
plt.plot(bt_lw_wealth.index, bt_lw_wealth.values, label='Ledoit-Wolf', color='orange')
plt.plot(bt_gl_wealth.index, bt_gl_wealth.values, label='Graphical Lasso', color='green')
plt.title('Portfolio Growth Over Time')
plt.xlabel('Date')
plt.ylabel('Cumulative Growth')
plt.legend()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
No description has been provided for this image
In [ ]:
store['logret'] = logret
store.close()