Hierarchical Clustering Portfolio Optimization

Some Theory

Riskfolio-Lib can calculate portfolios using recent machine learning asset allocation models. The available models are:

  • Hierarchical Risk Parity (HRP) [C1], [C2], [C3].

  • Hierarchical Equal Risk Contribution (HERC) [C4], [C5], [C2].

  • Nested Clustered Optimization (NCO) [C6], [C2].

For the first two models, HRP and HERC portfolios can be calculated using naive risk parity with any of the following 32 risk measures:

Dispersion Risk Measures:

  • Standard Deviation.

  • Variance.

  • Square Root Kurtosis.

  • Mean Absolute Deviation (MAD).

  • Gini Mean Difference (GMD).

  • Conditional Value at Risk Range.

  • Tail Gini Range.

  • Range.

Downside Risk Measures:

  • Semi Standard Deviation.

  • Square Root Semi Kurtosis.

  • First Lower Partial Moment (Omega Ratio).

  • Second Lower Partial Moment (Sortino Ratio).

  • Value at Risk (VaR).

  • Conditional Value at Risk (CVaR).

  • Entropic Value at Risk (EVaR).

  • Relativistic Value at Risk (RLVaR).

  • Tail Gini.

  • Worst Case Realization (Minimax).

Drawdown Risk Measures:

  • Average Drawdown for compounded and uncompounded cumulative returns.

  • Ulcer Index for compounded and uncompounded cumulative returns.

  • Drawdown at Risk (DaR) for compounded and uncompounded cumulative returns.

  • Conditional Drawdown at Risk (CDaR) for compounded and uncompounded cumulative returns.

  • Entropic Drawdown at Risk (EDaR) for compounded and uncompounded cumulative returns.

  • Relativistic Drawdown at Risk (RLDaR) for compounded and uncompounded cumulative returns.

  • Maximum Drawdown (Calmar Ratio) for compounded and uncompounded cumulative returns.
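
The general idea behind naive risk parity is to weight each asset inversely to its own risk, ignoring codependence between assets. The following is a minimal sketch in plain Python, assuming variance as the risk measure. The function `naive_risk_parity` and the sample returns are hypothetical illustrations, not part of Riskfolio-Lib, and the library's internal implementation may differ in detail.

```python
from statistics import pvariance


def naive_risk_parity(risks):
    """Return weights proportional to 1 / risk, normalized to sum to 1."""
    if any(r <= 0 for r in risks.values()):
        raise ValueError("every risk value must be strictly positive")
    inverse = {name: 1.0 / r for name, r in risks.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical per-period returns for three assets.
returns = {
    "A": [0.01, -0.02, 0.015, 0.005],
    "B": [0.03, -0.05, 0.04, -0.01],
    "C": [0.002, 0.001, -0.001, 0.003],
}

# Stand-alone risk of each asset (population variance is an assumption here).
variances = {name: pvariance(series) for name, series in returns.items()}
weights = naive_risk_parity(variances)

for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
```

The least volatile asset (`C`) receives the largest weight. Any other risk measure from the lists above could replace the variance, provided it is strictly positive for every asset.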

For the NCO model, four objective functions are available, each with its own set of supported risk measures:

  • Minimize the selected risk measure.

  • Maximize the Utility function \(\mu w - l \phi_{i}(w)\).

  • Maximize the risk adjusted return ratio based on the selected risk measure.

  • Equal risk contribution portfolio of the selected risk measure.
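
To make the utility and ratio objectives concrete, the sketch below evaluates \(\mu w - l \phi_{i}(w)\) for a fixed weight vector. It assumes that the risk measure \(\phi_{i}\) is the portfolio variance for the utility and the standard deviation for the ratio. The function names and the numbers are hypothetical, and no optimization is performed; the code only evaluates the objective values.

```python
import math


def portfolio_mean(mu, w):
    """Expected portfolio return: mu . w"""
    return sum(m * x for m, x in zip(mu, w))


def portfolio_variance(cov, w):
    """Portfolio variance: w' * cov * w"""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


def utility(mu, cov, w, l=2.0):
    """Utility with variance as the risk measure: mu.w - l * variance."""
    return portfolio_mean(mu, w) - l * portfolio_variance(cov, w)


def risk_adjusted_ratio(mu, cov, w, rf=0.0):
    """Excess return over standard deviation (assumed risk measure)."""
    return (portfolio_mean(mu, w) - rf) / math.sqrt(portfolio_variance(cov, w))


# Hypothetical inputs for two assets.
mu = [0.10, 0.05]
cov = [[0.04, 0.006], [0.006, 0.01]]
w = [0.5, 0.5]

u = utility(mu, cov, w, l=2.0)          # approximately 0.044
ratio = risk_adjusted_ratio(mu, cov, w)
print(round(u, 6), round(ratio, 6))
```

A larger risk aversion factor `l` penalizes risk more heavily, so the maximizer of the utility moves toward the minimum risk portfolio as `l` grows.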

Module Methods

class HCPortfolio.HCPortfolio(returns=None, alpha=0.05, a_sim=100, beta=None, b_sim=None, kappa=0.3, solver_rl='CLARABEL', solvers=['CLARABEL', 'SCS', 'ECOS'], w_max=None, w_min=None, alpha_tail=0.05, gs_threshold=0.5, bins_info='KN')

Class that creates a portfolio object with all properties needed to calculate optimal portfolios.

Parameters:
  • returns (DataFrame, optional) – A dataframe that contains the returns of the assets. The default is None.

  • alpha (float, optional) – Significance level of VaR, CVaR, EVaR, RLVaR, DaR, CDaR, EDaR, RLDaR and Tail Gini of losses. The default is 0.05.

  • a_sim (float, optional) – Number of CVaRs used to approximate Tail Gini of losses. The default is 100.

  • beta (float, optional) – Significance level of CVaR and Tail Gini of gains. If None it duplicates alpha value. The default is None.

  • b_sim (float, optional) – Number of CVaRs used to approximate Tail Gini of gains. If None it duplicates a_sim value. The default is None.

  • kappa (float, optional) – Deformation parameter of RLVaR and RLDaR, must be between 0 and 1. The default is 0.30.

  • solver_rl (str, optional) – Solver available for CVXPY that supports power cone programming. Used to calculate RLVaR and RLDaR. The default value is ‘CLARABEL’.

  • solvers (list, optional) – List of solvers available for CVXPY used for the selected NCO method. The default value is [‘CLARABEL’, ‘SCS’, ‘ECOS’].

  • w_max (pd.Series or float, optional) – Upper bound constraint for hierarchical risk parity weights [C3].

  • w_min (pd.Series or float, optional) – Lower bound constraint for hierarchical risk parity weights [C3].

  • alpha_tail (float, optional) – Significance level for lower tail dependence index. The default is 0.05.

  • gs_threshold (float, optional) – Gerber statistic threshold. The default is 0.5.

  • bins_info (int or str) –

    Number of bins used to calculate variation of information. The default value is ‘KN’. Possible values are:

    • ’KN’: Knuth’s choice method. See more in knuth_bin_width.

    • ’FD’: Freedman–Diaconis’ choice method. See more in freedman_bin_width.

    • ’SC’: Scott’s choice method. See more in scott_bin_width.

    • ’HGR’: Hacine-Gharbi and Ravier’s choice method.

    • int: integer value chosen by the user.
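
The `alpha` parameter sets the tail probability used by VaR-type measures. As an illustration, the sketch below computes a historical VaR and CVaR of losses at significance level `alpha` with a simple empirical estimator. The helper names and data are hypothetical, and the estimator is an assumption that may differ in detail from the one Riskfolio-Lib uses.

```python
import math


def var_hist(returns, alpha=0.05):
    """Historical VaR: loss at the ceil(alpha * n)-th worst return."""
    ordered = sorted(returns)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return -ordered[index]


def cvar_hist(returns, alpha=0.05):
    """Historical CVaR: average loss over the worst ceil(alpha * n) returns."""
    ordered = sorted(returns)
    k = max(math.ceil(alpha * len(ordered)), 1)
    return -sum(ordered[:k]) / k


# Hypothetical return sample.
sample = [-0.04, 0.01, 0.02, -0.01, 0.03, -0.02, 0.015, 0.005]

var_25 = var_hist(sample, alpha=0.25)
cvar_25 = cvar_hist(sample, alpha=0.25)
print(var_25, cvar_25)
```

With eight observations and `alpha=0.25`, the tail holds the two worst returns, so CVaR averages them while VaR reports the milder of the two. CVaR is therefore never smaller than VaR at the same `alpha`.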

optimization(model='HRP', codependence='pearson', covariance='hist', obj='MinRisk', rm='MV', rf=0, l=2, custom_cov=None, custom_mu=None, linkage='single', k=None, max_k=10, bins_info='KN', alpha_tail=0.05, gs_threshold=0.5, leaf_order=True, d=0.94, **kwargs)

This method calculates the optimal portfolio according to the optimization model selected by the user.

Parameters:
  • model (str, can be {'HRP', 'HERC', 'HERC2' or 'NCO'}) –

    The hierarchical cluster portfolio model used to optimize the portfolio. The default is ‘HRP’. Possible values are:

    • ’HRP’: Hierarchical Risk Parity.

    • ’HERC’: Hierarchical Equal Risk Contribution.

    • ’HERC2’: HERC but splitting weights equally within clusters.

    • ’NCO’: Nested Clustered Optimization.

  • codependence (str, optional) –

    The codependence or similarity matrix used to build the distance metric and clusters. The default is ‘pearson’. Possible values are:

    • ’pearson’: pearson correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{pearson}_{i,j})}\).

    • ’spearman’: spearman correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{spearman}_{i,j})}\).

    • ’kendall’: kendall correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{kendall}_{i,j})}\).

    • ’gerber1’: Gerber statistic 1 correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{gerber1}_{i,j})}\).

    • ’gerber2’: Gerber statistic 2 correlation matrix. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{gerber2}_{i,j})}\).

    • ’abs_pearson’: absolute value pearson correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{pearson}_{i,j}|)}\).

    • ’abs_spearman’: absolute value spearman correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{spearman}_{i,j}|)}\).

    • ’abs_kendall’: absolute value kendall correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-|\rho^{kendall}_{i,j}|)}\).

    • ’distance’: distance correlation matrix. Distance formula: \(D_{i,j} = \sqrt{(1-\rho^{distance}_{i,j})}\).

    • ’mutual_info’: mutual information matrix. The distance used is the variation of information matrix.

    • ’tail’: lower tail dependence index matrix. Dissimilarity formula: \(D_{i,j} = -\log{\lambda_{i,j}}\).

    • ’custom_cov’: use custom correlation matrix based on the custom_cov parameter. Distance formula: \(D_{i,j} = \sqrt{0.5(1-\rho^{pearson}_{i,j})}\).

  • covariance (str, optional) –

    The method used to estimate the covariance matrix. The default is ‘hist’. Possible values are:

    • ’hist’: use historical estimates.

    • ’ewma1’: use ewma with adjust=True. For more information see EWM.

    • ’ewma2’: use ewma with adjust=False. For more information see EWM.

    • ’ledoit’: use the Ledoit and Wolf Shrinkage method.

    • ’oas’: use the Oracle Approximation Shrinkage method.

    • ’shrunk’: use the basic Shrunk Covariance method.

    • ’gl’: use the basic Graphical Lasso Covariance method.

    • ’jlogo’: use the j-LoGo Covariance method. For more information see: [C7].

    • ’fixed’: denoise using fixed method. For more information see chapter 2 of [C8].

    • ’spectral’: denoise using spectral method. For more information see chapter 2 of [C8].

    • ’shrink’: denoise using shrink method. For more information see chapter 2 of [C8].

    • ’gerber1’: use the Gerber statistic 1. For more information see: [C9].

    • ’gerber2’: use the Gerber statistic 2. For more information see: [C9].

    • ’custom_cov’: use custom covariance matrix.

  • obj (str, can be {'MinRisk', 'Utility', 'Sharpe' or 'ERC'}) –

    Objective function used by the NCO model. The default is ‘MinRisk’. Possible values are:

    • ’MinRisk’: Minimize the selected risk measure.

    • ’Utility’: Maximize the Utility function \(\mu w - l \phi_{i}(w)\).

    • ’Sharpe’: Maximize the risk adjusted return ratio based on the selected risk measure.

    • ’ERC’: Equal risk contribution portfolio of the selected risk measure.

  • rm (str, optional) –

    The risk measure used to optimize the portfolio. If model is ‘NCO’, the risk measures available depend on the objective function. The default is ‘MV’. Possible values are:

    • ’equal’: Equally weighted.

    • ’vol’: Standard Deviation.

    • ’MV’: Variance.

    • ’KT’: Square Root Kurtosis.

    • ’MAD’: Mean Absolute Deviation.

    • ’MSV’: Semi Standard Deviation.

    • ’SKT’: Square Root Semi Kurtosis.

    • ’FLPM’: First Lower Partial Moment (Omega Ratio).

    • ’SLPM’: Second Lower Partial Moment (Sortino Ratio).

    • ’VaR’: Value at Risk.

    • ’CVaR’: Conditional Value at Risk.

    • ’TG’: Tail Gini.

    • ’EVaR’: Entropic Value at Risk.

    • ’RLVaR’: Relativistic Value at Risk.

    • ’WR’: Worst Realization (Minimax).

    • ’RG’: Range of returns.

    • ’CVRG’: CVaR range of returns.

    • ’TGRG’: Tail Gini range of returns.

    • ’MDD’: Maximum Drawdown of uncompounded cumulative returns (Calmar Ratio).

    • ’ADD’: Average Drawdown of uncompounded cumulative returns.

    • ’DaR’: Drawdown at Risk of uncompounded cumulative returns.

    • ’CDaR’: Conditional Drawdown at Risk of uncompounded cumulative returns.

    • ’EDaR’: Entropic Drawdown at Risk of uncompounded cumulative returns.

    • ’RLDaR’: Relativistic Drawdown at Risk of uncompounded cumulative returns.

    • ’UCI’: Ulcer Index of uncompounded cumulative returns.

    • ’MDD_Rel’: Maximum Drawdown of compounded cumulative returns (Calmar Ratio).

    • ’ADD_Rel’: Average Drawdown of compounded cumulative returns.

    • ’DaR_Rel’: Drawdown at Risk of compounded cumulative returns.

    • ’CDaR_Rel’: Conditional Drawdown at Risk of compounded cumulative returns.

    • ’EDaR_Rel’: Entropic Drawdown at Risk of compounded cumulative returns.

    • ’RLDaR_Rel’: Relativistic Drawdown at Risk of compounded cumulative returns.

    • ’UCI_Rel’: Ulcer Index of compounded cumulative returns.

  • rf (float, optional) – Risk free rate, must be expressed in the same frequency as the asset returns. The default is 0.

  • l (scalar, optional) – Risk aversion factor of the ‘Utility’ objective function. The default is 2.

  • custom_cov (DataFrame or None, optional) – Custom covariance matrix, used when codependence or covariance parameters have value ‘custom_cov’. The default is None.

  • custom_mu (DataFrame or None, optional) – Custom mean vector when NCO objective is ‘Utility’ or ‘Sharpe’. The default is None.

  • linkage (string, optional) –

    Linkage method of hierarchical clustering. For more information see linkage. The default is ‘single’. Possible values are:

    • ’single’.

    • ’complete’.

    • ’average’.

    • ’weighted’.

    • ’centroid’.

    • ’median’.

    • ’ward’.

    • ’DBHT’: Direct Bubble Hierarchical Tree.

  • k (int, optional) – Number of clusters. If given, this value is used instead of the optimal number of clusters calculated with the two difference gap statistic. The default is None.

  • max_k (int, optional) – Max number of clusters used by the two difference gap statistic to find the optimal number of clusters. The default is 10.

  • bins_info (int or str) –

    Number of bins used to calculate variation of information. The default value is ‘KN’. Possible values are:

    • ’KN’: Knuth’s choice method. See more in knuth_bin_width.

    • ’FD’: Freedman–Diaconis’ choice method. See more in freedman_bin_width.

    • ’SC’: Scott’s choice method. See more in scott_bin_width.

    • ’HGR’: Hacine-Gharbi and Ravier’s choice method.

    • int: integer value chosen by the user.

  • alpha_tail (float, optional) – Significance level for lower tail dependence index. The default is 0.05.

  • gs_threshold (float, optional) – Gerber statistic threshold. The default is 0.5.

  • leaf_order (bool, optional) – Indicates whether the clusters are ordered so that the distance between successive leaves is minimal. The default is True.

  • d (scalar) – The smoothing factor of ewma methods. The default is 0.94.

  • **kwargs – Other variables related to covariance estimation. See Scikit Learn and chapter 2 of [D1] for more details.
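
The distance formulas listed under the codependence parameter can be sketched in plain Python as follows. The function names and the sample correlation matrix are hypothetical; the code only transcribes the formulas above, with a clamp at zero added as an assumption to guard against floating point round-off.

```python
import math


def corr_distance(rho):
    """Signed correlations: D = sqrt(0.5 * (1 - rho))."""
    return math.sqrt(max(0.5 * (1.0 - rho), 0.0))


def abs_corr_distance(rho):
    """Absolute correlations: D = sqrt(1 - |rho|)."""
    return math.sqrt(max(1.0 - abs(rho), 0.0))


def tail_dissimilarity(lam):
    """Lower tail dependence index: D = -log(lambda), for 0 < lambda <= 1."""
    return -math.log(lam)


def distance_matrix(codep, transform):
    """Apply a distance transform to every entry of a codependence matrix."""
    return [[transform(value) for value in row] for row in codep]


# Hypothetical Pearson correlation matrix for three assets.
corr = [
    [1.0, 0.5, -0.2],
    [0.5, 1.0, 0.0],
    [-0.2, 0.0, 1.0],
]

dist = distance_matrix(corr, corr_distance)
abs_dist = distance_matrix(corr, abs_corr_distance)
for row in dist:
    print([round(value, 4) for value in row])
```

Under the signed formula, perfectly negatively correlated assets are the farthest apart (distance 1). Under the absolute value formula they are treated as identical (distance 0), which changes the clusters that the linkage method builds.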

Returns:

w – The weights of optimal portfolio.

Return type:

DataFrame
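
The drawdown risk measures accepted by `rm` come in two variants: uncompounded (for example ‘MDD’) and compounded (the ‘_Rel’ suffix, for example ‘MDD_Rel’). The sketch below illustrates the difference for the maximum drawdown. It assumes that uncompounded cumulative returns are a running sum starting at 1 with drawdowns measured in absolute terms, and that compounded cumulative returns are a running product with drawdowns measured relative to the peak. The function names and data are hypothetical.

```python
import operator
from itertools import accumulate


def max_drawdown_uncompounded(returns):
    """Largest peak-to-value drop of the running sum of returns."""
    nav = list(accumulate([1.0] + list(returns)))
    peak = list(accumulate(nav, max))
    return max(p - v for p, v in zip(peak, nav))


def max_drawdown_compounded(returns):
    """Largest relative drop from the peak of the running product."""
    nav = list(accumulate([1.0] + [1.0 + r for r in returns], operator.mul))
    peak = list(accumulate(nav, max))
    return max((p - v) / p for p, v in zip(peak, nav))


# Hypothetical per-period returns.
sample = [0.10, -0.20, 0.05, -0.10, 0.30]

mdd_abs = max_drawdown_uncompounded(sample)   # approximately 0.25
mdd_rel = max_drawdown_compounded(sample)     # approximately 0.244
print(round(mdd_abs, 6), round(mdd_rel, 6))
```

The two variants generally give different values for the same return series, so portfolios built with ‘MDD’ and ‘MDD_Rel’ need not coincide.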

Bibliography

[C1]

Marcos López de Prado. Building diversified portfolios that outperform out of sample. The Journal of Portfolio Management, 42(4):59–69, 2016. URL: https://jpm.pm-research.com/content/42/4/59, doi:10.3905/jpm.2016.42.4.059.

[C2]

Daniel Sjöstrand and Nima Behnejad. Exploration of hierarchical clustering in long-only risk-based portfolio optimization. Master's thesis, Copenhagen Business School, Solbjerg Pl. 3, 2000 Frederiksberg, Denmark, 5 2020. URL: https://research-api.cbs.dk/ws/portalfiles/portal/62178444/879726_Master_Thesis_Nima_Daniel_15736.pdf.

[C3]

Johann Pfitzinger and Nico Katzke. A constrained hierarchical risk parity algorithm with cluster-based capital allocation. Working Papers 14/2019, Stellenbosch University, Department of Economics, 2019. URL: https://ideas.repec.org/p/sza/wpaper/wpapers328.html.

[C4]

Thomas Raffinot. Hierarchical clustering-based asset allocation. The Journal of Portfolio Management, 44(2):89–99, December 2017. URL: https://doi.org/10.3905/jpm.2018.44.2.089, doi:10.3905/jpm.2018.44.2.089.

[C5]

Thomas Raffinot. The hierarchical equal risk contribution portfolio. 08 2018. doi:10.2139/ssrn.3237540.

[C6]

Marcos López de Prado. A robust estimator of the efficient frontier. SSRN Electronic Journal, 01 2019. doi:10.2139/ssrn.3469961.

[C7]

Wolfram Barfuss, Guido Previde Massara, T. Di Matteo, and Tomaso Aste. Parsimonious modeling with information filtering networks. Physical Review E, Dec 2016. URL: http://dx.doi.org/10.1103/PhysRevE.94.062306, doi:10.1103/physreve.94.062306.

[C8]

Marcos M. López de Prado. Machine Learning for Asset Managers. Elements in Quantitative Finance. Cambridge University Press, 2020. doi:10.1017/9781108883658.

[C9]

Sander Gerber, Harry Markowitz, Philip Ernst, Yinsen Miao, Babak Javid, and Paul Sargen. The gerber statistic: a robust co-movement measure for portfolio optimization. SSRN Electronic Journal, 2021. URL: https://doi.org/10.2139/ssrn.3880054, doi:10.2139/ssrn.3880054.