Quant research / Published report

ARIMA-EGARCH + LightGBM — S&P 500 volatility

Econometrics + ML study testing the contribution of a conditional log-sigma feature inside a multi-asset LightGBM model.

ARIMA, EGARCH, LightGBM, Volatility

Problem

Tabular ML models capture many technical signals but may miss conditional risk structure from volatility dynamics.

Approach

Reproducible 2013-2024 pipeline: ARIMA-EGARCH, causal feature generation, LightGBM, ablations, walk-forward evaluation, Diebold-Mariano, SHAP and permutation importance.
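As a rough illustration of the walk-forward evaluation named above, the sketch below generates expanding-window splits in which every test index comes strictly after every training index. The function name `walk_forward_splits` and its parameters `initial_train` and `test_size` are hypothetical, and the expanding window is an assumption; the report may use a different scheme (for example a rolling window).

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_obs: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test index is strictly later than every train index, so no
    future observation can leak into the fit.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    start = initial_train
    while start < n_obs:
        end = min(start + test_size, n_obs)  # the last fold may be shorter
        yield list(range(start)), list(range(start, end))
        start = end
```

With 10 observations, `initial_train=4` and `test_size=3`, this yields two folds: train on indices 0 to 3 and test on 4 to 6, then train on 0 to 6 and test on 7 to 9.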

Evidence

Complete dataset: RMSE 0.0109 and R² 0.765, versus RMSE 0.0113 and R² 0.749 without the insight.

AR + insights: R² 0.538 versus 0.497 without the insight.

Diebold-Mariano tests are significant for the key comparisons, p < 0.01.
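For readers unfamiliar with the Diebold-Mariano test, here is a minimal sketch of the statistic for one-step-ahead forecasts under squared-error loss. It assumes the loss differential is serially uncorrelated, which is the usual simplification at a one-step horizon, and uses the asymptotic normal distribution for the p-value. The function name `diebold_mariano` and the choice of loss are assumptions for illustration, not necessarily the report's exact setup.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(
    errors_a: Sequence[float], errors_b: Sequence[float]
) -> Tuple[float, float]:
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    Assumes one-step-ahead forecasts, so no HAC (autocorrelation)
    correction is applied to the variance of the loss differential.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have the same length")
    n = len(errors_a)
    if n < 2:
        raise ValueError("need at least two forecast errors")
    # Loss differential: negative values favour model A.
    d = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

A negative statistic with a small p-value indicates that the first model's squared errors are significantly lower than the second's.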

Full report

Forecasting S&P 500 volatility with an ARIMA-EGARCH and LightGBM pipeline.

Available in the original French and in a full English version prepared for international / YC readers.

Method

2013-2024 pipeline with temporal split and walk-forward evaluation.

Causally generated log_sigma_garch signal from an ARIMA-EGARCH model.

LightGBM ablations with and without the insight, Diebold-Mariano tests, bootstrap R², SHAP and permutation importance.
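To make the idea of a causally generated log-sigma signal concrete, the sketch below runs a standard EGARCH(1,1) recursion over a series of mean-model residuals and records, for each day, the log-volatility that was known before that day's residual was observed. The parameter values would come from fitting on training data only; here they are simply passed in, and the function name `egarch_log_sigma` is hypothetical. This is an illustration of the causality constraint, not the report's actual implementation.

```python
import math
from typing import List, Sequence


def egarch_log_sigma(
    residuals: Sequence[float],
    omega: float,
    alpha: float,
    gamma: float,
    beta: float,
) -> List[float]:
    """One-step-ahead log-sigma path from an EGARCH(1,1) recursion.

    out[t] depends only on residuals[:t], so it can serve as a feature
    for day t without look-ahead.
    """
    if abs(beta) >= 1.0:
        raise ValueError("beta must satisfy |beta| < 1")
    expected_abs_z = math.sqrt(2.0 / math.pi)  # E|z| for standard normal z
    log_var = omega / (1.0 - beta)  # unconditional level as starting value
    out: List[float] = []
    for eps in residuals:
        # Record log sigma_t before eps_t is used in any update.
        out.append(0.5 * log_var)
        z = eps / math.exp(0.5 * log_var)  # standardized residual
        log_var = (
            omega
            + beta * log_var
            + alpha * (abs(z) - expected_abs_z)
            + gamma * z
        )
    return out
```

Changing the residual on a given day leaves the feature values up to and including that day unchanged, which is the property that keeps the downstream LightGBM inputs free of look-ahead.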

Key results

Complete dataset: RMSE 0.0109, R² 0.765.

No-insight dataset: RMSE 0.0113, R² 0.749.

AR + insights: R² 0.538 vs 0.497 without insight.

This study focuses on volatility forecasting. It is not an investment strategy or financial recommendation.

Limitations

This is a volatility forecasting study, not an investment strategy. Hypothesis H2 is rejected, and the forecast horizon remains one day ahead (D+1).
