Quant research / Published report
ARIMA-EGARCH + LightGBM — S&P 500 volatility
Econometrics + ML study measuring how much a conditional log-sigma feature, derived from an ARIMA-EGARCH model, contributes to a multi-asset LightGBM volatility model.
Problem
Tabular ML models capture many technical signals but may miss the conditional risk structure of volatility dynamics, such as volatility clustering.
Approach
Reproducible 2013-2024 pipeline: ARIMA-EGARCH, causal feature generation, LightGBM, ablations, walk-forward evaluation, Diebold-Mariano, SHAP and permutation importance.
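The causal feature generation step can be illustrated with a minimal sketch. It assumes an AR(1) mean (a special case of ARIMA) and an EGARCH(1,1) variance with fixed, hypothetical parameters. It does not estimate anything, and the function name `ar1_egarch11_log_sigma` is illustrative, not taken from the report. What it shows is causality: the value stored for day t is computed before the return of day t enters the recursion.

```python
import math

# E|z| for a standard normal innovation z, used to centre the EGARCH size term.
EXPECTED_ABS_Z = math.sqrt(2.0 / math.pi)


def ar1_egarch11_log_sigma(returns, mu, phi, omega, alpha, gamma, beta):
    """Hypothetical AR(1)-EGARCH(1,1) filter returning log(sigma_t) for each day t.

    Mean:      r_t = mu + phi * r_{t-1} + eps_t,  z_t = eps_t / sigma_t
    Variance:  ln s2_t = omega + alpha * (|z_{t-1}| - E|z|) + gamma * z_{t-1} + beta * ln s2_{t-1}

    Parameters are fixed inputs (no estimation). out[t] depends only on returns[:t].
    """
    if not abs(beta) < 1.0:
        raise ValueError("|beta| must be < 1 for a finite unconditional log-variance")
    log_s2 = omega / (1.0 - beta)  # start at the unconditional mean of ln sigma^2
    out = []
    prev_r = None
    for r in returns:
        # Record the feature for day t BEFORE the day-t return is used: no look-ahead.
        out.append(0.5 * log_s2)  # log(sigma_t) = 0.5 * ln(sigma_t^2)
        cond_mean = mu if prev_r is None else mu + phi * prev_r  # no lag on the first day
        z = (r - cond_mean) / math.exp(0.5 * log_s2)  # standardized residual
        # EGARCH update: size effect (alpha), sign/leverage effect (gamma), persistence (beta)
        log_s2 = omega + alpha * (abs(z) - EXPECTED_ABS_Z) + gamma * z + beta * log_s2
        prev_r = r
    return out


if __name__ == "__main__":
    daily_returns = [0.001, -0.012, 0.004, 0.020, -0.007]
    log_sigma_garch = ar1_egarch11_log_sigma(
        daily_returns, mu=0.0003, phi=-0.05, omega=-0.2, alpha=0.12, gamma=-0.08, beta=0.98
    )
    print([round(v, 3) for v in log_sigma_garch])
```

In a real pipeline the parameters must also be estimated on past data only, for example by refitting at each walk-forward step. The `arch` package's `arch_model(y, mean="AR", lags=1, vol="EGARCH", p=1, o=1, q=1)` is one option, although the report does not say which library it used.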
Evidence
Complete dataset: RMSE 0.0109 and R² 0.765, versus RMSE 0.0113 and R² 0.749 without the insight (the log_sigma_garch feature).
AR + insights: R² 0.538 versus 0.497 without the insight.
Diebold-Mariano tests find significant differences in forecast accuracy for the key comparisons (p < 0.01).
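To show what the Diebold-Mariano statistic measures, here is a minimal plain-Python version under squared-error loss. The loss choice, the truncation of the long-run variance at lag `h - 1` and the normal approximation follow the original Diebold-Mariano formulation. Whether the report used exactly these settings, or a small-sample correction such as Harvey, Leybourne and Newbold's, is an assumption.

```python
import math


def diebold_mariano(errors_a, errors_b, h=1):
    """Two-sided Diebold-Mariano test with squared-error loss.

    d_t = e_a,t^2 - e_b,t^2; a negative statistic means model A has the lower loss.
    The long-run variance of d uses autocovariances up to lag h-1 (original DM choice).
    Returns (statistic, p_value) from the asymptotic N(0, 1) distribution.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have equal length")
    d = [ea * ea - eb * eb for ea, eb in zip(errors_a, errors_b)]  # loss differential
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of d at lag k (divided by n, as in the DM paper)
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value
```

For D+1 forecasts `h=1`, so the variance reduces to the sample variance of the loss differential. A negative statistic with p < 0.01 would mean model A (for example, LightGBM with the insight) has significantly lower squared error than model B.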

Full report
Forecasting S&P 500 volatility with an ARIMA-EGARCH and LightGBM pipeline.
Available in the original French version and in a full English version prepared for international and YC readers.
Method
2013-2024 pipeline with a temporal train/test split and walk-forward evaluation.
log_sigma_garch signal generated causally from an ARIMA-EGARCH model, so each value uses only data available before the date it is used to forecast.
LightGBM ablations with and without the insight, Diebold-Mariano tests, bootstrap R², SHAP and permutation importance.
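The walk-forward evaluation can be sketched as an expanding-window split generator. Window sizes, the step and the `gap` parameter are hypothetical choices, not values from the report. A `gap` of one observation is a common guard when the target is next-day volatility: it ensures no training label is realized inside the test window.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n_obs: int, initial_train: int, test_size: int, step: Optional[int] = None, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs for an expanding-window walk-forward evaluation.

    Training always starts at index 0 and ends before the test block; `gap`
    observations are skipped between them. All sizes are hypothetical, in trading days.
    """
    if initial_train <= 0 or test_size <= 0 or gap < 0:
        raise ValueError("initial_train and test_size must be positive, gap non-negative")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    train_end = initial_train
    while train_end + gap < n_obs:
        test_start = train_end + gap
        test_end = min(test_start + test_size, n_obs)
        yield range(0, train_end), range(test_start, test_end)
        train_end += step  # expand the training window; the model is refit on each fold


if __name__ == "__main__":
    # e.g. ~252 trading days per year: train on the first 3 years, test 63-day blocks
    for train, test in walk_forward_splits(n_obs=1000, initial_train=756, test_size=63, gap=1):
        print(f"train [0, {train.stop}) -> test [{test.start}, {test.stop})")
```

scikit-learn's `TimeSeriesSplit` (with its `test_size` and `gap` arguments) produces similar expanding-window folds if a library implementation is preferred.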
Key results
Complete dataset: RMSE 0.0109, R² 0.765.
No-insight dataset: RMSE 0.0113, R² 0.749.
AR + insights: R² 0.538 vs 0.497 without insight.
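The headline metrics can be computed from out-of-sample predictions in a few lines. The sketch below computes RMSE and R² and adds a moving-block bootstrap interval for R². The block bootstrap, block length and percentile interval are assumptions, since the report only mentions a bootstrap R². They are chosen here because daily volatility is strongly autocorrelated, and an i.i.d. resample would understate uncertainty.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def block_bootstrap_r2(y_true, y_pred, block=20, n_boot=1000, conf=0.95, seed=0):
    """Percentile interval for R² from a moving-block bootstrap (keeps serial dependence)."""
    n = len(y_true)
    if not 0 < block <= n:
        raise ValueError("block length must be in [1, n]")
    rng = random.Random(seed)  # fixed seed for reproducibility
    stats = []
    for _ in range(n_boot):
        idx = []
        while len(idx) < n:  # concatenate random contiguous blocks until length n
            start = rng.randrange(n - block + 1)
            idx.extend(range(start, start + block))
        idx = idx[:n]
        stats.append(r2([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - conf) / 2 * n_boot)]
    hi = stats[int((1 + conf) / 2 * n_boot) - 1]
    return lo, hi
```

Applied to the reported figures, the insight lowers RMSE from 0.0113 to 0.0109 (a 0.0004 reduction, about 3.5% relative) and raises R² by 0.016 on the complete dataset and by 0.041 in the AR + insights setting.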
Limitations
This is a volatility forecasting study, not an investment strategy or financial recommendation. Hypothesis H2 is rejected, and the forecast horizon is limited to D+1 (one day ahead).