This study develops a hybrid model that combines Extreme Gradient Boosting (XGBoost) with the Heterogeneous Autoregressive model of Realized Volatility (HAR-RV), using five-minute price data for the CSI 300 Index. Historical values of realized volatility, market trading indicators, and technical indicators serve as predictive features, and recursive feature elimination based on XGBoost importance scores is used for feature selection. The experimental results indicate that the proposed hybrid model outperforms the single models currently in mainstream use, and that XGBoost-based recursive feature elimination improves the feature subset. The study aims to offer a new perspective on volatility forecasting in financial markets and an effective tool for investors and risk managers.
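As a rough illustration of the HAR-RV component, the sketch below builds the daily, weekly, and monthly regressors from a series of daily realized volatilities. The function name `har_features` and the 5-day and 22-day windows are assumptions that follow the usual HAR convention; the abstract does not state the windows actually used.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build HAR-style regressors from a list of daily realized volatilities.

    Returns a list of ((daily, weekly_mean, monthly_mean), next_day_rv) pairs.
    The window lengths are conventional defaults (an assumption), and
    `monthly` is expected to be at least as large as `weekly`.
    """
    rows = []
    # Start once a full monthly window is available; stop one day early
    # so that every row has a next-day target.
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        weekly_mean = sum(rv[t - weekly + 1 : t + 1]) / weekly
        monthly_mean = sum(rv[t - monthly + 1 : t + 1]) / monthly
        rows.append(((daily, weekly_mean, monthly_mean), rv[t + 1]))
    return rows


# Toy input: 30 synthetic "daily RV" values, not real market data.
toy_rv = [float(i) for i in range(1, 31)]
har_rows = har_features(toy_rv)
```

In a hybrid setup of the kind described, these three columns could be fed to a regressor alongside the trading and technical indicators; how the original study combines them is not specified here.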
This study uses 5-minute high-frequency data for the Shanghai Composite Index and, taking the strong memory of high-frequency price series as its starting point, constructs a Long Short-Term Memory (LSTM) model based on the high-frequency price series. Forecasts of actual volatility are computed on the basis of realized volatility (RV) theory. A well-performing random forest model, an elastic net model, and an LSTM model that models volatility directly are selected for comparison, in order to identify the better-performing forecasting model and to offer new ideas for applying deep learning to volatility forecasting. The results show that the LSTM volatility forecasting model based on the high-frequency price series predicts markedly better than the other two models, making full use of the strengths of the long short-term memory architecture.
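For readers unfamiliar with realized volatility, the following minimal sketch computes it for one trading day as the square root of the sum of squared intraday log returns, which is the standard definition. The function names and the toy price list are illustrative assumptions, not taken from the study.

```python
import math


def realized_variance(prices):
    """Sum of squared log returns between consecutive intraday prices."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)


def realized_volatility(prices):
    """Square root of the realized variance for one day of prices."""
    return math.sqrt(realized_variance(prices))


# Toy 5-minute prices for a single day (synthetic, not real index data).
toy_prices = [100.0, 100.5, 100.2, 100.8, 100.6]
toy_rv = realized_volatility(toy_prices)
```

Applied day by day to 5-minute bars, this yields the daily RV series that serves as the forecasting target in both abstracts.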