
Prediction of gaseous nitrous acid based on Stacking ensemble learning model
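Cite this article: TANG Ke, QIN Min, ZHAO Xing, DUAN Jun, FANG Wu, LIANG Shuai-xi, MENG Fan-hao, YE Kai-di, ZHANG He-lu, XIE Pin-hua. Prediction of gaseous nitrous acid based on Stacking ensemble learning model[J]. China Environmental Science, 2020, 40(2): 582-590.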
Authors: TANG Ke  QIN Min  ZHAO Xing  DUAN Jun  FANG Wu  LIANG Shuai-xi  MENG Fan-hao  YE Kai-di  ZHANG He-lu  XIE Pin-hua
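Affiliations: 1. Key Laboratory of Environmental Optics and Technology, Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei, Anhui 230031, China; 2. University of Science and Technology of China, Hefei, Anhui 230026, China; 3. Center for Excellence in Regional Atmospheric Environment, Chinese Academy of Sciences, Xiamen, Fujian 361021, China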
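Funding: National Natural Science Foundation of China (41875154, 91544104, 4170050319); Key Deployment Project of the Chinese Academy of Sciences (KFZD-SW-320); National Key Research and Development Program of China (2017YFC0209403); Director's Fund of the Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences (AGHH201601)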
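Abstract: A prediction model for gaseous nitrous acid (HONO) based on Stacking ensemble learning was established. Using HONO concentrations in urban Beijing measured with an incoherent broadband cavity-enhanced absorption spectroscopy (IBBCEAS) system, and taking the sources of HONO into account, O3, CO, SO2, NO, NO2, NOy, temperature (T), relative humidity (RH), wind speed (WS), j(HONO), j(NO2) and j(O¹D) were selected as feature data; based on an analysis of the mean diurnal variation of HONO, the measurement time was converted into a new hour-of-day feature. Base models were built with the extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and random forest (RF) algorithms, and the training set was partitioned by 5-fold cross-validation. The outputs of the base models formed a new feature set, which served as the input of a second-layer linear regression model; training the models in both layers yielded the final Stacking ensemble HONO prediction model. Feature importance analysis of the model and a calculation of the contribution of direct nighttime traffic emissions showed that CO was an important factor in the model's predictions, indicating that direct vehicle emissions were an important source of HONO in this area in winter. The predictive performance of the single models and of the fused model was evaluated on the test set: the correlation coefficients between predicted and measured values were above 0.91 for all three single models, and the Stacking ensemble performed best, with a correlation coefficient of 0.94, a mean absolute error of 0.307×10⁻⁹ and a root-mean-square error of 0.453×10⁻⁹. These results demonstrate the interpretability and generalizability of the HONO prediction model based on Stacking ensemble learning.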

Keywords: Stacking  K-fold cross-validation  ensemble  gaseous nitrous acid  prediction
Received: 2019-07-24
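The abstract derives an hour-of-day feature from the measurement time after examining the mean diurnal variation of HONO. A minimal standard-library sketch of both steps follows, assuming each observation is a dict holding an ISO-8601 timestamp; the field names time, HONO and hour and the helper names are hypothetical, not taken from the paper, and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def add_hour_feature(records, time_key="time"):
    """Return copies of the records with an integer 'hour' (0-23) feature added.

    Field names are hypothetical; the paper does not specify its data layout.
    """
    return [{**rec, "hour": datetime.fromisoformat(rec[time_key]).hour} for rec in records]


def mean_diurnal_profile(records, target="HONO"):
    """Average the target variable by hour of day, returning {hour: mean value}."""
    by_hour = defaultdict(list)
    for rec in records:
        by_hour[rec["hour"]].append(rec[target])
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Illustrative values only, not measurements from the study.
rows = add_hour_feature([
    {"time": "2019-01-05 06:10:00", "HONO": 2.0},
    {"time": "2019-01-06 06:40:00", "HONO": 4.0},
    {"time": "2019-01-05 14:00:00", "HONO": 0.5},
])
print(mean_diurnal_profile(rows))  # {6: 3.0, 14: 0.5}
```

Encoding the hour as a plain integer lets tree-based models such as XGBoost, LightGBM and RF split on time-of-day ranges directly.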
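The two-layer Stacking scheme in the abstract (base models trained under 5-fold cross-validation, their out-of-fold predictions used as features for a linear regression meta-model) can be sketched in standard-library Python. Because XGBoost, LightGBM and random forest are not in the standard library, this sketch swaps in two simple stand-in base learners, k-nearest-neighbour regression and ordinary least squares. The names solve, ols_learner, knn_learner, kfold_indices and fit_stacking are hypothetical, and refitting each base learner on the full training set after building the out-of-fold features is an assumption (it mirrors scikit-learn's StackingRegressor), not a detail stated in the abstract.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], float]
Learner = Callable[[List[Vector], Vector], Model]


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ols_learner(ridge: float = 1e-8) -> Learner:
    """Linear regression with intercept via normal equations (tiny ridge on slopes only)."""
    def fit(X: List[Vector], y: Vector) -> Model:
        Z = [[1.0] + list(row) for row in X]  # prepend an intercept column
        d = len(Z[0])
        A = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j and i > 0 else 0.0)
              for j in range(d)] for i in range(d)]
        b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
        w = solve(A, b)
        return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return fit


def knn_learner(k: int) -> Learner:
    """Stand-in base learner (assumption, not the paper's model): mean of k nearest targets."""
    def fit(X: List[Vector], y: Vector) -> Model:
        def predict(x: Vector) -> float:
            nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_stacking(X, y, base_learners: List[Learner], meta_learner: Learner, k: int = 5) -> Model:
    """Two-layer stacking: out-of-fold base predictions become the meta-model's features."""
    n = len(X)
    oof = [[0.0] * len(base_learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for m, learner in enumerate(base_learners):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                oof[i][m] = model(X[i])  # prediction for a point this model never saw
    meta = meta_learner(oof, y)
    bases = [learner(X, y) for learner in base_learners]  # refit on all training data (assumption)
    return lambda x: meta([b(x) for b in bases])


# Usage on synthetic, noise-free data (not the study's measurements).
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(40)]
y = [2 * a + 3 * b + 1 for a, b in X]
model = fit_stacking(X, y, [knn_learner(3), ols_learner()], ols_learner(), k=5)
print(round(model([0.5, 0.5]), 3))
```

Training the meta-model on out-of-fold predictions keeps it from seeing base-model outputs that were fitted to the very same targets, which would otherwise make it overweight whichever base model overfits most.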

Prediction of gaseous nitrous acid based on Stacking ensemble learning model
TANG Ke,QIN Min,ZHAO Xing,DUAN Jun,FANG Wu,LIANG Shuai-xi,MENG Fan-hao,YE Kai-di,ZHANG He-lu,XIE Pin-hua.Prediction of gaseous nitrous acid based on Stacking ensemble learning model[J].China Environmental Science,2020,40(2):582-590.
Authors:TANG Ke  QIN Min  ZHAO Xing  DUAN Jun  FANG Wu  LIANG Shuai-xi  MENG Fan-hao  YE Kai-di  ZHANG He-lu  XIE Pin-hua
Institution:1. Key Laboratory of Environment Optics and Technology, Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei 230031, China; 2. University of Science and Technology of China, Hefei 230026, China; 3. Center for Excellence in Regional Atmospheric Environment, Chinese Academy of Sciences, Xiamen 361021, China
Abstract: A gaseous nitrous acid (HONO) prediction model based on Stacking ensemble learning was proposed. HONO concentrations in urban Beijing were measured using incoherent broadband cavity-enhanced absorption spectroscopy (IBBCEAS). Taking the sources of HONO into account, O3, CO, SO2, NO, NO2, NOy, temperature (T), relative humidity (RH), wind speed (WS), j(HONO), j(NO2) and j(O¹D) were selected as feature data. Based on an analysis of the average diurnal variation of HONO, the measurement time was converted into a new hour-of-day feature. Base models were constructed using the Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Random Forest (RF) algorithms, and the training set was partitioned by 5-fold cross-validation. The outputs of the base models formed a new feature set, which served as the input of a second-level linear regression model. The final Stacking HONO prediction model was obtained by training the models in both levels. Feature importance analysis and a calculation of the contribution of direct vehicle emissions at night showed that CO was an important factor in the prediction model, indicating that direct vehicle emissions were an important source of HONO in this region in winter. The prediction performance of the base models and of the Stacking ensemble model was evaluated on the test set. The correlation coefficients between predicted and measured values were all above 0.91 for the three base models. The Stacking ensemble model performed best, with a correlation coefficient of 0.94 and a mean absolute error and root mean square error of 0.307×10⁻⁹ and 0.453×10⁻⁹, respectively. These results demonstrate the interpretability and generalizability of the HONO prediction model based on Stacking ensemble learning.
Keywords:stacking  K-fold cross validation  ensemble  gaseous HONO  prediction  
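The abstract reports a feature importance analysis in which CO ranks highly, but it does not say which importance measure was used; XGBoost and LightGBM also expose built-in split- or gain-based importances. As a model-agnostic alternative, the sketch below computes permutation importance, a swapped-in technique: the mean increase in mean absolute error when one feature column is shuffled. The helper names are hypothetical, and predict can be any fitted model's prediction function.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Permutation importance (not necessarily the paper's measure):
    for each feature column, the mean rise in MAE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mae(y, [predict(x) for x in X])
    scores = []
    for j in range(len(X[0])):
        rises = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            rises.append(mae(y, [predict(x) for x in X_perm]) - baseline)
        scores.append(mean(rises))
    return scores


# Toy model that uses only feature 0 (think of a CO-like column); feature 1 is ignored.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]
scores = permutation_importance(lambda x: 3.0 * x[0], X, y)
print(scores)
```

A feature the model ignores scores exactly zero, while a feature it relies on scores higher the more the error grows once its values are scrambled.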
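The abstract also estimates the contribution of direct nighttime vehicle emissions to HONO without giving the formula. A common approach in HONO studies, assumed here rather than taken from the paper, multiplies NOx (NO + NO2) by a HONO/NOx emission ratio ER of fresh exhaust and divides by the observed HONO: fraction = ER × (NO + NO2) / HONO. The ER value and the 00:00-05:59 night window in this sketch are hypothetical inputs, and the helper names are illustrative.

```python
from statistics import mean


def emission_fraction(hono, no, no2, emission_ratio):
    """Share of observed HONO explained by direct emission: ER * (NO + NO2) / HONO.

    Assumed emission-ratio method, not confirmed by the abstract. All mixing ratios
    must share one unit (e.g. ppb); emission_ratio is a dimensionless user input.
    """
    if hono <= 0:
        raise ValueError("observed HONO must be positive")
    return emission_ratio * (no + no2) / hono


def mean_night_fraction(rows, emission_ratio, night_hours=frozenset(range(0, 6))):
    """Average emission_fraction over rows whose 'hour' lies in night_hours (hypothetical window)."""
    fractions = [emission_fraction(r["HONO"], r["NO"], r["NO2"], emission_ratio)
                 for r in rows if r["hour"] in night_hours]
    return mean(fractions)


# Illustrative rows and emission ratio, not data from the study.
rows = [
    {"hour": 2, "HONO": 1.0, "NO": 30.0, "NO2": 20.0},
    {"hour": 3, "HONO": 2.0, "NO": 10.0, "NO2": 40.0},
    {"hour": 14, "HONO": 0.5, "NO": 5.0, "NO2": 15.0},  # daytime row, excluded
]
print(mean_night_fraction(rows, emission_ratio=0.01))
```

Restricting the average to night hours avoids daytime photolysis of HONO, which would make the observed concentration a poor denominator for an emission-only estimate.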
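The three test-set scores quoted in the abstract, the correlation coefficient, mean absolute error (MAE) and root-mean-square error (RMSE), have standard definitions. The sketch assumes the correlation coefficient is Pearson's r, which the abstract does not state explicitly; MAE and RMSE carry the units of the inputs (here the ×10⁻⁹ mixing ratio used in the paper).

```python
import math
from statistics import mean


def pearson_r(obs, pred):
    """Pearson correlation coefficient (assumed to be the paper's 'correlation coefficient')."""
    mo, mp = mean(obs), mean(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop / math.sqrt(soo * spp)


def mae(obs, pred):
    """Mean absolute error."""
    return mean(abs(p - o) for o, p in zip(obs, pred))


def rmse(obs, pred):
    """Root-mean-square error; penalizes large misses more heavily than MAE."""
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
print(mae(obs, pred), rmse(obs, pred))  # 0.25 0.5
```

RMSE can never be smaller than MAE for the same predictions, which is consistent with the reported pair (0.453×10⁻⁹ versus 0.307×10⁻⁹); the gap between them grows when a few predictions miss by a large margin.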
This article is indexed in CNKI and other databases.