17 similar documents found (search time: 93 ms)
1.
2.
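Following the Organisation for Economic Co-operation and Development (OECD) guidelines for building and using quantitative structure-activity relationship (QSAR) models, 780 organic compounds were randomly split 4:1 into a training set (624 compounds) and a validation set (156 compounds), and a 12-descriptor QSAR model for the fish bioconcentration factor (BCF) of organic compounds was built by multiple linear regression (MLR). The model has an adjusted coefficient of determination R²adj = 0.809, a leave-one-out cross-validation coefficient Q²LOO = 0.803 and an external validation coefficient Q²EXT = 0.732, indicating good goodness of fit, robustness and predictive ability. The applicability domain of the model was characterized with the Euclidean distance method, outliers were analysed with a Williams plot, and a mechanistic interpretation of the model was given. The model can be used to predict the bioconcentration factor of organic chemicals within its applicability domain.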
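The validation statistics quoted above (R²adj, Q²LOO, Q²EXT) can be illustrated with a minimal sketch. It assumes ordinary least-squares MLR, uses synthetic data instead of the BCF set, and takes Q²EXT in the common Q²F1 form (residuals on the validation set, deviations from the training-set mean), because the abstract does not state which definition was used. All function names are hypothetical, and only the Python standard library is used.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y):
    """OLS via the normal equations; returns [b0, b1, ..., bp] (b0 = intercept)."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, X):
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]


def r2_adj(y, yhat, p):
    """Adjusted R^2 for a model with p descriptors fitted to n = len(y) points."""
    n, ybar = len(y), sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS, refitting the model once per left-out point."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, [X[i]])[0]) ** 2
    ybar = sum(y) / len(y)
    return 1 - press / sum((yi - ybar) ** 2 for yi in y)


def q2_ext(y_test, yhat_test, ybar_train):
    """External Q^2 in the Q2_F1 form: test residuals vs. deviations from the training mean."""
    press = sum((yi - fi) ** 2 for yi, fi in zip(y_test, yhat_test))
    return 1 - press / sum((yi - ybar_train) ** 2 for yi in y_test)


# Synthetic data only (NOT the 780-compound BCF set): 100 "compounds", 3 descriptors.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
y = [0.5 + 1.2 * x1 - 0.8 * x2 + 0.3 * x3 + rng.gauss(0, 0.2) for x1, x2, x3 in X]

# Random 4:1 split into training (80) and validation (20) sets.
idx = list(range(len(y)))
rng.shuffle(idx)
train, test = idx[:80], idx[80:]
X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
X_te, y_te = [X[i] for i in test], [y[i] for i in test]

coef = fit_mlr(X_tr, y_tr)
print("R2adj =", round(r2_adj(y_tr, predict(coef, X_tr), p=3), 3))
print("Q2LOO =", round(q2_loo(X_tr, y_tr), 3))
print("Q2EXT =", round(q2_ext(y_te, predict(coef, X_te), sum(y_tr) / len(y_tr)), 3))
```

Refitting once per left-out compound keeps the Q²LOO code obvious but scales poorly; for least-squares models the deleted residual e_i/(1 - h_ii) gives the same PRESS without refitting.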
3.
4.
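Carbon dioxide (CO2) is the most important greenhouse gas driving global warming. Accurate information on the spatial distribution of CO2 makes it possible to evaluate the effectiveness of emission reductions and is important for advancing carbon peaking and carbon neutrality. Compared with ground-station observations, carbon-monitoring satellites can capture CO2 distributions over large areas, but because of their narrow swath and cloud cover, satellite retrievals of atmospheric CO2 contain large gaps and cannot provide a spatially continuous CO2 field. Taking the Xinjiang Uygur Autonomous Region as the study area, and using 2019 OCO-2 column-averaged dry-air mixing ratio of CO2 (XCO2) data together with related variables such as air temperature, terrain, vegetation and atmospheric NO2 concentration, this study compared multiple linear regression (MLR), geographically weighted regression (GWR), support vector regression (SVR), random forest (RF), extreme gradient boosting (XGBoost) and extremely randomized trees (ERT) for generating spatially continuous XCO2 data. Cross-validation showed that the three ensemble learning models (RF, XGBoost and ERT) were clearly more accurate than SVR, GWR and MLR, with the ERT model the most accurate (coefficient of determination R² of 0.7...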
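The cross-validated comparison of model families described above can be sketched as pooled k-fold cross-validation, in which every observation is predicted by a model that never saw it. This is an illustration only: a one-predictor linear fit stands in for MLR, k-nearest-neighbour regression stands in for the non-parametric learners (it is not RF, XGBoost or ERT), the data are a synthetic non-linear curve rather than XCO2 retrievals, and all function names are hypothetical.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Simple linear regression (a one-predictor stand-in for MLR)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression (a simple non-parametric stand-in, not RF/ERT)."""
    pts = list(zip(xs, ys))
    def model(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(p[1] for p in nearest) / k
    return model


def cv_r2(xs, ys, fit, k=10):
    """Pooled k-fold cross-validated R^2."""
    preds = [0.0] * len(ys)
    for fold in kfold_indices(len(ys), k):
        held = set(fold)
        tr = [i for i in range(len(ys)) if i not in held]
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        for i in fold:
            preds[i] = model(xs[i])
    my = sum(ys) / len(ys)
    return 1 - sum((y - p) ** 2 for y, p in zip(ys, preds)) / sum((y - my) ** 2 for y in ys)


# Synthetic, non-linear toy data (not XCO2 retrievals).
rng = random.Random(1)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]
print("linear CV R2:", round(cv_r2(xs, ys, fit_line), 3))
print("kNN    CV R2:", round(cv_r2(xs, ys, fit_knn), 3))
```

With gridded satellite data, random folds can overstate accuracy because neighbouring pixels are correlated; spatially blocked folds are a stricter alternative.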
5.
6.
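Relationships based on the electrical properties of different types of molecular vertices were constructed as structural descriptors and used to parameterize the structures of a set of chlorinated aromatic compounds, yielding 7 descriptors closely related to compound structure. After variable screening by stepwise regression (SMR), multiple linear regression (MLR) and partial least squares regression (PLS) were used to model the relationship between compound structure and the median lethal concentration for guppy (-log LC50). The modelling correlation coefficients (r²) of the two models were 0.871 and 0.862, respectively, and the leave-one-out cross-validated correlation coefficients (Q²) were 0.808 and 0.589. The results indicate that the descriptors adequately characterize the structural features of the compounds and that the models have good stability and predictive ability.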
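Stepwise descriptor screening of the kind mentioned above can be sketched as forward selection with a partial F-to-enter criterion: at each step, add the descriptor that lowers the residual sum of squares most, and stop when its F statistic falls below a threshold. This is an assumption about the procedure, since the abstract gives no entry or removal thresholds and full stepwise regression can also drop variables; the 7 synthetic descriptors below are not the published ones, and the function names are hypothetical.

```python
import random

def ols_rss(cols, y):
    """RSS of an OLS fit of y on an intercept plus the given descriptor columns."""
    A = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(A[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= f * m[col][c]
    b = [m[i][k] / m[i][i] for i in range(k)]
    return sum((yi - sum(bj * aj for bj, aj in zip(b, row))) ** 2 for row, yi in zip(A, y))


def forward_stepwise(cols, y, f_enter=4.0):
    """Forward selection: add the descriptor that lowers RSS most while its partial F >= f_enter."""
    n, selected = len(y), []
    rss = ols_rss([], y)
    while len(selected) < len(cols):
        j_best, rss_best = min(
            ((j, ols_rss([cols[i] for i in selected + [j]], y))
             for j in range(len(cols)) if j not in selected),
            key=lambda t: t[1])
        df_resid = n - len(selected) - 2  # n - (p_new + 1) after adding one descriptor
        f_stat = (rss - rss_best) / (rss_best / df_resid)
        if f_stat < f_enter:
            break
        selected.append(j_best)
        rss = rss_best
    return selected


# Synthetic data: 7 candidate descriptors, of which only 0, 3 and 5 drive the response.
rng = random.Random(2)
cols = [[rng.gauss(0, 1) for _ in range(60)] for _ in range(7)]
y = [2 * cols[0][i] - 1.5 * cols[3][i] + cols[5][i] + rng.gauss(0, 0.3) for i in range(60)]
print("selected descriptors:", forward_stepwise(cols, y))
```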
7.
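Polychlorinated dibenzofurans (PCDFs) are typical persistent organic pollutants (POPs), and photolysis is their main transformation pathway in the environment. Using the molecular electronegativity distance vector (MEDV) as descriptors, multiple linear regression (MLR) and partial least squares regression (PLSR) were applied to model the photolysis half-lives (t1/2) of 48 PCDFs on spruce needles and on fly ash; in every case a quantitative structure-property relationship (QSPR) model with 2 variables was obtained. MLR results (spruce needles and fly ash, respectively): modelling correlation coefficient R = 0.860 and 0.836, standard deviation SD = 0.052 and 0.053, cross-validated multiple correlation coefficient Rcv = 0.839 and 0.807, and external validation correlation coefficient Qext = 0.939 and 0.853. PLSR results: R = 0.857 and 0.829, Rcv = 0.849 and 0.807. The results indicate that MEDV characterizes the structure of these molecules well and that the QSPR models have good stability and predictive ability.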
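PLS regression, used above alongside MLR, builds a small number of latent components from the descriptors instead of regressing on the descriptors directly. The minimal NIPALS PLS1 sketch below (single response) assumes centred data and synthetic descriptors for 48 hypothetical molecules, and returns training-set fitted values only; the function names are hypothetical.

```python
import random

def pls1_fitted(X, y, n_comp):
    """NIPALS PLS1: fitted values of y using the first n_comp latent components."""
    n, m = len(X), len(X[0])
    xmean = [sum(r[j] for r in X) / n for j in range(m)]
    ymean = sum(y) / n
    E = [[r[j] - xmean[j] for j in range(m)] for r in X]  # centred X, deflated each round
    f = [v - ymean for v in y]                             # centred y, deflated each round
    fitted = [ymean] * n
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # weights ~ X'y
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        for i in range(n):
            fitted[i] += q * t[i]
            f[i] -= q * t[i]
            for j in range(m):
                E[i][j] -= t[i] * p[j]
    return fitted


def r2(y, yhat):
    ybar = sum(y) / len(y)
    return 1 - sum((a - b) ** 2 for a, b in zip(y, yhat)) / sum((a - ybar) ** 2 for a in y)


# Synthetic data: 48 "molecules", 4 descriptors, response driven by the first two.
rng = random.Random(3)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(48)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b, _, _ in X]
for k in range(1, 5):
    print(k, "components: R2 =", round(r2(y, pls1_fitted(X, y, k)), 4))
```

Each component removes a non-negative amount of residual variance, so training R² never decreases as components are added; that is why the number of components is usually chosen by cross-validation rather than by R².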
8.
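New structural descriptors were built by parameterizing individual non-hydrogen atoms and the relationships between them, and were used to encode the molecular structures of a set of phenolic compounds. Combining stepwise regression (SMR) with multiple linear regression (MLR), a model relating compound structure to the octanol/water partition coefficient (log Kow) was established. The modelling correlation coefficient (r) was 0.988 with a standard deviation (SD) of 0.121; the leave-one-out cross-validated correlation coefficient (Q²) was 0.966 with a standard deviation (SDCV) of 0.148. The results indicate that the descriptors characterize the molecular structures well and that the model is stable and strongly predictive, which makes it a useful reference for QSPR studies of phenolic compounds.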
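For a least-squares model, the leave-one-out statistics above do not require refitting the model n times: each deleted residual equals e_i/(1 - h_ii), where e_i is the ordinary residual and h_ii the leverage of compound i. The sketch below shows this for a single descriptor (the published model uses several) on synthetic data. Here r is the multiple correlation coefficient √R², and the SDCV definition √(PRESS/n) is an assumption, since papers differ on the denominator; the function names are hypothetical.

```python
import math
import random

def simple_ols(xs, ys):
    """Intercept and slope of a one-descriptor least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def fit_and_loo_stats(xs, ys):
    """r, SD, Q2 and SDCV, with PRESS from the leverage shortcut e_i / (1 - h_ii)."""
    n, p = len(xs), 1
    a, b = simple_ols(xs, ys)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    lev = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # hat-matrix diagonal, one descriptor
    rss = sum(e * e for e in resid)
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev))
    tss = sum((y - my) ** 2 for y in ys)
    return {
        "r": math.sqrt(1 - rss / tss),
        "SD": math.sqrt(rss / (n - p - 1)),
        "Q2": 1 - press / tss,
        "SDCV": math.sqrt(press / n),  # assumed definition; some papers divide by n - p - 1
    }


# Synthetic data (not the phenol log Kow set).
rng = random.Random(4)
xs = [rng.uniform(0, 5) for _ in range(30)]
ys = [0.8 * x + 0.2 + rng.gauss(0, 0.15) for x in xs]
print({k: round(v, 3) for k, v in fit_and_loo_stats(xs, ys).items()})
```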
9.
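A prediction model for the aerobic biodegradation of organic compounds based on the group contribution method  Total citations: 1, self-citations: 0, other citations: 1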
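Usable data for 587 organic compounds of various types were selected from MITI-I tests, and the structures of these substances were decomposed into functional groups. Fifty compounds were randomly chosen as the validation set and the remaining 537 formed the training set, and models were built with two methods, multiple linear regression (MLR) and support vector machine (SVM). The results show that functional groups such as aromatic acids, aldehydes, aromatic iodine and tertiary amines strongly influence the aerobic biodegradability of organic compounds. The MLR model achieved an overall prediction accuracy of 81.43% and a validation-set accuracy of 82%; the SVM model achieved an overall accuracy of 87.90% and a validation-set accuracy of 86%. Both quantitative structure-biodegradability relationship (QSBR) models are valid and can be used to evaluate the aerobic biodegradability of chemicals.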
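A group-contribution model scores a compound as an intercept plus, for every structural fragment, the fragment count times a fitted weight. The sketch below is a hypothetical linear version with a class threshold that shows how an accuracy figure like those above is computed. The group labels G1 to G3, their weights, the intercept and the threshold are placeholders; a real model would use fragments such as aromatic acid or tertiary amine groups with weights fitted to MITI-I results.

```python
# Hypothetical group-contribution classifier. Group names and weights are placeholders,
# not values fitted to MITI-I data.
CONTRIB = {"G1": 0.8, "G2": -0.9, "G3": 0.3}
INTERCEPT = -0.2


def score(groups):
    """Linear group-contribution score: intercept + sum(count_g * weight_g)."""
    return INTERCEPT + sum(CONTRIB.get(g, 0.0) * count for g, count in groups.items())


def classify(groups, threshold=0.0):
    return "readily" if score(groups) > threshold else "not readily"


def accuracy(samples):
    """Share of (group_counts, observed_label) pairs whose predicted class matches."""
    return sum(classify(g) == label for g, label in samples) / len(samples)


samples = [
    ({"G1": 1}, "readily"),
    ({"G2": 1}, "not readily"),
    ({"G3": 1}, "readily"),
    ({}, "not readily"),
    ({"G1": 1, "G2": 1}, "readily"),  # misclassified: score is about -0.3
]
print(f"accuracy = {accuracy(samples):.0%}")  # accuracy = 80%
```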
10.
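The biological half-life (t1/2) is an important parameter for evaluating the accumulation of xenobiotic compounds in fish. Measuring t1/2 experimentally is slow and costly and cannot meet the needs of ecological risk assessment of chemicals, so model-based prediction methods are needed as alternatives to testing. This study collected measured t1/2 values for 653 compounds and built quantitative structure-activity relationship (QSAR) models of log t1/2 in fish with two methods, multiple linear regression (MLR) and support vector machine (SVM). The MLR model has an adjusted coefficient of determination R²adj = 0.751, a training root-mean-square error RMSEtrain = 0.587, a leave-one-out cross-validation coefficient Q²LOO = 0.735 and an external validation coefficient Q²ext = 0.682, indicating good fit, robustness and predictive ability. The SVM model has better fitting and predictive ability (R²adj = 0.839, RMSEtrain = 0.457, Q²ext = 0.708). The applicability domain of the models was characterized with the Williams method. The models can be used to predict log t1/2 in fish for typical compounds such as polycyclic aromatic hydrocarbons, polychlorinated biphenyls, polybrominated diphenyl ethers, organophosphorus pesticides and pharmaceuticals, as well as for other alkanes, cycloalkanes, alkenes, alcohols, ethers, acids, esters, ketones, halogenated compounds, aromatic compounds, and sulfur-, nitrogen- and phosphorus-containing compounds.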
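A Williams plot places each compound by its leverage h_i = x_iᵀ(XᵀX)⁻¹x_i (x-axis) and its standardized residual (y-axis). Compounds with h_i above the warning leverage h* = 3(p+1)/n fall outside the structural applicability domain, and |standardized residual| > 3 marks response outliers. The sketch below computes these quantities in plain Python, assuming an intercept column, residuals divided by the residual standard error (some authors use studentized residuals instead), stand-in residuals rather than those of a real fitted model, and synthetic data with one deliberately extreme compound; the function names are hypothetical.

```python
import random

def invert(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(X):
    """Diagonal of the hat matrix, h_i = x_i' (X'X)^-1 x_i, with an intercept column."""
    A = [[1.0] + list(r) for r in X]
    k = len(A[0])
    inv = invert([[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)])
    return [sum(r[i] * inv[i][j] * r[j] for i in range(k) for j in range(k)) for r in A]


def williams_points(X, residuals):
    """(h_i, standardized residual, high-leverage?, outlier?) for each training compound."""
    n, p = len(X), len(X[0])
    s = (sum(e * e for e in residuals) / (n - p - 1)) ** 0.5  # residual standard error
    h_star = 3 * (p + 1) / n                                  # warning leverage
    return h_star, [(h, e / s, h > h_star, abs(e / s) > 3)
                    for h, e in zip(leverages(X), residuals)]


rng = random.Random(5)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)] + [[6.0, 6.0]]
residuals = [rng.gauss(0, 0.3) for _ in range(31)]  # stand-in for residuals of a fitted MLR
h_star, pts = williams_points(X, residuals)
print("h* =", round(h_star, 3), "| high-leverage rows:", [i for i, pt in enumerate(pts) if pt[2]])
```

The leverages always sum to p + 1 (the trace of the hat matrix), which is a quick sanity check on any implementation.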
11.
12.
A comparison of two models with Landsat data for estimating above ground grassland biomass in Inner Mongolia,China  Total citations: 2, self-citations: 0, other citations: 2
Two models, an artificial neural network (ANN) and multiple linear regression (MLR), were developed to estimate typical grassland aboveground dry biomass in the Xilingol River Basin, Inner Mongolia, China. The normalized difference vegetation index (NDVI) and topographic variables (elevation, aspect and slope) were combined with atmospherically corrected reflectance from the Landsat ETM+ reflective bands as the candidate input variables for building both models. Seven variables (NDVI, aspect, and bands 1, 3, 4, 5 and 7) were selected by the ANN model (implemented in the Statistica 6.0 neural network module), while six (elevation, NDVI, and bands 1, 3, 5 and 7) were picked to fit the MLR function after a stepwise analysis was run between the candidate input variables and the aboveground dry biomass. Both models achieved reasonable results with RMSEs ranging from 39.88% to 50.08%. The ANN model provided a more accurate estimation (RMSEr = 39.88% for the training set and RMSEr = 42.36% for the testing set) than MLR (RMSEr = 49.51% for training and RMSEr = 53.20% for testing). The final aboveground dry biomass maps of the research area were produced from the ANN and MLR models, giving estimated mean values of 121 and 147 g/m², respectively.
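Two quantities in this abstract are easy to make concrete: NDVI, computed from red and near-infrared reflectance (bands 3 and 4 of Landsat ETM+), and the relative RMSE (RMSEr). The abstract does not define RMSEr, so the sketch assumes the common form "RMSE divided by the mean observed biomass, in percent"; the reflectances and biomass values are made-up illustrations and the function names are hypothetical.

```python
import math

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); for Landsat ETM+ these are bands 4 and 3."""
    return (nir - red) / (nir + red)


def rmse_r(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100 * rmse / (sum(observed) / n)


print(round(ndvi(0.40, 0.10), 3))                          # 0.6 (illustrative vegetated pixel)
print(round(rmse_r([100, 200, 150], [110, 190, 150]), 2))  # 5.44
```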
13.
14.
Ecological Modelling, 2005, 181(4): 581-589
Chlorophyll-a is a well-accepted index of phytoplankton abundance and of the population of primary producers in an aquatic environment. The relationships between Chlorophyll-a and 16 chemical, physical and biological water quality variables in Çamlıdere reservoir (Ankara, Turkey) were studied by using principal component scores (PCS) in multiple linear regression analysis (MLR) to predict Chlorophyll-a levels. Principal component analysis was used to simplify the complex relations among the water quality variables, and the resulting principal component scores were used as independent variables in the multiple linear regression models. Two approaches were used in the statistical analysis. In the first approach, only five selected score values from the PC analysis were used to predict Chlorophyll-a levels, and the predictive success (R²) of the model was 56.3%. In the second approach, where all score values from the PC analysis were used as independent variables, the predictive power turned out to be 90.8%. Both approaches could be used to predict Chlorophyll-a levels in reservoirs successfully.
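Principal component regression as described above can be sketched in two steps: compute principal component scores of the standardized predictors, then regress the response on those scores. Because the scores are mutually uncorrelated, each regression coefficient can be estimated separately. The sketch assumes a correlation-matrix PCA solved by power iteration with deflation, synthetic variables instead of the Çamlıdere data, and hypothetical function names; it also shows how keeping only the leading components can miss variance that matters for the response.

```python
import random

def standardize(cols):
    """Centre each variable and scale it to unit sample standard deviation."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        s = (sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
        out.append([(v - m) / s for v in c])
    return out


def pc_scores(cols, n_comp, iters=2000, seed=0):
    """Scores on the leading principal components of the correlation matrix.

    Eigenvectors come from power iteration with deflation, which is adequate for a
    small matrix with well-separated eigenvalues.
    """
    z = standardize(cols)
    k, n = len(z), len(z[0])
    C = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)] for i in range(k)]
    rng = random.Random(seed)
    scores = []
    for _ in range(n_comp):
        v = [rng.random() + 0.1 for _ in range(k)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(k) for j in range(k))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]  # deflate
        scores.append([sum(v[j] * z[j][r] for j in range(k)) for r in range(n)])
    return scores


def pcr_r2(scores, y):
    """R^2 of y regressed on (mutually uncorrelated) PC scores, one coefficient per score."""
    my = sum(y) / len(y)
    yc = [v - my for v in y]
    fitted = [0.0] * len(y)
    for t in scores:
        coef = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
        fitted = [f + coef * a for f, a in zip(fitted, t)]
    return 1 - sum((a - b) ** 2 for a, b in zip(yc, fitted)) / sum(a * a for a in yc)


# Synthetic water-quality-like variables (NOT the Camlidere data).
rng = random.Random(6)
n = 120
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [a + rng.gauss(0, 0.2) for a in z1]
x2 = [a + rng.gauss(0, 0.2) for a in z1]
x3 = z2
x4 = [0.5 * a + rng.gauss(0, 1) for a in z2]
y = [5 * (a - b) + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]  # driven by a low-variance PC
cols = [x1, x2, x3, x4]
print("R2, first 2 scores:", round(pcr_r2(pc_scores(cols, 2), y), 3))
print("R2, all 4 scores:  ", round(pcr_r2(pc_scores(cols, 4), y), 3))
```

Using every score is equivalent to ordinary MLR on all the original variables, so a higher R² from the full set comes at the cost of the dimension reduction that motivated PCA in the first place.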
15.
16.
Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests  Total citations: 3, self-citations: 0, other citations: 3
Forestry science has a long tradition of studying the relationship between stand productivity and abiotic and biotic site characteristics, such as climate, topography, soil and vegetation. Many of the early site quality modelling studies related site index to environmental variables using basic statistical methods such as linear regression. Because most ecological variables show a typical non-linear course and a non-constant variance distribution, a large fraction of the variation remained unexplained by these linear models. More recently, the development of more advanced non-parametric and machine learning methods has provided opportunities to overcome these limitations. Nevertheless, these methods also have drawbacks. Due to their increasing complexity, they are not only more difficult to implement and interpret, but also more vulnerable to overfitting. Especially in a context of regionalisation, this may prove to be problematic. Although many non-parametric and machine learning methods are increasingly used in applications related to forest site quality assessment, their predictive performance has only been assessed for a limited number of methods and ecosystems. In this study, five different modelling techniques are compared and evaluated, i.e. multiple linear regression (MLR), classification and regression trees (CART), boosted regression trees (BRT), generalized additive models (GAM), and artificial neural networks (ANN). Each method is used to model the site index of homogeneous stands of three important tree species of the Taurus Mountains (Turkey): Pinus brutia, Pinus nigra and Cedrus libani. Site index is related to soil, vegetation and topographical variables, which are available for 167 sample plots covering all important environmental gradients in the research area.
The five techniques are compared in a multi-criteria decision analysis in which different model performance measures, ecological interpretability and user-friendliness are considered as criteria. When combining these criteria, in most cases GAM is found to outperform all other techniques for modelling site index for the three species. BRT is a good alternative in case the ecological interpretability of the technique is of higher importance. When user-friendliness is more important, MLR and CART are the preferred alternatives. Despite its good predictive performance, ANN is penalized for its complex, non-transparent models and big training effort.
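The multi-criteria comparison above can be illustrated with the simplest aggregation rule, a weighted sum of min-max normalized criterion scores. The study's actual MCDA method, criterion scales and weights are not given here, so the alternatives A to C, their ratings and the weights are placeholders that do not reproduce the study's results; the point is only that shifting weight between criteria can reverse a ranking. Function names are hypothetical.

```python
def normalize(raw, higher_is_better):
    """Min-max scale each criterion to [0, 1] so that 1 is always the best value."""
    out = {alt: {} for alt in raw}
    for crit, hib in higher_is_better.items():
        vals = [raw[alt][crit] for alt in raw]
        lo, hi = min(vals), max(vals)
        for alt in raw:
            x = (raw[alt][crit] - lo) / (hi - lo) if hi > lo else 1.0
            out[alt][crit] = x if hib else 1.0 - x
    return out


def rank(scores, weights):
    """Weighted-sum aggregation; returns alternatives best-first plus their aggregate scores."""
    total = sum(weights.values())
    agg = {alt: sum(weights[c] * s[c] for c in weights) / total for alt, s in scores.items()}
    return sorted(agg, key=agg.get, reverse=True), agg


# Placeholder ratings for three unnamed techniques (not values from the study).
raw = {
    "A": {"rmse": 2.0, "interpretability": 3, "ease": 5},
    "B": {"rmse": 1.5, "interpretability": 2, "ease": 3},
    "C": {"rmse": 1.2, "interpretability": 1, "ease": 1},
}
norm = normalize(raw, {"rmse": False, "interpretability": True, "ease": True})
print(rank(norm, {"rmse": 1, "interpretability": 1, "ease": 1})[0])  # ['A', 'B', 'C']
print(rank(norm, {"rmse": 5, "interpretability": 1, "ease": 1})[0])  # ['C', 'B', 'A']
```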
17.
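Three-dimensional structures of 37 polychlorinated benzothiophenes were drawn with the ChemOffice software, giving the spatial coordinates Pi(xi, yi, zi) of each atom. Using the atomic distance index, the molecular spatial feature index, the molecular electronegativity distance vector and the number of chlorine atoms as molecular descriptors, multiple linear regression and a back-propagation (BP) artificial neural network were used to build QSPR models relating these descriptors to the gas chromatographic retention times of the polychlorinated benzothiophenes. The multiple linear regression model gave R = 0.9970 and SD = 2.1830, and the BP neural network model gave R = 0.9996 and SD = 0.3123. This provides a new approach for QSPR studies relating the molecular structure and properties of polychlorinated benzothiophenes.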
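A back-propagation (BP) network like the one compared with MLR above can be sketched with one tanh hidden layer and a linear output, trained by per-sample gradient descent on squared error. The layer sizes, learning rate, number of epochs and the synthetic target are arbitrary assumptions, unrelated to the published descriptors or retention times, and the class name is hypothetical.

```python
import math
import random

class TinyBPNet:
    """One tanh hidden layer + linear output, trained by back-propagation (per-sample SGD)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr):
        h, out = self.forward(x)
        err = out - target  # gradient of 0.5 * err**2 with respect to the output
        for j, hj in enumerate(h):
            delta = err * self.w2[j] * (1 - hj * hj)  # back-propagate through tanh (old w2)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * delta
            self.w1[j] = [w - lr * delta * xi for w, xi in zip(self.w1[j], x)]
        self.b2 -= lr * err

    def mse(self, data):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Synthetic regression task standing in for descriptor -> retention-time data.
rng = random.Random(7)
data = [([a, b], a * a + 0.5 * b) for a, b in
        ((rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100))]
net = TinyBPNet(n_in=2, n_hidden=8)
before = net.mse(data)
for _ in range(300):
    rng.shuffle(data)
    for x, t in data:
        net.train_step(x, t, lr=0.05)
print("MSE before:", round(before, 4), "after:", round(net.mse(data), 4))
```

The non-linear term a² is what a plain MLR on the same two inputs cannot capture, which is the kind of gap a BP network can close, at the price of more tuning and less transparent models.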