Similar Documents
A total of 20 similar documents were retrieved (search time: 15 ms).
1.
Empirical QSAR models are only valid in the domain in which they were trained and validated. Applying a model to substances outside its domain can lead to grossly erroneous predictions. Partial least squares (PLS) regression provides prediction diagnostics that can be used to decide whether or not a substance is within the model domain, i.e., whether the model prediction can be trusted. QSAR models for four different environmental end-points are used to demonstrate the importance of appropriate training set selection and how the reliability of QSAR predictions can be increased by outlier diagnostics. All models showed consistent results; test set prediction errors were very similar in magnitude to training set estimation errors when prediction outlier diagnostics were used to detect and remove outliers in the prediction data. Test set prediction errors for substances classified as outliers were much larger. The difference in the number of outliers between models with randomly and systematically selected training sets illustrates well the need for representative training data.
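As a rough illustration of the kind of prediction-outlier diagnostic this abstract refers to, the sketch below fits a PLS model and flags new samples whose distance to the X-model is far larger than that of the training set. It is a minimal sketch under stated assumptions, not the paper's procedure: the descriptor data, the residual-distance measure and the three-sigma cutoff are all illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 8))                      # hypothetical descriptor matrix
y_train = X_train[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)

pls = PLSRegression(n_components=3).fit(X_train, y_train)

def x_residual_distance(model, X):
    """Distance of each sample to the PLS X-model (a DModX-like diagnostic)."""
    scores = model.transform(X)                         # project onto the PLS scores
    X_rec = model.inverse_transform(scores)             # reconstruct X from the model
    return np.sqrt(((X - X_rec) ** 2).sum(axis=1))

train_dist = x_residual_distance(pls, X_train)
cutoff = train_dist.mean() + 3.0 * train_dist.std()     # illustrative 3-sigma cutoff

X_new = rng.normal(size=(10, 8))
X_new[0] += 8.0                                         # push one substance far outside the domain
outside = x_residual_distance(pls, X_new) > cutoff
print("outside model domain:", outside)                 # flagged predictions should not be trusted
```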

2.
R.E. Rathbun, D.Y. Tai. Chemosphere, 1984, 13(7): 715–730
A nonlinear least squares procedure and a log transformation procedure for calculating first-order rate coefficients from experimental concentration-versus-time data were compared using laboratory measurements of the volatilization from water of 1,1,1-trichloroethane and 1,2-dichloroethane and of the absorption of oxygen by water. For 77 tests, the ratios of the nonlinear least squares volatilization and absorption coefficients to the corresponding log transformation coefficients ranged from 0.955 to 1.08 and averaged 1.01. Comparison of the maximum, minimum, and mean root-mean-square errors of prediction for six sets of coefficients showed that the errors for the nonlinear least squares procedure were almost always smaller than those for the log transformation procedure.
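A minimal sketch of the two procedures being compared, on synthetic first-order decay data: a nonlinear least squares fit of C(t) = C0·exp(−kt) versus an ordinary least squares fit to ln C. The rate constant, noise level and sampling times are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 20)                      # sampling times (hours, illustrative)
k_true, c0_true = 0.6, 10.0
c = c0_true * np.exp(-k_true * t) + rng.normal(scale=0.1, size=t.size)

# Nonlinear least squares on the untransformed concentrations
def first_order(t, c0, k):
    return c0 * np.exp(-k * t)

(c0_nls, k_nls), _ = curve_fit(first_order, t, c, p0=(c.max(), 0.5))

# Log transformation: ln(C) = ln(C0) - k*t, fitted by ordinary least squares
slope, intercept = np.polyfit(t, np.log(c), 1)
k_log = -slope

print(f"k (nonlinear LS) = {k_nls:.3f}, k (log transform) = {k_log:.3f}, "
      f"ratio = {k_nls / k_log:.3f}")
```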

3.
Exposure models are needed for comparison of scenarios resulting from alternative policy options. The reliability of models used for such purposes should be quantified by comparing model outputs in a real situation with the corresponding observed exposures. Measurement errors affect the observations, but if the distribution of these errors for single observations is known, the bias they cause in the population statistics can be corrected. The current paper does this and calculates model errors for a probabilistic simulation of 48-hr fine particulate matter (PM2.5) exposures. Direct and nested microenvironment-based models are compared. The direct model requires knowledge of the distribution of indoor concentrations, whereas the nested model calculates indoor concentrations from ambient levels using infiltration factors and indoor sources. The model error in the mean exposure level was <0.5 µg m⁻³ for both models. Relative errors in the estimated population mean were +1% and −5% for the direct and nested models, respectively. Relative errors in the estimated SD were −9% and −23%, respectively. The magnitude of these errors and the errors calculated for population percentiles indicate that the model errors would not drive general conclusions derived from these models, supporting the use of the models as a tool for evaluation of potential exposure reductions in alternative policy scenarios.
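A minimal sketch of the "nested" microenvironment idea described above: indoor concentrations are derived from ambient levels through an infiltration factor plus an indoor-source term, and exposure is time-weighted over microenvironments. All distributions and parameter values are illustrative assumptions, not the study's inputs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000                                          # simulated individuals

ambient = rng.lognormal(mean=np.log(12.0), sigma=0.4, size=n)    # 48-h ambient PM2.5, ug/m3
f_inf = rng.uniform(0.3, 0.8, size=n)               # infiltration factor
indoor_source = rng.exponential(scale=3.0, size=n)  # ug/m3 added by indoor sources
indoor = f_inf * ambient + indoor_source            # nested model for indoor concentration

time_indoors = rng.uniform(0.7, 0.95, size=n)       # fraction of the 48 h spent indoors
exposure = time_indoors * indoor + (1.0 - time_indoors) * ambient

print(f"mean = {exposure.mean():.1f} ug/m3, sd = {exposure.std():.1f} ug/m3, "
      f"95th pct = {np.percentile(exposure, 95):.1f} ug/m3")
```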

4.
The many advances made in air quality model evaluation procedures during the past ten years are discussed, and some components of model uncertainty are presented. Simplified statistical procedures for operational model evaluation are suggested. The fundamental model performance measures are the mean bias, the mean square error, and the correlation. The bootstrap resampling technique is used to estimate confidence limits on the performance measures in order to determine whether a model agrees satisfactorily with the data or whether one model is significantly different from another. Applications to two tracer experiments are described.

It is emphasized that review and evaluation of the scientific components of models are often of greater importance than the strictly statistical evaluation. A necessary condition for acceptance of a model should be that it is scientifically correct. It is shown that even in research-grade tracer experiments, data input errors can cause errors in hourly-average model predictions of point concentrations almost as large as the predictions themselves. The turbulent or stochastic component of model uncertainty has a similar magnitude. These components of the uncertainty decrease as averaging time increases.
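A minimal sketch of the bootstrap procedure referred to above: paired (observed, predicted) concentrations are resampled with replacement, and percentile confidence limits are reported for the mean bias, the mean square error and the correlation. The synthetic data and the 95% percentile interval are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
obs = rng.lognormal(mean=1.0, sigma=0.6, size=200)            # observed tracer concentrations
pred = obs * rng.lognormal(mean=0.05, sigma=0.4, size=200)    # model predictions with error

def performance(o, p):
    """Fundamental performance measures: mean bias, mean square error, correlation."""
    return np.mean(p - o), np.mean((p - o) ** 2), np.corrcoef(o, p)[0, 1]

# Resample the paired data 2000 times and recompute the measures each time
stats = np.array([performance(obs[idx], pred[idx])
                  for idx in (rng.integers(0, obs.size, obs.size) for _ in range(2000))])

for name, col in zip(("mean bias", "MSE", "correlation"), stats.T):
    lo, hi = np.percentile(col, [2.5, 97.5])
    print(f"{name}: 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```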

5.
6.
Dilution water demand (DWD) can cause a positive error when the dilution biochemical oxygen demand (BOD) method is used. Dilution water demand may be attributed to oxidation of organic impurities in the dilution water and nitrification of ammonia added as a nutrient. To minimize the error associated with these sources, the standard BOD method requires that DWD be less than 0.2 mg/L in 5 days and does not allow correction for DWD when calculating test results. This study derives a set of theoretical equations to analyze the uncorrected errors with and without seeding. The authors concluded that DWD can be completely corrected if seeded dilution water is used for the sample dilution. When seeding individual bottles, the uncorrected error approaches approximately 8.3 to 8.8% at a 5-day depletion of 2 mg/L for a typical secondary effluent. Tests without seeding show an almost 1% higher uncorrected error than seeded tests. The analysis also suggests that these errors can be effectively reduced to less than 3% when the 5-day depletion approaches 6 mg/L, even for 5-day biochemical oxygen demand concentrations exceeding 1 × 10⁴ mg/L. Further analysis indicates that, if not inhibited, the ammonium added to dilution water as a nutrient may contribute additional error due to nitrification.

7.
Author’s Reply     
A technique is developed to compute precision requirements for component parts of an emissions inventory to ensure (at a given confidence level) an overall acceptable precision in the estimate of total emissions. Since the emissions inventory is a basic requirement of air quality control implementation plans and provides a valuable management tool for planning air pollution control activities, it is appropriate to state in quantitative terms the confidence that can be associated with each inventory. The approach reported here uses weighted sensitivity analysis methods to distribute both percentage and physical errors in source class emissions according to their contribution to the total emissions, and utilizes Chebyshev’s inequality to establish confidence levels for total emissions. The analysis has been extended to cover the case where one or more of the error components in a given inventory source class can be fixed by the analyst. The utility of the technique is manifold and several practical applications are reported. In particular, it serves to establish percentage error requirements for source categories to satisfy given error bounds for the overall emissions inventory at a given level of statistical confidence. The weighted sensitivity analysis technique possesses a high degree of generality, being applicable to compute component error requirements for any kind of data inventory which exhibits a hierarchical (tree-like) structure, as exemplified by NEDS Emissions Summary Reports. This work should be of interest to air pollution control planners at all levels of government and to anyone responsible for the air pollution portion of environmental impact statements.
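A minimal sketch of the Chebyshev step described above: given one-sigma error estimates for each source class (treated here as independent), the total-inventory error is bounded at a chosen confidence level via P(|X − µ| ≥ kσ) ≤ 1/k². The paper's weighted sensitivity allocation is not reproduced; the source classes and error values are illustrative.

```python
import math

emissions = {"point sources": 1.2e5, "area sources": 8.0e4, "mobile sources": 1.5e5}  # tons/yr
rel_error = {"point sources": 0.10, "area sources": 0.30, "mobile sources": 0.15}     # 1-sigma, fractional

total = sum(emissions.values())
# Assume independent source-class errors, so variances add
sigma_total = math.sqrt(sum((emissions[s] * rel_error[s]) ** 2 for s in emissions))

confidence = 0.90
k = 1.0 / math.sqrt(1.0 - confidence)          # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
bound = k * sigma_total

print(f"total = {total:.3g} tons/yr; error bound at {confidence:.0%} confidence: "
      f"+/- {bound:.3g} tons/yr ({bound / total:.1%})")
```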

8.
Different methods for the field-scale estimation of contaminant mass discharge in groundwater at control planes based on multi-level well data are numerically analysed for the expected estimation error. We consider "direct" methods based on time-integrated measuring of mass flux, as well as "indirect" methods, where estimates are derived from concentration measurements. The appropriateness of the methods is evaluated by means of modelled data provided by simulation of mass transport in a three-dimensional model domain. Uncertain heterogeneous aquifer conditions are addressed by means of Monte-Carlo simulations with aquifer conductivity as a random space function. We investigate extensively the role of the interplay between the spatial resolution of the sampling grid and aquifer heterogeneity with respect to the accuracy of the mass discharge estimation. It is shown that estimation errors can be reduced only if spatial sampling intervals are in due proportion to spatial correlation length scales. The ranking of the methods with regard to estimation error is shown to be heavily dependent on both the given sampling resolution and prevailing aquifer heterogeneity. Regarding the "indirect" estimation methods, we demonstrate the great importance of a consistent averaging of the parameters used for the discharge estimation.
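A minimal sketch of one common "indirect" control-plane estimator: each multi-level sampling point contributes concentration × Darcy flux × the plane area it represents. The grid spacing, concentration field, flux values and uniform-area assignment are illustrative assumptions, not the paper's test cases.

```python
import numpy as np

dy, dz = 2.0, 1.0                         # horizontal / vertical sampling intervals on the plane, m
conc = np.array([[0.0, 0.1, 0.4, 0.1],
                 [0.1, 2.5, 6.0, 0.8],
                 [0.0, 0.9, 1.8, 0.3]])   # concentration at each sampling point, g/m3 (= mg/L)

q = np.full_like(conc, 0.05)              # Darcy flux normal to the plane, m/day
q[1, :] = 0.12                            # a higher-conductivity layer

cell_area = dy * dz                       # plane area assigned to each sampling point, m2
mass_discharge = float(np.sum(conc * q * cell_area))   # g/day across the control plane
print(f"estimated mass discharge: {mass_discharge:.2f} g/day")
```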

9.
Contamination source identification is a crucial step in environmental remediation. The exact contaminant source locations and release histories are often unknown due to lack of records and therefore must be identified through inversion. Coupled source location and release history identification is a complex nonlinear optimization problem. Existing strategies for contaminant source identification have important practical limitations. In many studies, analytical solutions for point sources are used; the problem is often formulated and solved via nonlinear optimization; and model uncertainty is seldom considered. In practice, model uncertainty can be significant because of the uncertainty in model structure and parameters, and the error in numerical solutions. An inaccurate model can lead to erroneous inversion of contaminant sources. In this work, a constrained robust least squares (CRLS) estimator is combined with a branch-and-bound global optimization solver for iteratively identifying source release histories and source locations. CRLS is used for source release history recovery and the global optimization solver is used for location search. CRLS is a robust estimator that was developed to incorporate directly a modeler's prior knowledge of model uncertainty and measurement error. The robustness of CRLS is essential for systems that are ill-conditioned. Because the release history recovery is thus decoupled from the location search, the total solution time can be reduced significantly. Our numerical experiments show that the combination of CRLS with the global optimization solver achieved better performance than the combination of a non-robust estimator, i.e., the nonnegative least squares (NNLS) method, with the same solver.
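A minimal sketch of the non-robust baseline named in this abstract, nonnegative least squares: observed concentrations are modelled as a transfer matrix times a non-negative release history, and the history is recovered with scipy's NNLS. A toy smoothing kernel stands in for the transport model; the CRLS estimator and the branch-and-bound location search are not reproduced.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.linalg import toeplitz

rng = np.random.default_rng(4)
nt = 40
kernel = np.exp(-0.5 * ((np.arange(nt) - 4) / 2.0) ** 2)   # toy unit-pulse response of the system
G = toeplitz(kernel, np.zeros(nt))                         # causal convolution (transfer) matrix

m_true = np.zeros(nt)
m_true[8:12] = 1.0                                         # a short release pulse
d = G @ m_true + rng.normal(scale=0.02, size=nt)           # noisy downstream observations

m_est, resid_norm = nnls(G, d)                             # non-negative release history estimate
print(f"recovered pulse mass = {m_est.sum():.2f} (true = {m_true.sum():.2f}), "
      f"residual norm = {resid_norm:.3f}")
```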

10.
Contamination source identification is a crucial step in environmental remediation. The exact contaminant source locations and release histories are often unknown due to lack of records and therefore must be identified through inversion. Coupled source location and release history identification is a complex nonlinear optimization problem. Existing strategies for contaminant source identification have important practical limitations. In many studies, analytical solutions for point sources are used; the problem is often formulated and solved via nonlinear optimization; and model uncertainty is seldom considered. In practice, model uncertainty can be significant because of the uncertainty in model structure and parameters, and the error in numerical solutions. An inaccurate model can lead to erroneous inversion of contaminant sources. In this work, a constrained robust least squares (CRLS) estimator is combined with a branch-and-bound global optimization solver for iteratively identifying source release histories and source locations. CRLS is used for source release history recovery and the global optimization solver is used for location search. CRLS is a robust estimator that was developed to incorporate directly a modeler's prior knowledge of model uncertainty and measurement error. The robustness of CRLS is essential for systems that are ill-conditioned. Because the release history recovery is thus decoupled from the location search, the total solution time can be reduced significantly. Our numerical experiments show that the combination of CRLS with the global optimization solver achieved better performance than the combination of a non-robust estimator, i.e., the nonnegative least squares (NNLS) method, with the same solver.

11.
It is well known that skin sea surface temperature (SSST) differs from bulk sea surface temperature (BSST) by a few tenths of a degree Celsius. However, the extent of the error associated with dry deposition (or uptake) estimation by using BSST is not well known. This study conducts such an evaluation using on-board observation data collected over the South China Sea in the summers of 2004 and 2006. It was found that when a warm layer occurred, the deposition velocities using BSST were underestimated within the range of 0.8–4.3%, and the absorbed sea surface heat flux was overestimated by 21 W m⁻². In contrast, under cool-skin-only conditions, the deposition velocities using BSST were overestimated within the range of 0.5–2.0%, varying with the pollutant, and the absorbed sea surface heat flux was also underestimated by 21 W m⁻². Scale analysis shows that for a slightly soluble gas (e.g., NO2, NO and CO), the error in the solubility estimation using BSST is the major source of the error in dry deposition estimation. For a highly soluble gas (e.g., SO2), the error in the estimation of turbulent heat fluxes and, consequently, aerodynamic resistance and gas-phase film resistance using BSST is the major source of the total error. In contrast, for a moderately soluble gas (e.g., O3 and CO2), both the errors from the estimations of the solubility and the aerodynamic resistance are important. In addition, deposition estimations using various assumptions are discussed. The largest uncertainty is from the parameterizations for chemical enhancement factors. Other important areas of uncertainty include: (1) the various parameterizations for gas-transfer velocity; (2) the neutral-atmosphere assumption; (3) the use of BSST as the SST; and (4) the constant-pH assumption.

12.
In air pollution epidemiology, error in measurements of correlated pollutants has been advanced as a reason to distrust regressions that find statistically significant weak associations. Much of the related debate in the literature and elsewhere has been qualitative. To promote quantitative evaluation of such errors, this paper develops an air pollution time-series model based on correlations among unit-normal variables. Assuming there are no other sources of bias present, the model shows the expected amount of relative bias in the regression coefficients of a bivariate regression of coarse and fine particulate matter measurements on daily mortality. The model only requires information on instrumental error and spatial variability, along with the observed regression coefficients and information on the true fine-coarse correlation. Analytical results show that if one pollutant is truly more harmful than the other, then it must be measured more precisely than the other in order not to bias the ratio of the fine and coarse regression coefficients. Utilizing published data, a case study of the Harvard Six-Cities study illustrates use of the model and emphasizes the need for data on spatial variability across the study area. Current epidemiology time-series regressions can use this model to address the general concern of correlated pollutants with differing measurement errors.
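A minimal Monte Carlo sketch of the concern this abstract quantifies analytically: correlated "fine" and "coarse" series with different measurement errors are entered in one regression, and the estimated coefficient ratio is biased toward the better-measured pollutant. A plain linear model stands in for the mortality time-series regression; all effect sizes, correlations and error variances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1000, 500
rho = 0.7                                    # true fine-coarse correlation
beta_fine, beta_coarse = 1.0, 0.2            # fine is truly more harmful in this toy example

ratios = []
for _ in range(reps):
    z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    fine_true, coarse_true = z[:, 0], z[:, 1]
    y = beta_fine * fine_true + beta_coarse * coarse_true + rng.normal(size=n)

    fine_obs = fine_true + rng.normal(scale=0.6, size=n)     # the more harmful pollutant is
    coarse_obs = coarse_true + rng.normal(scale=0.1, size=n)  # measured LESS precisely here

    X = np.column_stack([np.ones(n), fine_obs, coarse_obs])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ratios.append(b[1] / b[2])

print(f"true fine/coarse coefficient ratio = {beta_fine / beta_coarse:.1f}, "
      f"mean estimated ratio = {np.mean(ratios):.2f}")
```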

13.
Long-path averaging instruments measure the average velocity or concentration of a substance or substances over an averaging path. These measurements are then often used for calculation of the average concentration and mass flow rate of the substance. The purpose of this paper is to describe some of the limitations of these instruments and to suggest ways in which these limitations can be minimized. Two limitations were examined: measuring concentration in a single dimension (e.g., ignoring the variation in concentration over the width of the sample plane), and deriving an average concentration without considering velocity effects. The resultant errors will be application-specific. Estimates of the second source of error can be obtained from the covariance of concentration and velocity profiles over the path length. Unfortunately, suitable field data were not available, and to illustrate the method, estimates of the error were obtained for a range of possible concentration and velocity profiles. Errors of 50% or greater in the mass flow were incurred for the concentration and velocity profiles considered. This error was reduced to a negligible level by segmenting the averaging path length. It is recommended that velocity and concentration profiles be obtained for a broad range of applications to enable the importance of covariance errors to be better assessed.
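A minimal sketch of the covariance error discussed above: the true mass flow depends on the path average of c(x)·v(x), whereas a single long-path measurement effectively supplies mean(c) and mean(v) separately; segmenting the path recovers most of the covariance term. The profiles below are illustrative, not field data.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)                        # position along the averaging path
conc = np.exp(-((x - 0.5) / 0.15) ** 2)               # plume-like concentration profile (arbitrary units)
vel = 0.2 + 1.8 * np.exp(-((x - 0.5) / 0.2) ** 2)     # velocity profile peaking where conc peaks

true_flux = np.mean(conc * vel)                       # per unit area of the sample plane
single_path = np.mean(conc) * np.mean(vel)            # ignores cov(c, v) along the path

segments = np.array_split(np.arange(x.size), 5)       # split the averaging path into 5 segments
segmented = np.mean([np.mean(conc[s]) * np.mean(vel[s]) for s in segments])

for name, est in [("single path", single_path), ("5 segments", segmented)]:
    print(f"{name}: relative error in mass flow = {(est - true_flux) / true_flux:+.1%}")
```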

14.
Data from the U.S. Geological Survey (USGS) collocated-sampler program for the National Atmospheric Deposition Program/National Trends Network (NADP/NTN) are used to estimate the overall error of NADP/NTN measurements. Absolute errors are estimated by comparison of paired measurements from collocated instruments. Spatial and temporal differences in absolute error were identified and are consistent with longitudinal distributions of NADP/NTN measurements and spatial differences in precipitation characteristics. The magnitude of error for calcium, magnesium, ammonium, nitrate, and sulfate concentrations, specific conductance, and sample volume is of minor environmental significance to data users. Data collected after a 1994 sample-handling protocol change are prone to less absolute error than data collected prior to 1994. Absolute errors are smaller during non-winter months than during winter months for selected constituents at sites where frozen precipitation is common. Minimum resolvable differences are estimated for different regions of the USA to aid spatial and temporal watershed analyses.
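A minimal sketch of error estimation from collocated pairs: the absolute error of a single measurement is characterized by the differences between paired instruments at the same site. The synthetic data, and the √2 scaling from a paired difference back to a single-sampler error, are illustrative assumptions rather than the program's exact statistics.

```python
import numpy as np

rng = np.random.default_rng(6)
true_conc = rng.lognormal(mean=0.0, sigma=0.7, size=300)       # e.g. weekly sulfate, mg/L
sampler_a = true_conc + rng.normal(scale=0.05, size=300)       # primary sampler
sampler_b = true_conc + rng.normal(scale=0.05, size=300)       # collocated sampler

diff = sampler_a - sampler_b
med_abs_diff = np.median(np.abs(diff))
# A paired difference combines two independent single-sampler errors, hence the 1/sqrt(2)
single_sampler_error = med_abs_diff / np.sqrt(2.0)

print(f"median absolute difference = {med_abs_diff:.3f} mg/L, "
      f"implied single-sampler error ~ {single_sampler_error:.3f} mg/L")
```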

15.
ABSTRACT

Long-path averaging instruments measure the average velocity or concentration of a substance or substances over an averaging path. These measurements are then often used for calculation of the average concentration and mass flow rate of the substance. The purpose of this paper is to describe some of the limitations of these instruments and to suggest ways in which these limitations can be minimized. Two limitations were examined: measuring concentration in a single dimension (e.g., ignoring the variation in concentration over the width of the sample plane), and deriving an average concentration without considering velocity effects. The resultant errors will be application-specific.

Estimates of the second source of error can be obtained from the covariance of concentration and velocity profiles over the path length. Unfortunately, suitable field data were not available, and to illustrate the method, estimates of the error were obtained for a range of possible concentration and velocity profiles. Errors of 50% or greater in the mass flow were incurred for the concentration and velocity profiles considered. This error was reduced to a negligible level by segmenting the averaging path length. It is recommended that velocity and concentration profiles be obtained for a broad range of applications to enable the importance of covariance errors to be better assessed.

16.
To obtain reliable diffusion parameters for diffusion testing, multiple experiments should not only be cross-checked but the internal consistency of each experiment should also be verified. In the through- and in-diffusion tests with solution reservoirs, test interpretation of different phases often makes use of simplified analytical solutions. This study explores the feasibility of steady, quasi-steady, equilibrium and transient-state analyses using simplified analytical solutions with respect to (i) valid conditions for each analytical solution, (ii) potential error, and (iii) experimental time. For increased generality, a series of numerical analyses are performed using unified dimensionless parameters and the results are all related to dimensionless reservoir volume (DRV) which includes only the sorptive parameter as an unknown. This means the above factors can be investigated on the basis of the sorption properties of the testing material and/or tracer. The main findings are that steady, quasi-steady and equilibrium-state analyses are applicable when the tracer is not highly sorptive. However, quasi-steady and equilibrium-state analyses become inefficient or impractical compared to steady state analysis when the tracer is non-sorbing and material porosity is significantly low. Systematic and comprehensive reformulation of analytical models enables the comparison of experimental times between different test methods. The applicability and potential error of each test interpretation can also be studied. These can be applied in designing, performing, and interpreting diffusion experiments by deducing DRV from the available information for the target material and tracer, combined with the results of this study.

17.
A novel differential pulse voltammetry (DPV) method was researched and developed for the simultaneous determination of Pendimethalin, Dinoseb and sodium 5-nitroguaiacolate (5NG) with the aid of chemometrics. The voltammograms of these three compounds overlapped significantly, and to facilitate the simultaneous determination of the three analytes, chemometrics methods were applied. These included classical least squares (CLS), principal component regression (PCR), partial least squares (PLS) and radial basis function-artificial neural networks (RBF-ANN). A separately prepared verification data set was used to confirm the calibrations, which were built from the original and first derivative data matrices of the voltammograms. On the basis of the relative prediction errors and recoveries of the analytes, the RBF-ANN and the DPLS (D – first derivative spectra) models performed best and are particularly recommended for application. The DPLS calibration model was applied satisfactorily for the prediction of the three analytes in market vegetables and lake water samples.
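A minimal sketch of the calibration/verification step described above: synthetic overlapped "voltammograms" are generated as mixtures of three broad peaks, a PLS model is calibrated, and a separately prepared verification set is scored with a relative prediction error (RPE). The peak shapes, noise level and RPE definition used here are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
potential = np.linspace(-1.0, 0.0, 120)
# Three heavily overlapped peak shapes standing in for the analytes' responses
peaks = np.stack([np.exp(-((potential - c) / 0.12) ** 2) for c in (-0.70, -0.55, -0.40)])

def voltammograms(conc):
    """Mixture signals = concentrations x peak shapes, plus instrument noise."""
    return conc @ peaks + rng.normal(scale=0.01, size=(conc.shape[0], potential.size))

C_cal = rng.uniform(0.1, 1.0, size=(30, 3))       # calibration concentrations (3 analytes)
C_val = rng.uniform(0.1, 1.0, size=(10, 3))       # separately prepared verification set

pls = PLSRegression(n_components=3).fit(voltammograms(C_cal), C_cal)
C_pred = pls.predict(voltammograms(C_val))

rpe = 100 * np.sqrt(np.sum((C_pred - C_val) ** 2) / np.sum(C_val ** 2))
print(f"RPE(total) on the verification set = {rpe:.1f}%")
```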

18.
A novel differential pulse voltammetry (DPV) method was researched and developed for the simultaneous determination of Pendimethalin, Dinoseb and sodium 5-nitroguaiacolate (5NG) with the aid of chemometrics. The voltammograms of these three compounds overlapped significantly, and to facilitate the simultaneous determination of the three analytes, chemometrics methods were applied. These included classical least squares (CLS), principal component regression (PCR), partial least squares (PLS) and radial basis function-artificial neural networks (RBF-ANN). A separately prepared verification data set was used to confirm the calibrations, which were built from the original and first derivative data matrices of the voltammograms. On the basis of the relative prediction errors and recoveries of the analytes, the RBF-ANN and the DPLS (D - first derivative spectra) models performed best and are particularly recommended for application. The DPLS calibration model was applied satisfactorily for the prediction of the three analytes in market vegetables and lake water samples.

19.
ABSTRACT

European legislation continues to drive down emission limit values, making emission measurement in narrow stacks increasingly important. However, the applicable standards (EN ISO 16911–1 and EN 15259) are poorly validated for narrow stacks, and the effect of flow disturbances on the described methods is largely unknown. In this article, measurement errors are investigated in narrow stacks with flow disturbances and swirl, both experimentally and through computational fluid dynamics (CFD) simulations. The results indicate that measurement errors due to misalignment of the flow with typical measuring probes (pitot tubes) are small compared to errors resulting from the positioning of these probes in the measurement plane. Errors of up to 15% are reported using the standardized methods, while the measurement error is both smaller and more predictable when using additional measurement points.

Implications: Current international standards provide methods to measure emissions from industrial stacks. With increasingly small emission limit values, the accuracy of these measurements is becoming considerably more important. The data from this study can be used to inform revisions of these standards, in particular with respect to flow disturbances in narrow stacks, and can help law- and policy-makers to obtain insight into the uncertainties of emission measurements in these specific situations.
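A minimal sketch of the point-velocity measurement underlying the standardized methods discussed above: each pitot-tube reading gives a local velocity v = √(2Δp/ρ), and the point velocities are averaged over the measurement plane to obtain the volumetric flow. The grid readings, gas density and stack diameter are illustrative; probe misalignment and swirl corrections are not modelled.

```python
import numpy as np

rho = 1.0                                   # flue-gas density, kg/m3 (illustrative)
diameter = 0.5                              # narrow stack, m
area = np.pi * (diameter / 2) ** 2

dp = np.array([38.0, 41.0, 45.0, 47.0,      # pitot differential pressures (Pa) at the
               44.0, 40.0, 36.0, 35.0])     # measurement points of the sampling plane

v_points = np.sqrt(2.0 * dp / rho)          # local velocities, m/s
v_mean = v_points.mean()                    # simple average over the measurement plane
flow = v_mean * area                        # volumetric flow, m3/s

print(f"mean velocity = {v_mean:.2f} m/s, volumetric flow = {flow:.3f} m3/s")
```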

20.
Commonly used sums-of-squares-based error or deviation statistics—like the standard deviation, the standard error, the coefficient of variation, and the root-mean-square error—often are misleading indicators of average error or variability. Sums-of-squares-based statistics are functions of at least two dissimilar patterns that occur within data. Both the mean of a set of error or deviation magnitudes (the average of their absolute values) and their variability influence the value of a sum-of-squares-based error measure, which confounds clear assessment of its meaning. Interpretation problems arise, according to Paul Mielke, because sums-of-squares-based statistics do not satisfy the triangle inequality. We illustrate the difficulties in interpreting and comparing these statistics using hypothetical data, and recommend the use of alternate statistics that are based on sums of error or deviation magnitudes.
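A minimal sketch of the point made above: two hypothetical error sets with the same mean absolute error can have very different root-mean-square errors, because a sums-of-squares statistic also responds to the variability of the error magnitudes. The two error sets below are illustrative.

```python
import numpy as np

errors_uniform = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0])    # constant-magnitude errors
errors_mixed = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 12.0])     # same total magnitude, one large miss

for name, e in [("uniform", errors_uniform), ("mixed", errors_mixed)]:
    mae = np.mean(np.abs(e))                 # mean of error magnitudes
    rmse = np.sqrt(np.mean(e ** 2))          # sums-of-squares-based measure
    print(f"{name}: MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```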
