Statistical Methods and Pitfalls in Environmental Data Analysis |
| |
Authors: | Yue Rong |
| |
Institution: | California Regional Water Quality Control Board, Los Angeles Region, 320 West 4th Street, Suite 200, Los Angeles, CA 90013, U.S.A. |
| |
Abstract: | This paper reviews four commonly used statistical methods for environmental data analysis and uses real case study data to discuss the potential pitfalls of applying them. The four methods are percentile and confidence interval estimation, the correlation coefficient, regression analysis, and analysis of variance (ANOVA). For percentile and confidence interval estimation, the potential pitfall is automatically assuming a normal distribution for environmental data, which often follow a log-normal distribution. For the correlation coefficient, the potential pitfall is using data points that span a wide range, in which the largest values may dwarf the smaller data points and consequently skew the correlation coefficient. For regression analysis, the potential pitfall is that uncertainties in the input variables propagate to the regression model prediction, which may be even more uncertain than the inputs. For ANOVA, the potential pitfall is treating the acceptance of a hypothesis, which is a weak argument, as if it implied a strong conclusion. As demonstrated in this paper, very different conclusions may be drawn from a statistical analysis if these pitfalls are not identified. Lessons drawn from the pitfalls are given at the end of this article. |
| |
Keywords: | Normal Distribution; Log-normal Distribution; Percentile; Confidence Interval; Correlation Coefficient; Regression; ANOVA; Groundwater Monitoring |