Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Testing for an association between two categorical variables using count data is commonplace in the behavioral sciences. Here, we present evidence that influential biostatistical textbooks give contradictory and incomplete advice on good practice in the analysis of such contingency table data. We survey the statistical literature and offer guidance on such analyses. Specifically, we call for greater use of exact testing rather than tests that use an asymptotic chi-squared distribution. That is, we suggest that researchers take a conservative approach and only perform asymptotic testing where there is little doubt that it is appropriate. We recommend a specific criterion for such decision-making. Where asymptotic testing is appropriate, we recommend chi-squared over the G-test and recommend against the implementation of Yates' (or any other) correction. We also provide advice on the effective use of exact testing for associations in contingency tables. Lastly, we highlight issues that need to be considered when using the commonly recommended Fisher’s exact test.
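As an illustration of the exact testing this abstract recommends, the following is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table, using only the Python standard library. The function name `fisher_exact_2x2` and the example counts are hypothetical, and the sketch does not implement the paper's own decision criterion, which the abstract does not state.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 3 of 4 in one group versus 1 of 4 in the other.
print(fisher_exact_2x2(3, 1, 1, 3))
```

The p-value is computed by enumerating every table with the observed margins, so no asymptotic chi-squared approximation is involved.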

2.
The statistical analysis of environmental data from remote sensing and Earth system simulations often entails the analysis of gridded spatio-temporal data, with a hypothesis test being performed for each grid cell. When the whole image or a set of grid cells are analyzed for a global effect, the problem of multiple testing arises. When no global effect is present, we expect α% of all grid cells to be false positives, and spatially autocorrelated data can give rise to clustered spurious rejections that can be misleading in an analysis of spatial patterns. In this work, we review standard solutions for the multiple testing problem and apply them to spatio-temporal environmental data. These solutions are independent of the test statistic, and any test statistic can be used (e.g., tests for trends or change points in time series). Additionally, we introduce permutation methods and show that they have more statistical power. Real-world data are used to provide examples of the analysis, and the performance of each method is assessed in a simulation study. Unlike other simulation studies, ours comprehensively compares the statistical power of the presented methods. In conclusion, we present several statistically rigorous methods for analyzing spatio-temporal environmental data and controlling the false positives. These methods allow the use of any test statistic in a wide range of applications in environmental sciences and remote sensing.

3.
《Ecological modelling》2007,200(1-2):1-19
Given the importance of knowledge of species distribution for conservation and climate change management, continuous and progressive evaluation of the statistical models predicting species distributions is necessary. Current models are evaluated in terms of ecological theory used, the data model accepted and the statistical methods applied. Focus is restricted to Generalised Linear Models (GLM) and Generalised Additive Models (GAM). Certain currently unused regression methods are reviewed for their possible application to species modelling. A review of recent papers suggests that ecological theory is rarely explicitly considered. Current theory and results support species responses to environmental variables to be unimodal and often skewed though process-based theory is often lacking. Many studies fail to test for unimodal or skewed responses and straight-line relationships are often fitted without justification. Data resolution (size of sampling unit) determines the nature of the environmental niche models that can be fitted. A synthesis of differing ecophysiological ideas and the use of biophysical processes models could improve the selection of predictor variables. A better conceptual framework is needed for selecting variables. Comparison of statistical methods is difficult. Predictive success is insufficient and a test of ecological realism is also needed. Evaluation of methods needs artificial data, as there is no knowledge about the true relationships between variables for field data. However, use of artificial data is limited by lack of comprehensive theory. Three potentially new methods are reviewed. Quantile regression (QR) has potential and a strong theoretical justification in Liebig's law of the minimum. Structural equation modelling (SEM) has an appealing conceptual framework for testing causality but has problems with curvilinear relationships.
Geographically weighted regression (GWR), intended to examine spatial non-stationarity of ecological processes, requires further evaluation before being used. Synthesis and applications: explicit theory needs to be incorporated into species response models used in conservation. For example, testing for unimodal skewed responses should be a routine procedure. Clear statements of the ecological theory used, the nature of the data model and sufficient details of the statistical method are needed for current models to be evaluated. New statistical methods need to be evaluated for compatibility with ecological theory before use in applied ecology. Some recent work with artificial data suggests the combination of ecological knowledge and statistical skill is more important than the precise statistical method used. The potential exists for a synthesis of current species modelling approaches based on their differing ecological insights, not their methodology.

4.
Testing the Accuracy of Population Viability Analysis   Cited by: 3 (self-citations: 0; citations by others: 3)

5.
The statistical literature contains many univariate and multivariate skewness measures that allow two datasets to be compared, some of which are defined in terms of quantile values. In most situations, the comparison between two random vectors focuses on univariate comparisons of conditional random variables truncated in quantiles; this kind of comparison is of particular interest in the environmental sciences. In this work, we describe a new approach to comparing skewness in terms of the univariate convex transform ordering proposed by van Zwet (Convex transformations of random variables. Mathematical Centre Tracts, Amsterdam, 1964), associated with skewness as well as concentration. The key to these comparisons is the underlying dependence structure of the random vectors. Below we describe graphical tools and use several examples to illustrate these comparisons.

6.
Ecologists wish to understand the role of traits of species in determining where each species occurs in the environment. For this, they wish to detect associations between species traits and environmental variables from three data tables: species count data from sites, the associated environmental data, and species trait data from databases. These three tables leave a missing part, the fourth corner. The fourth-corner correlations between quantitative traits and environmental variables, heuristically proposed 20 years ago, fill this corner. Generalized linear (mixed) models have been proposed more recently as a model-based alternative. This paper shows that the squared fourth-corner correlation times the total count is precisely the score test statistic for testing the linear-by-linear interaction in a Poisson log-linear model that also contains species and sites as main effects. For multiple traits and environmental variables, the score test statistic is proportional to the total inertia of a doubly constrained correspondence analysis. When the count data are over-dispersed compared to the Poisson or when there are other deviations from the model such as unobserved traits or environmental variables that interact with the observed ones, the score test statistic does not have the usual chi-square distribution. For these types of deviations, row- and column-based permutation methods (and their sequential combination) are proposed to control the type I error without undue loss of power (unless no deviation is present), as illustrated in a small simulation study. The issues for valid statistical testing are illustrated using the well-known Dutch Dune Meadow data set.
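The relation stated in this abstract (squared fourth-corner correlation times the total count) can be sketched numerically as follows. This sketch assumes the fourth-corner correlation is the count-weighted Pearson correlation between a site-level environmental variable and a species-level trait; the function name and the toy data are hypothetical and not taken from the paper.

```python
from math import sqrt


def fourth_corner_score(counts, env, trait):
    """counts[i][j]: count of species j at site i; env[i]: value at site i;
    trait[j]: value for species j. Returns (r, total * r**2)."""
    total = sum(sum(row) for row in counts)
    site_tot = [sum(row) for row in counts]
    species_tot = [sum(col) for col in zip(*counts)]
    # Count-weighted means and variances of the environment and the trait.
    mean_e = sum(w * x for w, x in zip(site_tot, env)) / total
    mean_t = sum(w * t for w, t in zip(species_tot, trait)) / total
    var_e = sum(w * (x - mean_e) ** 2 for w, x in zip(site_tot, env)) / total
    var_t = sum(w * (t - mean_t) ** 2 for w, t in zip(species_tot, trait)) / total
    # Count-weighted covariance over all site-by-species cells.
    cov = sum(
        counts[i][j] * (env[i] - mean_e) * (trait[j] - mean_t)
        for i in range(len(env))
        for j in range(len(trait))
    ) / total
    r = cov / sqrt(var_e * var_t)
    return r, total * r ** 2


# Hypothetical toy data: each species occurs only at its "matching" site.
r, score = fourth_corner_score([[10, 0], [0, 10]], env=[0.0, 1.0], trait=[0.0, 1.0])
print(r, score)
```

As the abstract notes, the resulting statistic follows the usual chi-square reference distribution only when the Poisson model holds, which is why permutation testing is proposed for over-dispersed counts.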

7.
Ter Braak CJ  Cormont A  Dray S 《Ecology》2012,93(7):1525-1526
The fourth-corner problem entails estimation and statistical testing of the relationship between species traits and environmental variables from the analysis of three data tables. In a 2008 paper, S. Dray and P. Legendre proposed and evaluated five permutation methods for statistical significance testing, including a new two-step testing procedure. However, none of these attained the correct type I error in all cases of interest. We solve this problem by showing that a small modification of their two-step procedure controls the type I error in all cases. The modification consists of adjusting the significance level from the square root of alpha to alpha or, equivalently, of reporting the maximum of the individual P values as the final one. The test is also applicable to the three-table ordination method RLQ.
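The modification described above amounts to a very small decision rule, sketched here. The function names and the example P values are hypothetical, and reading the earlier significance level as the square root of alpha is an interpretation of the abstract rather than a quotation of the original procedure.

```python
from math import sqrt


def combined_p_value(p_first, p_second):
    # Modified rule: report the larger of the two individual P values.
    return max(p_first, p_second)


def reject_modified(p_first, p_second, alpha=0.05):
    # Equivalent to testing each step at level alpha.
    return combined_p_value(p_first, p_second) <= alpha


def reject_original(p_first, p_second, alpha=0.05):
    # Earlier rule (as read here): each step tested at sqrt(alpha).
    return max(p_first, p_second) <= sqrt(alpha)


# Hypothetical P values from the two permutation steps.
print(combined_p_value(0.03, 0.20))   # 0.2
print(reject_original(0.03, 0.20))    # True, since 0.20 <= sqrt(0.05)
print(reject_modified(0.03, 0.20))    # False, since 0.20 > 0.05
```

The example shows a case where the two rules disagree: the stricter modified rule does not reject, which is the direction needed to control the type I error.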

8.
Non-parametric statistical tests are commonly used in the behavioral sciences. Researchers need to be aware that non-parametric methods involving ranks can perform unreliably as a result of very small amounts of noise added in the storage and manipulation of values by computers, causing spurious reduction in the number of ties. In order to avoid this problem, researchers should round values to an appropriate number of decimal places prior to the ranking procedure to ensure that data points whose values cannot be separated according to the precision of their measurement are recorded as having identical rank. We also recommend exact rather than asymptotic evaluation of p values in non-parametric statistical tests.
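The effect described in this abstract can be reproduced with a small sketch: floating-point noise breaks a tie, and rounding before ranking restores it. The helper `average_ranks` and the sample values are hypothetical; one decimal place is assumed to be the measurement precision.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is exactly equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


raw = [0.1 + 0.2, 0.3, 0.5]            # 0.1 + 0.2 is stored as 0.30000000000000004
print(average_ranks(raw))              # [2.0, 1.0, 3.0]: the tie is lost
rounded = [round(v, 1) for v in raw]   # round to the assumed measurement precision
print(average_ranks(rounded))          # [1.5, 1.5, 3.0]: the tie is restored
```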

9.
Structural equation modeling is an advanced multivariate statistical process with which a researcher can construct theoretical concepts, test their measurement reliability, hypothesize and test a theory about their relationships, take into account measurement errors, and consider both direct and indirect effects of variables on one another. Latent variables are theoretical concepts that unite phenomena under a single term, e.g., ecosystem health, environmental condition, and pollution (Bollen, 1989). Latent variables are not measured directly but can be expressed in terms of one or more directly measurable variables called indicators. For some researchers, defining, constructing, and examining the validity of latent variables may be an end in itself. For others, testing hypothesized relationships of latent variables may be of interest. We analyzed the correlation matrix of eleven environmental variables from the U.S. Environmental Protection Agency's (USEPA) Environmental Monitoring and Assessment Program for Estuaries (EMAP-E) using methods of structural equation modeling. We hypothesized and tested a conceptual model to characterize the interdependencies between four latent variables: sediment contamination, natural variability, biodiversity, and growth potential. In particular, we were interested in measuring the direct, indirect, and total effects of sediment contamination and natural variability on biodiversity and growth potential. The model fit the data well and accounted for 81% of the variability in biodiversity and 69% of the variability in growth potential. It revealed a positive total effect of natural variability on growth potential that otherwise would have been judged negative had we not considered indirect effects. That is, natural variability had a negative direct effect on growth potential of magnitude –0.3251 and a positive indirect effect mediated through biodiversity of magnitude 0.4509, yielding a net positive total effect of 0.1258.
Natural variability had a positive direct effect on biodiversity of magnitude 0.5347 and a negative indirect effect mediated through growth potential of magnitude –0.1105, yielding a positive total effect of magnitude 0.4242. Sediment contamination had a negative direct effect on biodiversity of magnitude –0.1956 and a negative indirect effect on growth potential via biodiversity of magnitude –0.067. Biodiversity had a positive effect on growth potential of magnitude 0.8432, and growth potential had a positive effect on biodiversity of magnitude 0.3398. The correlation between biodiversity and growth potential was estimated at 0.7658 and that between sediment contamination and natural variability at –0.3769.
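The total effects reported in this abstract follow from adding the direct and indirect effects. A short check of that arithmetic, using only the figures quoted above (the function and variable names are illustrative):

```python
def total_effect(direct, indirect):
    # Total effect = direct effect + indirect effect, rounded to the
    # four decimal places used in the abstract.
    return round(direct + indirect, 4)


# Natural variability -> growth potential: negative direct, positive indirect.
nv_on_growth = total_effect(-0.3251, 0.4509)
# Natural variability -> biodiversity: positive direct, negative indirect.
nv_on_biodiversity = total_effect(0.5347, -0.1105)

print(nv_on_growth)        # 0.1258
print(nv_on_biodiversity)  # 0.4242
```

The first case is the one the abstract highlights: judged on the direct effect alone, the sign would be negative, but the indirect path through biodiversity makes the total positive.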

10.
Judicious Use of Multiple Hypothesis Tests   Cited by: 5 (self-citations: 0; citations by others: 5)
Abstract: When analyzing a table of statistical results, one must first decide whether adjustment of significance levels is appropriate. If the main goal is hypothesis generation or initial screening for potential conservation problems, then it may be appropriate to use the standard comparisonwise significance level to avoid Type II errors (not detecting real differences or trends). If the main goal is rigorous testing of a hypothesis, however, then an adjustment for multiple tests is needed. To control the familywise Type I error rate (the probability of rejecting at least one true null hypothesis), sequential modifications of the standard Bonferroni method, such as Holm's method, will provide more statistical power than the standard Bonferroni method. Additional power may be achieved through procedures that control the false discovery rate (FDR) (the expected proportion of false positives among tests found to be significant). Holm's sequential Bonferroni method and two FDR-controlling procedures were applied to the results of multiple-regression analyses of the relationship between habitat variables and the abundance of 25 species of forest birds in Japan, and the FDR-controlling procedures provided considerably greater statistical power.
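To make the power ordering in this abstract concrete, here is a minimal sketch of Holm's sequential Bonferroni method next to one FDR-controlling procedure. The abstract does not say which two FDR procedures were used; the Benjamini-Hochberg step-up rule is chosen here as an assumption, and the P values are hypothetical.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm's step-down method: controls the familywise Type I error rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest P value is compared with alpha/m, the next with
        # alpha/(m-1), and so on; stop at the first failure.
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up method (assumed here): controls the FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank whose P value is below rank * alpha / m.
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


pvals = [0.02, 0.5, 0.005, 0.045, 0.011]        # hypothetical P values
print([p <= 0.05 / len(pvals) for p in pvals])  # plain Bonferroni: 1 rejection
print(holm_reject(pvals))                       # Holm: 2 rejections
print(bh_reject(pvals))                         # Benjamini-Hochberg: 3 rejections
```

On these values the number of rejections increases from Bonferroni to Holm to the FDR procedure, matching the ordering of statistical power described in the abstract.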

11.
Plant functional response groups (PFGs) are now widely established as a tool to investigate plant—environment relationships. Different statistical methods to form PFGs are used in the literature. One way is to derive emergent groups by classifying species based on correlation of biological attributes and subjecting these groups to tests of response to environmental variables. Another way is to search for associations of occurrence data, environmental variables and trait data simultaneously. The fourth-corner method is one way to assess the relationships between single traits and habitat factors. We extended this statistical method to a generally applicable procedure for the generation of plant functional response groups by developing new randomization procedures for presence/absence data of plant communities. Previous PFG groupings used either predefined groups or emergent groups i.e. classifications based on correlations of biological attributes (Lavorel et al Trends Ecol Evol 12:474–478, 1997), of the global species pool and assessed their functional response. However, since not all PFGs might form emergent groups or may be known by experts, we used a permutation procedure to optimise functional grouping. We tested the method using an artificial test data set of virtual plants occurring in different disturbance treatments. Direct trait-treatment relationships as well as more complex associations are incorporated in the test data. Trait combinations responding to environmental variables could be clearly distinguished from non-responding combinations. The results are compared with the method suggested by Pillar (J Veg Sci 10:631–640) for the identification of plant functional groups. After exploring the statistical properties using an artificial data set, the method is applied to experimental data of a greenhouse experiment on the assemblage of plant communities. 
Four plant functional response groups are formed with regard to differences in soil fertility on the basis of the traits canopy height and spacer length.

12.
GIS and geostatistics: Essential partners for spatial analysis   Cited by: 20 (self-citations: 0; citations by others: 20)
Initially, geographical information systems (GIS) concentrated on two issues: automated map making, and facilitating the comparison of data on thematic maps. The first required high quality graphics, vector data models and powerful data bases, the second is based on grid cells that can be manipulated by suites of mathematical operators collectively termed map algebra. Both kinds of GIS are widely available and are taught in many universities and technical colleges. After more than 20 years of development, most standard GIS provide both kinds of functionality and good quality graphic display, but until recently they have not included the methods of statistics and geostatistics as tools for spatial analysis. Recently, standard statistical packages have been linked to GIS for both exploratory data analysis and statistical analysis and hypothesis testing. Standard statistical packages include methods for the analysis of random samples of cases or objects that are not necessarily co-located in space—if the results of statistical analysis display a spatial pattern then that is because the underlying data also share that pattern. Geostatistics addresses the need to make predictions of sampled attributes (i.e., maps) at unsampled locations from sparse, often expensive data. To make up for lack of hard data geostatistics has concentrated on the development of powerful methods based on stochastic theory. Though there have been recent moves to incorporate ancillary data in geostatistical analyses, insufficient attention has been paid to using modern methods of data display for the visualization of results. GIS can serve geostatistics by aiding geo-registration of data, facilitating spatial exploratory data analysis, providing a spatial context for interpolation and conditional simulation, as well as providing easy-to-use and effective tools for data display and visualization. 
The value of geostatistics for GIS lies in the provision of reliable interpolation methods with known errors, methods of upscaling and generalization, and for supplying multiple realizations of spatial patterns that can be used in environmental modeling. These stochastic methods are improving understanding of how errors in models of spatial processes accrue from errors in data or incompleteness in the structure of the models. New developments in GIS, based on ideas taken from map algebra, cellular automata and image analysis are providing high level programming languages for modeling dynamic processes such as erosion or the development of alluvial fans and deltas. Research has demonstrated that these models need stochastic inputs to yield realistic results. Non-stochastic tools such as fuzzy subsets have been shown to be useful for spatial analysis when probabilistic approaches are inappropriate or impossible. The conclusion is that in spite of differences in history and approach, the linkage of GIS, statistics and geostatistics provides a powerful, and complementary suite of tools for spatial analysis in the agricultural, earth and environmental sciences.  相似文献   

13.
Random forests for classification in ecology   Cited by: 27 (self-citations: 0; citations by others: 27)
Cutler DR  Edwards TC  Beard KH  Cutler A  Hess KT  Gibson J  Lawler JJ 《Ecology》2007,88(11):2783-2792
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.  相似文献   

14.
Despite broad recognition of the value of social sciences and increasingly vocal calls for better engagement with the human element of conservation, the conservation social sciences remain misunderstood and underutilized in practice. The conservation social sciences can provide unique and important contributions to society's understanding of the relationships between humans and nature and to improving conservation practice and outcomes. There are 4 barriers—ideological, institutional, knowledge, and capacity—to meaningful integration of the social sciences into conservation. We provide practical guidance on overcoming these barriers to mainstream the social sciences in conservation science, practice, and policy. Broadly, we recommend fostering knowledge on the scope and contributions of the social sciences to conservation, including social scientists from the inception of interdisciplinary research projects, incorporating social science research and insights during all stages of conservation planning and implementation, building social science capacity at all scales in conservation organizations and agencies, and promoting engagement with the social sciences in and through global conservation policy‐influencing organizations. Conservation social scientists, too, need to be willing to engage with natural science knowledge and to communicate insights and recommendations clearly. We urge the conservation community to move beyond superficial engagement with the conservation social sciences. A more inclusive and integrative conservation science—one that includes the natural and social sciences—will enable more ecologically effective and socially just conservation. Better collaboration among social scientists, natural scientists, practitioners, and policy makers will facilitate a renewed and more robust conservation. 
Mainstreaming the conservation social sciences will facilitate the uptake of the full range of insights and contributions from these fields into conservation policy and practice.

15.
16.
Hydrology, roadway traffic conditions, and atmospheric deposition are three essential data categories for the planning and implementation of highway-runoff monitoring and characterization programs. Causal variables pertaining to each data category could be site specific but have been shown to correlate with runoff pollutant loads. These data categories were combined to derive statistical relationships for characterization and prioritization of the respective pollutant loads at highway runoff sites. Storm runoff data of total suspended solids (TSS), total dissolved solids (TDS), chemical oxygen demand (COD), total Kjeldahl nitrogen (TKN) and total phosphorus (TP) collected from three highway sites in Charlotte, North Carolina, USA, were used to illustrate the development of site-specific highway-runoff pollutant loading models. This unified methodology provides a basis for initial assessment of the pollutant-constituent loads from highway runoff using hydrologic component variables. Improved reliability is achievable when additional traffic and/or atmospheric component variables are incorporated into the basic hydrologic regression model. In addition, operational guidance is suggested for implementing highway-runoff monitoring programs that are subject to sampling and resource constraints.

17.
This paper seeks to verify the usefulness of selected multivariate statistical techniques for exploring new dose-response relationships between human health and air pollution. We do so by comparing our results with those already established in the literature through hypothesis testing procedures or laboratory work. This use of multivariate techniques is pretheoretical and should be interpreted as suggesting relationships which warrant further investigation with more traditional methodologies. Our results conform very well with those existing in the literature and lend credence to the use of such pretheoretical statistical methods.

18.
Statistical packages such as edgeR and DESeq are intended to detect genes that are relevant to phenotypic traits and diseases. A few studies have also modeled the relationships between gene expressions and traits. In the presence of multicollinearity and outliers, which are unavoidable in genetic data, the robust ridge regression estimator can be applied with the trait value as the response variable and the gene expressions as explanatory variables. In some simulation scenarios, the robust ridge estimator is resistant to outliers and less susceptible to multicollinearity than the ordinary least-squares (OLS) estimator. This study investigated the reliability of the robust ridge estimator, in a scenario where the explanatory variables have tail-dependence and negative binomial distributions, by comparing its performance to that of OLS using vine copula to model the tail-dependence among gene expressions. The robust ridge estimator and OLS were both applied to an ecological dataset. First, statistical analysis was used to compare RNA sequencing data between two treatments; then, 15 differentially expressed genes were selected. Next, the regression parameter estimates of robust ridge and OLS for the effects of the 15 contigs (explanatory variables) on trait values (response variables) were compared. Robust ridge regression was found to detect fewer positive and negative slopes than OLS regression. These results indicate that robust ridge regression can be successfully applied for RNA sequencing analysis to estimate the effect of trait-associated genes using real data, and holds great promise as a tool for modeling the association between RNA expression and phenotypic traits.  相似文献   

19.
Modeling Human Factors That Affect the Loss of Biodiversity   Cited by: 2 (self-citations: 0; citations by others: 2)
Within conservation biology, human factors are treated as driving forces of biodiversity loss, yet there are few empirical studies on how human actions affect biodiversity. We developed and tested an interdisciplinary model of biodiversity loss using socioeconomic and ecological data from 107 countries and structural equation modeling techniques. Some portions of the model fit the data well; other parts were less predictive. Counterintuitive results may be a result of the quality and availability of cross-national data and statistical limitations in testing a model of such complex processes. This model test provides insight into future research needs for examining human impacts on biodiversity. Issues including data quality, temporal and spatial scale, and model refinement are outlined. The results highlight the importance of relations between human social systems and biodiversity and the potential of interdisciplinary research.

20.
Habitat variability makes site-specific considerations a necessity in the specification of water quality standards. The U.S. Environmental Protection Agency (EPA) has recognized this in its development of procedures for site-specific modification of national standards. These procedures involve translation of laboratory toxicology data into field situations where such data are often poor predictors of biotoxicity. The whole problem is poorly specified. This paper formulates a system theory approach to better specification of the problems associated with setting water quality standards. A state space model of the general environmental protection problem is presented: Find a set of diagnostic variables whose maintenance within specifiable limits (standards) is both necessary and sufficient to protect all variables of a subject ecosystem. The program for this comprises a site-specific protocol. Stages in such a protocol include (1) choice of diagnostic variables, (2) establishment of necessity and sufficiency for these variables, and (3) determination of standards through (4) toxicity testing. Problems associated with the latter include (a) spatiotemporal variability, (b) system linearity-nonlinearity, (c) system stationarity-nonstationarity, and (d) monitoring for baseline information, impact detection, determining compliance, establishing causality, and making predictions. Each of these problems is structured in terms of the state space model. Then, current procedures of the EPA site-specific methodology are reviewed, and a set of recommendations proposed for their improvement using the system theory formulation to guide further developments.
