Similar literature (20 results)
1.
Model averaging, specifically information theoretic approaches based on Akaike’s information criterion (IT-AIC approaches), has had a major influence on statistical practices in the field of ecology and evolution. However, a neglected issue is that, in common with most other model fitting approaches, IT-AIC methods are sensitive to the presence of missing observations. The commonest way of handling missing data is the complete-case analysis (the complete deletion from the dataset of cases containing any missing values). It is well known that this results in reduced estimation precision (or reduced statistical power) and biased parameter estimates; however, the implications for model selection have not been explored. Here we employ an example from behavioural ecology to illustrate how missing data can affect the conclusions drawn from model selection or based on hypothesis testing. We show how missing observations can be recovered to give accurate estimates for IT-related indices (e.g. AIC and Akaike weight) as well as parameters (and their standard errors) by utilizing ‘multiple imputation’. We use this paper to illustrate key concepts from missing data theory and as a basis for discussing available methods for handling missing data. The example is intended to serve as a practically oriented case study for behavioural ecologists deciding on how to handle missing data in their own datasets and also as a first attempt to consider the problems of conducting model selection and averaging in the presence of missing observations.
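
The pooling step of multiple imputation mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes the standard combining rules (often called Rubin's rules) for one parameter estimated on each imputed dataset; the function name `pool_rubin` and the example numbers are hypothetical and are not taken from the paper, which may pool AIC-related quantities differently.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets (hypothetical helper).

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Illustrative values only: three imputed datasets.
est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, se)
```

The between-imputation term is what a complete-case or single-imputation analysis would leave out, so the pooled standard error reflects the uncertainty due to the missing values themselves.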

2.
After several decades during which applied statistical inference in research on animal behaviour and behavioural ecology has been heavily dominated by null hypothesis significance testing (NHST), a new approach based on information theoretic (IT) criteria has recently become increasingly popular, and occasionally, it has been considered to be generally superior to conventional NHST. In this commentary, I discuss some limitations the IT-based method may have under certain circumstances. In addition, I review some recent articles published in the fields of animal behaviour and behavioural ecology and point to some common failures, misunderstandings and issues frequently appearing in the practical application of IT-based methods. Based on this, I give some hints about how to avoid common pitfalls in the application of IT-based inference and when to choose one or the other approach, and I discuss under which circumstances a mixing of the two approaches might be appropriate.

3.
Scientific thinking may require the consideration of multiple hypotheses, which often call for complex statistical models at the level of data analysis. The aim of this introduction is to provide a brief overview of how competing hypotheses are evaluated statistically in behavioural ecological studies and to offer potentially fruitful avenues for future methodological developments. Complex models have traditionally been treated by model selection approaches using threshold-based removal of terms, i.e. stepwise selection. A recently introduced method for model selection applies an information-theoretic (IT) approach, which simultaneously evaluates hypotheses by balancing model complexity against goodness of fit. The IT method has been increasingly propagated in the field of ecology, while a literature survey shows that its spread in behavioural ecology has been much slower, and model simplification using stepwise selection is still more widespread than IT-based model selection. Why has the use of IT methods in behavioural ecology lagged behind other disciplines? This special issue examines the suitability of the IT method for analysing data with multiple predictors, which researchers encounter in our field. The volume brings together different viewpoints to aid behavioural ecologists in understanding the method, with the hope of enhancing the statistical integration of our discipline.

4.
Akaike’s information criterion (AIC) is increasingly being used in analyses in the field of ecology. This measure allows one to compare and rank multiple competing models and to estimate which of them best approximates the “true” process underlying the biological phenomenon under study. Behavioural ecologists have been slow to adopt this statistical tool, perhaps because of unfounded fears regarding the complexity of the technique. Here, we provide, using recent examples from the behavioural ecology literature, a simple introductory guide to AIC: what it is, how and when to apply it and what it achieves. We discuss multimodel inference using AIC, a procedure which should be used where no one model is strongly supported. Finally, we highlight a few of the pitfalls and problems that can be encountered by novice practitioners.
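
As a rough illustration of the ranking step this abstract describes, the sketch below computes AIC from a log-likelihood and converts a set of AIC scores into Akaike weights. It assumes the usual definitions (AIC = 2k - 2 ln L, weights proportional to exp(-ΔAIC/2)); the function names and the example scores are hypothetical and do not come from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L) for a model with k estimated parameters."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Convert AIC scores for a candidate set into Akaike weights."""
    best = min(aic_values)
    # Delta AIC: distance of each model from the best (lowest-AIC) model.
    deltas = [a - best for a in aic_values]
    # Relative likelihood of each model, then normalise to sum to one.
    rel = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel)
    return [r / total for r in rel]


# Illustrative AIC scores for three hypothetical candidate models.
weights = akaike_weights([100.0, 102.0, 110.0])
print([round(w, 3) for w in weights])
```

When no single weight dominates, the weights themselves are the natural inputs to the multimodel inference the abstract recommends.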

5.
Evolutionary mechanisms leading to correlations across different behaviours, called behavioural syndromes, are hard to study, mostly because behavioural syndromes are group/population level phenomena. Recently (Herczeg and Garamszegi Behav Ecol Sociobiol 66:161–169, 2012), we introduced the concept of syndrome deviation that allows the study of behavioural syndromes at the individual level by focusing on the individual deviation from the hypothetical perfect group-level behavioural correlation. Subsequently, Dingemanse et al. (Behav Ecol Sociobiol 66:1543–1548, 2012) emphasized that behavioural syndromes refer to the between-individual component of phenotypic correlations, and only this component is relevant for syndrome deviation. They also recommended mixed models to decompose the between- and within-individual correlations. We agree that separating these components is important, but the proposed approach is impractical to apply for functionally different behaviours because (1) the assumption of constant within-individual correlations is unjustified and (2) different behaviours cannot be measured at the same time. Further, our simulations based on mixed models show that the statistical differentiation between the within- and between-individual components is inefficient when using realistic sample sizes. Until the separation of between- and within-individual correlations is resolved, we recommend alternative approaches for empirical behavioural syndrome research that consider the repeatability of the behaviours and the optimal balance between within- and between-individual sample sizes. Syndrome deviation calculated from phenotypic correlations of traits that are proven to be individual specific, or from the between-individual correlations if possible, is a meaningful metric to describe behavioural consistency and to explain its evolutionary significance.

6.
Abstract: Soberón and Llorente (1993) proposed pure-birth stochastic processes as theoretical models for species-accumulation curves, and these processes have frequently been used to describe the progress of biological inventories. We describe, in algorithmic form, an alternative statistical analysis based on a likelihood approach (Díaz-Francés & Gorostiza 2002) that provides mathematical rigor to the ideas in Soberón and Llorente (1993) and improves the estimation of the models by incorporating the facts that the variance of the error is not constant and that the observations are correlated. Additionally, we used the likelihood ratios between candidate models as an objective procedure for model selection, allowing comparison between the goodness of fit of various models. The software for these statistical methods can now be downloaded from the Internet. We used two examples of butterfly data sets to illustrate the use of the methods and the software.

7.
English is widely recognized as the language of science, and English-language publications (ELPs) are rapidly increasing. It is often assumed that the number of non-ELPs is decreasing. This assumption contributes to the underuse of non-ELPs in conservation science, practice, and policy, especially at the international level. However, the number of conservation articles published in different languages is poorly documented. Using local and international search systems, we searched for scientific articles on biodiversity conservation published from 1980 to 2018 in English and 15 non-English languages. We compared the growth rate in publications across languages. In 12 of the 15 non-English languages, published conservation articles significantly increased every year over the past 39 years, at a rate similar to English-language articles. The other three languages showed contrasting results, depending on the search system. Since the 1990s, conservation science articles in most languages increased exponentially. The number of non-English-language articles identified differed markedly among the search systems (e.g., for simplified Chinese, 11,148 articles were returned with the local search system and 803 with Scopus). Google Scholar and local literature search systems returned the most articles for 11 and 4 non-English languages, respectively. However, the proportion of peer-reviewed conservation articles published in non-English languages was highest in Scopus, followed by Web of Science and local search systems, and lowest in Google Scholar. About 20% of the sampled non-English-language articles provided no title or abstract in English; thus, in theory, they were undiscoverable with English keywords. Possible reasons for this include language barriers and the need to disseminate research in countries where English is not widely spoken. Given the known biases in statistical methods and study characteristics between English- and non-English-language studies, non-English-language articles will continue to play an important role in improving the understanding of biodiversity and its conservation.

8.
Many studies have revealed repeatable (among-individual) variance in behavioural traits consistent with variation in animal personality; however, these studies are often conducted using data collected over single sampling periods, most commonly with short time intervals between observations. Consequently, it is not clear whether population-level patterns of behavioural variation are stable across longer timescales and/or multiple sampling periods or whether individuals maintain consistent ranking of behaviours (and/or personality) over their lifetimes. Here, we address these questions in a captive-bred population of a tropical freshwater poeciliid fish, Xiphophorus birchmanni. Using a multivariate approach, we estimate the among-individual variance-covariance matrix (I) for a set of behavioural traits repeatedly assayed in two different experimental contexts (open-field trials, emergence and exploration trials) over long-term (56 days between observations) and short-term (4-day observation interval) time periods. In both long- and short-term data sets, we find that traits are repeatable and the correlation structure of I is consistent with a latent axis of variation in boldness. While there are some qualitative differences in the way individual traits contribute to boldness and a tendency towards higher repeatabilities in the short-term study, overall, we find population-level patterns of among-individual behavioural (co)variance to be broadly similar over both time frames. At the individual level, we find evidence that short-term studies can be informative for an individual’s behavioural phenotype over longer (e.g. lifetime) periods. However, statistical support is somewhat mixed and, at least for some observed behaviours, relative rankings of individual performance change significantly between data sets.

9.
There has been a great deal of recent discussion of the practice of regression analysis (or more generally, linear modelling) in behaviour and ecology. In this paper, I wish to highlight two factors that have been under-considered, collinearity and measurement error in predictors, as well as to consider what happens when both exist at the same time. I examine what the consequences are for conventional regression analysis (ordinary least squares, OLS) as well as model averaging methods, typified by information theoretic approaches based around Akaike’s information criterion. Collinearity causes variance inflation of estimated slopes in OLS analysis, as is well known. In the presence of collinearity, model averaging reduces this variance for predictors with weak effects, but also can lead to parameter bias. When collinearity is strong or when all predictors have strong effects, model averaging relies heavily on the full model including all predictors and hence the results from this and OLS are essentially the same. I highlight that it is not safe to simply eliminate collinear variables without due consideration of their likely independent effects as this can lead to biases. Measurement error is also considered, and I show that it can lead to extreme biases when predictors are collinear, have strong effects but differ in their degree of measurement error. I highlight techniques for dealing with and diagnosing these problems. These results reinforce that automated model selection techniques should not be relied on in the analysis of complex multivariable datasets.
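
The variance inflation this abstract refers to can be illustrated for the simplest case of two predictors, where the variance inflation factor reduces to 1 / (1 - r^2), with r the Pearson correlation between the predictors. The sketch below is only that two-predictor special case; the function names and the data are hypothetical, and the paper's own diagnostics may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor for each slope when there are two predictors."""
    r = pearson_r(x1, x2)
    if abs(r) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return 1.0 / (1.0 - r ** 2)


# Illustrative data: two moderately correlated predictors (r = 0.8).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(pearson_r(x1, x2), vif_two_predictors(x1, x2))
```

With more than two predictors, each predictor's factor is based on the R-squared from regressing it on all the others, which this sketch does not attempt.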

10.
Random forests for classification in ecology   Cited by: 27 (self-citations: 0, by others: 27)
Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ. Ecology, 2007, 88(11): 2783-2792
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

11.
Shen TJ, He F. Ecology, 2008, 89(7): 2052-2060
Most richness estimators currently in use are derived from models that consider sampling with replacement or from the assumption of infinite populations. Neither of the assumptions is suitable for sampling sessile organisms such as plants where quadrats are often sampled without replacement and the area of study is always limited. In this paper, we propose an incidence-based parametric richness estimator that considers quadrat sampling without replacement in a fixed area. The estimator is derived from a zero-truncated binomial distribution for the number of quadrats containing a given species (e.g., species i) and a modified beta distribution for the probability of presence-absence of a species in a quadrat. The maximum likelihood estimate of richness is explicitly given and can be easily solved. The variance of the estimate is also obtained. The performance of the estimator is tested against nine other existing incidence-based estimators using two tree data sets where the true numbers of species are known. Results show that the new estimator is insensitive to sample size and outperforms the other methods as judged by the root mean squared errors. The superiority of the new method is particularly noticeable when large quadrat size is used, suggesting that a few large quadrats are preferred over many small ones when sampling diversity.

12.
Phylogenetic comparative studies rely on species-specific data that often contain missing values and/or differ in sample size among species. These phenomena may violate statistical assumptions about the non-random variance component in sampling effort. A major reason why this assumption is often not fulfilled is because the probability of being sampled (i.e., being captured or observed) may depend on species-specific characteristics. Here, we test this assumption by using information on within-species sample sizes and missing data from five independent comparative datasets of European birds. First, we show that the two estimates of data availability (missing values and within-species sample size) are positively correlated and are associated with research effort in general (the number of papers published). Second, we demonstrate biologically meaningful relationships between data availability and phenotypic traits. For example, population size, risk-taking, and habitat specialization independently predicted within-species sample size. The key determinants of missing data were population size and distribution range. However, data availability was not structured by phylogenetic relationships. These results indicate that the accuracy of sampling is repeatable and distributed non-randomly among species, as several species-specific attributes determined the probability of observation. Therefore, data availability seems to be a species-specific trait that can be shaped by ecology, life history, and behavior. Such relationships raise issues about non-random sampling, which requires attention in comparative studies.

13.
Many statistical models in ecology follow the state space paradigm. For such models, the important step of model validation rarely receives as much attention as estimation or hypothesis testing, perhaps due to lack of available algorithms and software. Model validation is often based on a naive adaptation of Pearson residuals, i.e. the difference between observations and posterior means, even if this approach is flawed. Here, we consider validation of state space models through one-step prediction errors, and discuss principles and practicalities arising when the model has been fitted with a tool for estimation in general mixed effects models. Implementing one-step predictions in the R package Template Model Builder, we demonstrate that it is possible to perform model validation with little effort, even if the ecological model is multivariate or has non-linear dynamics, and regardless of whether observations are continuous or discrete. With both simulated data and a real data set related to geolocation of seals, we demonstrate both the potential and the limitations of the techniques. Our results fill a need for convenient methods for validating a state space model, or alternatively, rejecting it while indicating useful directions in which the model could be improved.

14.
In many environmental and ecological studies, it is of interest to model compositional data. One approach is to consider positive random vectors that are subject to a unit-sum constraint. In landscape ecological studies, it is common that compositional data are also sampled in space with some elements of the composition absent at certain sampling sites. In this paper, we first propose a practical spatial multivariate ordered probit model for multivariate ordinal data, where the response variables can be viewed as the discretized non-negative compositions without the unit-sum constraint. We then propose a novel two-stage spatial mixture Dirichlet regression model. The first stage models the spatial dependence and the presence of exact zero values, and the second stage models all the non-zero compositional data. A maximum composite likelihood approach is developed for parameter estimation and inference in both the spatial multivariate ordered probit model and the two-stage spatial mixture Dirichlet regression model. The standard errors of the parameter estimates are computed by an estimate of the Godambe information matrix. A simulation study is conducted to evaluate the performance of the proposed models and methods. A land cover data example in landscape ecology further illustrates that accounting for spatial dependence can improve the accuracy in the prediction of presence/absence of different land covers as well as the magnitude of land cover compositions.

15.
Conn PB, Diefenbach DR. Ecology, 2007, 88(8): 1977-1983
Ecologists often use samples from the age or stage structure of a population to make inferences about population-level processes and to parameterize matrix models. Typically, researchers make a simplifying assumption that age and stage classes are determined without error, when in fact some level of misclassification often can be expected. If unaccounted for, misclassification will lead to overly optimistic levels of precision and can cause biased estimates of age or stage structure. Although several studies have used information from known-age individuals to quantify errors in age or stage distribution, the problem of estimating the age or stage structure in the face of such errors has received comparatively little attention. In this paper, we describe a general statistical framework for estimating the true stage distribution of a sample when misclassification rates can be estimated. The estimation process requires auxiliary information on misclassification rates, such as data from individuals of known age. We analyze age-structured harvest records from black bears in Pennsylvania to illustrate how incorporating misclassification errors leads to changes in point estimates and provides a measure of precision.
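
To make the idea of correcting for misclassification concrete, the sketch below handles only the simplest case of two classes with known classification probabilities, using a moment-style correction. This is an assumption made for illustration; it is not the likelihood-based framework the abstract describes, it gives no measure of precision, and the function name and numbers are hypothetical.

```python
def correct_two_class_proportion(obs_frac_a, p_a_given_a, p_a_given_b):
    """Estimate the true fraction of class A from the observed fraction.

    obs_frac_a:   fraction of the sample *classified* as A
    p_a_given_a:  probability a true A is classified as A
    p_a_given_b:  probability a true B is (mis)classified as A
    Uses obs = p * P(A|A) + (1 - p) * P(A|B), solved for p.
    """
    denom = p_a_given_a - p_a_given_b
    if denom == 0:
        raise ValueError("classification carries no information about class")
    estimate = (obs_frac_a - p_a_given_b) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# Illustrative values: 41% classified as A, with 90% correct A calls
# and 20% of B individuals wrongly called A.
print(correct_two_class_proportion(0.41, 0.9, 0.2))
```

In practice the classification probabilities are themselves estimated from auxiliary data such as known-age individuals, which is why a full treatment needs a joint model rather than this plug-in step.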

16.
We have re-evaluated the experimental methods and statistical procedures used to determine the relationship between feeding rates of pelagic herbivores and food concentration. Analysis of our own experiments, on Calanus pacificus feeding on Gyrodinium resplendens, and of other published research on this subject suggests the need for improvements in experimental design and methodology. We show that the use of mean concentration is statistically erroneous. First, it produces an artificial increase in the degrees of freedom that may result in the acceptance of nonsignificant regression lines. Second, it negates the value of replication, which is required to estimate sources of error. We present an example of how replication may be used to improve control over sources of error. Furthermore, we recommend the use of initial concentration rather than mean concentration. Finally, we introduce alternative methods to determine clearance and ingestion rates that enable the investigator to use replication and thus to estimate experimental errors.

17.
Statistics for correlated data: phylogenies, space, and time.   Cited by: 3 (self-citations: 0, by others: 3)
Here we give an introduction to the growing number of statistical techniques for analyzing data that are not independent realizations of the same sampling process, in other words, correlated data. We focus on regression problems, in which the value of a given variable depends linearly on the value of another variable. To illustrate different types of processes leading to correlated data, we analyze four simulated examples representing diverse problems arising in ecological studies. The first example is a comparison among species to determine the relationship between home-range area and body size; because species are phylogenetically related, they do not represent independent samples. The second example addresses spatial variation in net primary production and how this might be affected by soil nitrogen; because nearby locations are likely to have similar net primary productivity for reasons other than soil nitrogen, spatial correlation is likely. In the third example, we consider a time-series model to ask whether the decrease in density of a butterfly species is the result of decreases in its host-plant density; because the population density of a species in one generation is likely to affect the density in the following generation, time-series data are often correlated. The fourth example combines both spatial and temporal correlation in an experiment in which prey densities are manipulated to determine the response of predators to their food supply. For each of these examples, we use a different statistical approach for analyzing models of correlated data. Our goal is to give an overview of conceptual issues surrounding correlated data, rather than a detailed tutorial in how to apply different statistical techniques. By dispelling some of the mystery behind correlated data, we hope to encourage ecologists to learn about statistics that could be useful in their own work. Although at first encounter these techniques might seem complicated, they have the power to simplify ecological research by making more types of data and experimental designs open to statistical evaluation.

18.
Abstract: Species conservation risk assessments require accurate, probabilistic, and biologically meaningful maps of population distribution. In patchy populations, the reasons for discontinuities are not often well understood. We tested a novel approach to habitat modeling in which methods of small area estimation were used within a hierarchical Bayesian framework. Amphibian occurrence was modeled with logistic regression that included third-order drainages as hierarchical effects to account for patchy populations. Models including the random drainage effects adequately represented species occurrences in patchy populations of 4 amphibian species in the Oregon Coast Range (U.S.A.). Amphibian surveys from other locations within the same drainage were used to calibrate local drainage-scale effects. Cross-validation showed that prediction errors for calibrated models were 77% to 86% lower than comparable regionally constructed models, depending on species. When calibration data were unavailable, small area and regional models performed similarly, although poorly. Small area estimation models complement wildlife ecology and habitat studies, and can help managers develop a regional picture of the conservation status for relatively rare species.

19.
Behavioural ecologists often study complex systems in which multiple hypotheses could be proposed to explain observed phenomena. For some systems, simple controlled experiments can be employed to reveal part of the complexity; often, however, observational studies that incorporate a multitude of causal factors may be the only (or preferred) avenue of study. We assess the value of recently advocated approaches to inference in both contexts. Specifically, we examine the use of information theoretic (IT) model selection using Akaike’s information criterion (AIC). We find that, for simple analyses, the advantages of switching to an IT-AIC approach are likely to be slight, especially given recent emphasis on biological rather than statistical significance. By contrast, the model selection approach embodied by IT approaches offers significant advantages when applied to problems of more complex causality. Model averaging is an intuitively appealing extension to model selection. However, we were unable to demonstrate consistent improvements in prediction accuracy when using model averaging with IT-AIC; our equivocal results suggest that more research is needed on its utility. We illustrate our arguments with worked examples from behavioural experiments.

20.
On estimating the exponent of power-law frequency distributions   Cited by: 5 (self-citations: 0, by others: 5)
White EP, Enquist BJ, Green JL. Ecology, 2008, 89(4): 905-912
Power-law frequency distributions characterize a wide array of natural phenomena. In ecology, biology, and many physical and social sciences, the exponents of these power laws are estimated to draw inference about the processes underlying the phenomenon, to test theoretical models, and to scale up from local observations to global patterns. Therefore, it is essential that these exponents be estimated accurately. Unfortunately, the binning-based methods traditionally used in ecology and other disciplines perform quite poorly. Here we discuss more sophisticated methods for fitting these exponents based on cumulative distribution functions and maximum likelihood estimation. We illustrate their superior performance at estimating known exponents and provide details on how and when ecologists should use them. Our results confirm that maximum likelihood estimation outperforms other methods in both accuracy and precision. Because of the use of biased statistical methods for estimating the exponent, the conclusions of several recently published papers should be revisited.
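
The maximum likelihood approach this abstract favours can be sketched for a continuous power law p(x) ∝ x^(-alpha) with a known lower bound xmin, where the estimator has the closed form alpha = 1 + n / Σ ln(x_i / xmin). This is the textbook continuous-case estimator, assumed here for illustration; the paper's sign convention and its treatment of discrete or truncated data may differ, and the function names and simulated data are hypothetical.

```python
import math
import random


def powerlaw_mle_exponent(xs, xmin):
    """MLE of alpha for p(x) proportional to x**(-alpha), x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, xmin, rng):
    """Draw n values by inverse-transform sampling (requires alpha > 1)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Simulate data with a known exponent and check that it is recovered.
rng = random.Random(0)
data = sample_powerlaw(20000, alpha=2.5, xmin=1.0, rng=rng)
print(powerlaw_mle_exponent(data, xmin=1.0))
```

Unlike a regression on binned frequencies, the estimate does not depend on any choice of bin width, which is the main practical argument for this approach.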
