
1.

Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? (Total citations: 1; self-citations: 0; citations by others: 1)

Ver Hoef JM Boveng PL 《Ecology》2007,88(11):2766-2772

Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in estimating the effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively weighted least-squares algorithm of fitting models to data. Because the variance is a function of the mean, large and small counts get weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results on a data set that showed a dramatic difference on estimating abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
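
The weighting argument in this abstract can be illustrated with a short sketch. It assumes a log link and writes the two variance functions as Var(Y) = phi * mu (quasi-Poisson) and Var(Y) = mu + mu^2 / k (negative binomial); the symbols phi and k, the function names, and the numeric values are illustrative choices, not taken from the paper.

```python
def quasi_poisson_variance(mu, phi):
    """Variance linear in the mean: Var(Y) = phi * mu."""
    return phi * mu


def negative_binomial_variance(mu, k):
    """Variance quadratic in the mean: Var(Y) = mu + mu**2 / k."""
    return mu + mu ** 2 / k


def iwls_weight_log_link(mu, variance):
    """IWLS working weight (dmu/deta)**2 / Var(Y); with a log link, dmu/deta = mu."""
    return mu ** 2 / variance


# Hypothetical dispersion values, chosen only for illustration.
PHI, K = 2.0, 2.0

for mu in (1.0, 10.0, 100.0, 1000.0):
    w_qp = iwls_weight_log_link(mu, quasi_poisson_variance(mu, PHI))
    w_nb = iwls_weight_log_link(mu, negative_binomial_variance(mu, K))
    print(f"mu={mu:7.1f}  quasi-Poisson weight={w_qp:8.2f}  NB weight={w_nb:6.3f}")
```

Under these assumptions the quasi-Poisson weight grows in proportion to the mean, whereas the negative binomial weight levels off near k, so large counts pull much harder on a quasi-Poisson fit than on a negative binomial fit.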

2.

Building statistical models to analyze species distributions.

Andrew M Latimer Shanshan Wu Alan E Gelfand John A Silander 《Ecological applications》2006,16(1):33-50

Models of the geographic distributions of species have wide application in ecology. But the nonspatial, single-level, regression models that ecologists have often employed do not deal with problems of irregular sampling intensity or spatial dependence, and do not adequately quantify uncertainty. We show here how to build statistical models that can handle these features of spatial prediction and provide richer, more powerful inference about species niche relations, distributions, and the effects of human disturbance. We begin with a familiar generalized linear model and build in additional features, including spatial random effects and hierarchical levels. Since these models are fully specified statistical models, we show that it is possible to add complexity without sacrificing interpretability. This step-by-step approach, together with attached code that implements a simple, spatially explicit, regression model, is structured to facilitate self-teaching. All models are developed in a Bayesian framework. We assess the performance of the models by using them to predict the distributions of two plant species (Proteaceae) from South Africa's Cape Floristic Region. We demonstrate that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results. Adding hierarchical levels to the models has further advantages in allowing human transformation of the landscape to be taken into account, as well as additional features of the sampling process.

3.

Boosted trees for ecological modeling and prediction (Total citations: 14; self-citations: 0; citations by others: 14)

De'ath G 《Ecology》2007,88(1):243-251

Accurate prediction and explanation are fundamental objectives of statistical analysis, yet they seldom coincide. Boosted trees are a statistical learning method that attains both of these objectives for regression and classification analyses. They can deal with many types of response variables (numeric, categorical, and censored), loss functions (Gaussian, binomial, Poisson, and robust), and predictors (numeric, categorical). Interactions between predictors can also be quantified and visualized. The theory underpinning boosted trees is presented, together with interpretive techniques. A new form of boosted trees, namely, "aggregated boosted trees" (ABT), is proposed and, in a simulation study, is shown to reduce prediction error relative to boosted trees. A regression data set is analyzed using ABT to illustrate the technique and to compare it with other methods, including boosted trees, bagged trees, random forests, and generalized additive models. A software package for ABT analysis using the R software environment is included in the Appendices together with worked examples.

4.

Modeling binomial amphibian roadkill data in distance sampling while accounting for zero-inflation, serial correlation and varying cluster sizes simultaneously

M. Tariqul Hasan Gary Sneddon Renjun Ma 《Environmental and Ecological Statistics》2017,24(2):201-217

Roadkill is of ecological importance, and there is increasing academic research to understand the causes and patterns of roadkills and their impact on ecosystems. This work is motivated by a study of roadkills of the endangered Bufo calamita (B. calamita; the natterjack toad) among amphibian roadkills. The status of B. calamita is regarded as unfavorable due to large population declines. In that study, B. calamita and total amphibian roadkills were recorded via distance sampling on a National Road of Southern Portugal between March 1995 and March 1997. Traditional binomial modeling of these data is challenged by three issues. First, the zeros in B. calamita counts far exceed the nominal level. Second, there is likely serial correlation among observations along the road. Finally, the number of total amphibian roadkills varies across sampling locations; therefore, there is likely randomness in the number of total amphibian roadkills. All these features may contribute to overdispersion in the binomial observations. These three issues are routinely addressed one at a time: the first through zero-inflated binomial models, the second, for example, by means of random effects models for serially correlated binomial data, and the third by models for binomial data with random cluster sizes. Therefore the data cannot be adequately modeled by any of these separate models. In this paper, we propose a new model to tackle these three issues simultaneously in the binomial analysis of B. calamita roadkills out of amphibian roadkills. Our approach is generally applicable to other binomial data with these three features.

5.

Modelling species distributions with penalised logistic regressions: A comparison with maximum entropy models (Total citations: 1; self-citations: 0; citations by others: 1)

Aitor Gastón Juan I. García-Viñas 《Ecological modelling》2011,222(13):2037-2041

An important aspect of species distribution modelling is the choice of the modelling method because a suboptimal method may have poor predictive performance. Previous comparisons have found that novel methods, such as Maxent models, outperform well-established modelling methods, such as the standard logistic regression. These comparisons used training samples with small numbers of occurrences per estimated model parameter, and this limited sample size may have caused poorer predictive performance due to overfitting. Our hypothesis is that Maxent models would outperform a standard logistic regression because Maxent models avoid overfitting by using regularisation techniques and a standard logistic regression does not. Regularisation can be applied to logistic regression models using penalised maximum likelihood estimation. This estimation procedure shrinks the regression coefficients towards zero, causing biased predictions if applied to the training sample but improving the accuracy of new predictions. We used Maxent and logistic regression (standard and penalised) to analyse presence/pseudo-absence data for 13 tree species and evaluated the predictive performance (discrimination) using presence-absence data. The penalised logistic regression outperformed standard logistic regression and equalled the performance of Maxent. The penalised logistic regression may be considered one of the best methods to develop species distribution models trained with presence/pseudo-absence data, as it is comparable to Maxent. Our results encourage further use of the penalised logistic regression for species distribution modelling, especially in those cases in which a complex model must be fitted to a sample with a limited size.
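
The shrinkage described in this abstract can be sketched as follows. The abstract does not say which penalty the authors used, so this sketch assumes a ridge (L2) penalty fitted by plain gradient descent on simulated data; the function names, penalty strength, and data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, penalty=0.0, lr=0.1, n_iter=3000):
    """Gradient descent on the mean negative log-likelihood plus
    (penalty / 2) * sum(beta_j ** 2). The intercept is left unpenalised."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed 0/1 response.
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad_int = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n + penalty * beta[j]
                for j in range(p)]
        intercept -= lr * grad_int
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return intercept, beta


# Simulated presence/absence data: 40 sites, 4 covariates, 2 of them irrelevant.
random.seed(0)
true_beta = [1.5, -1.0, 0.0, 0.0]
X = [[random.gauss(0.0, 1.0) for _ in true_beta] for _ in range(40)]
y = [1.0 if random.random() < sigmoid(sum(b * x for b, x in zip(true_beta, row))) else 0.0
     for row in X]

_, beta_ml = fit_logistic(X, y, penalty=0.0)   # standard logistic regression
_, beta_pen = fit_logistic(X, y, penalty=0.1)  # ridge-penalised fit

print("standard :", [round(b, 3) for b in beta_ml])
print("penalised:", [round(b, 3) for b in beta_pen])
```

With a small sample and several parameters, the penalised coefficients end up closer to zero than the unpenalised ones, which is the mechanism the abstract credits with reducing overfitting.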

6.

Estimating trend precision and power to detect trends across grouped count data (Total citations: 1; self-citations: 0; citations by others: 1)

Gray BR Burlew MM 《Ecology》2007,88(9):2364-2372

Ecologists commonly use grouped or clustered count data to estimate temporal trends in counts, abundance indices, or abundance. For example, the U.S. Breeding Bird Survey data represent multiple counts of birds from within each of multiple, spatially defined routes. Despite a reliance on grouped counts, analytical methods for prospectively estimating precision of trend estimates or statistical power to detect trends that explicitly acknowledge the characteristics of grouped count data are undescribed. These characteristics include the fact that the sampling variance is an increasing function of the mean, and that sampling and group-level variance estimates are generally estimated on different scales (the sampling and log scales, respectively). We address these issues for repeated sampling of a single population using an analytical approach that has the flavor of a generalized linear mixed model, specifically that of a negative binomial-distributed count variable with random group effects. The count mean, including grand intercept, trend, and random group effects, is modeled linearly on the log scale, while sampling variance of the mean is estimated on the log scale via the delta method. Results compared favorably with those derived using Monte Carlo simulations. For example, at trend = 5% per temporal unit, differences in standard errors and in power were modest relative to those estimated by simulation (≤ |11|% and ≤ |16|%, respectively), with relative differences among power estimates decreasing to ≤ |7|% when power estimated by simulations was ≥ 0.50. Similar findings were obtained using data from nine surveys of fingernail clams in the Mississippi River. The proposed method is suggested (1) where simulations are not practical and relative precision or power is desired, or (2) when multiple precision or power calculations are required and where the accuracy of a fraction of those calculations will be confirmed using simulations.

7.

Is the Mantel correlogram powerful enough to be useful in ecological analysis? A simulation study

Borcard D Legendre P 《Ecology》2012,93(6):1473-1481

The Mantel correlogram is an elegant way to compute a correlogram for multivariate data. However, recent papers raised concerns about the power of the Mantel test itself. Hence the question: Is the Mantel correlogram powerful enough to be useful? To explore this issue, we compared the performances of the Mantel correlogram to those of other methods, using numerical simulations based on random, normally distributed data. For a single response variable, we compared it to the Moran and Geary correlograms. Type I error rates of the three methods were correct. Power of the Mantel correlogram was nearly as high as that of the univariate methods. For the multivariate case, the test of the multivariate variogram developed in the context of multiscale ordination is in fact a Mantel test, so that the power of the two methods is the same by definition. We devised an alternative permutation test based on the variance, which yielded similar results. Overall, the power of the Mantel test was high, the method successfully detecting spatial correlation at rates similar to the permutation test of the variance statistic in multivariate variograms. We conclude that the Mantel correlogram deserves its place in the ecologist's toolbox.

8.

Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests (Total citations: 3; self-citations: 0; citations by others: 3)

Wim Aertsen Jos van Orshoven Bart Muys 《Ecological modelling》2010,221(8):1119-1130

Forestry science has a long tradition of studying the relationship between stand productivity and abiotic and biotic site characteristics, such as climate, topography, soil and vegetation. Many of the early site quality modelling studies related site index to environmental variables using basic statistical methods such as linear regression. Because most ecological variables show a typical non-linear course and a non-constant variance distribution, a large fraction of the variation remained unexplained by these linear models. More recently, the development of more advanced non-parametric and machine learning methods provided opportunities to overcome these limitations. Nevertheless, these methods also have drawbacks. Due to their increasing complexity they are not only more difficult to implement and interpret, but also more vulnerable to overfitting. Especially in a context of regionalisation, this may prove to be problematic. Although many non-parametric and machine learning methods are increasingly used in applications related to forest site quality assessment, their predictive performance has only been assessed for a limited number of methods and ecosystems. In this study, five different modelling techniques are compared and evaluated, i.e. multiple linear regression (MLR), classification and regression trees (CART), boosted regression trees (BRT), generalized additive models (GAM), and artificial neural networks (ANN). Each method is used to model site index of homogeneous stands of three important tree species of the Taurus Mountains (Turkey): Pinus brutia, Pinus nigra and Cedrus libani. Site index is related to soil, vegetation and topographical variables, which are available for 167 sample plots covering all important environmental gradients in the research area. The five techniques are compared in a multi-criteria decision analysis in which different model performance measures, ecological interpretability and user-friendliness are considered as criteria. When combining these criteria, in most cases GAM is found to outperform all other techniques for modelling site index for the three species. BRT is a good alternative in case the ecological interpretability of the technique is of higher importance. When user-friendliness is more important MLR and CART are the preferred alternatives. Despite its good predictive performance, ANN is penalized for its complex, non-transparent models and big training effort.

9.

Bayesian Small Area Models for Assessing Wildlife Conservation Risk in Patchy Populations

Duncan S. Wilson Margo A. Stoddard Matthew G. Betts Klaus J. Puettmann 《Conservation biology》2009,23(4):982-991

Abstract: Species conservation risk assessments require accurate, probabilistic, and biologically meaningful maps of population distribution. In patchy populations, the reasons for discontinuities are not often well understood. We tested a novel approach to habitat modeling in which methods of small area estimation were used within a hierarchical Bayesian framework. Amphibian occurrence was modeled with logistic regression that included third-order drainages as hierarchical effects to account for patchy populations. Models including the random drainage effects adequately represented species occurrences in patchy populations of 4 amphibian species in the Oregon Coast Range (U.S.A.). Amphibian surveys from other locations within the same drainage were used to calibrate local drainage-scale effects. Cross-validation showed that prediction errors for calibrated models were 77% to 86% lower than comparable regionally constructed models, depending on species. When calibration data were unavailable, small area and regional models performed similarly, although poorly. Small area estimation models complement wildlife ecology and habitat studies, and can help managers develop a regional picture of the conservation status for relatively rare species.

10.

Estimation methods for nonlinear state-space models in ecology

M.W. Pedersen C.W. Berg U.H. Thygesen 《Ecological modelling》2011,222(8):1394-1400

The use of nonlinear state-space models for analyzing ecological systems is increasing. A wide range of estimation methods for such models is available to ecologists; however, it is not always clear which method is the appropriate one to choose. To this end, three approaches to estimation in the theta logistic model for population dynamics were benchmarked by Wang (2007). Similarly, we examine and compare the estimation performance of three alternative methods using simulated data. The first approach is to partition the state-space into a finite number of states and formulate the problem as a hidden Markov model (HMM). The second method uses the mixed effects modeling and fast numerical integration framework of the AD Model Builder (ADMB) open-source software. The third alternative is to use the popular Bayesian framework of BUGS. The study showed that state and parameter estimation performance for all three methods was largely identical, although BUGS gave overall wider credible intervals for parameters than the confidence intervals from HMM and ADMB.

11.

An evaluation of three statistical methods used to model resource selection

David M. Baasch Andrew J. Tyre Scott E. Hygnstrom 《Ecological modelling》2010,221(4):565-574

The performance of statistical methods for modeling resource selection by animals is difficult to evaluate with field data because true selection patterns are unknown. Simulated data based on a known probability distribution, though, can be used to evaluate statistical methods. Models should estimate true selection patterns if they are to be useful in analyzing and interpreting field data. We used simulation techniques to evaluate the effectiveness of three statistical methods used in modeling resource selection. We generated 25 use locations per animal and included 10, 20, 40, or 80 animals in samples of use locations. To simulate species of different mobility, we generated use locations at four levels according to a known probability distribution across DeSoto National Wildlife Refuge (DNWR) in eastern Nebraska and western Iowa, USA. We generated either 5 random locations per use location or 10,000 random locations (total) within 4 predetermined areas around use locations to determine how the definition of availability and the number of random locations affected results. We analyzed simulated data using discrete choice, logistic-regression, and a maximum entropy method (Maxent). We used a simple linear regression of estimated and known probability distributions and area under receiver operating characteristic curves (AUC) to evaluate the performance of each method. Each statistical method was affected differently by number of animals and random locations used in analyses, level at which selection of resources occurred, and area considered available. Discrete-choice modeling resulted in precise and accurate estimates of the true probability distribution when the area in which use locations were generated was at least as large as the area defined to be available. Logistic-regression models were unbiased and precise when the area in which use locations were generated and the area defined to be available were the same size; the fit of these models improved with increased numbers of random locations. Maxent resulted in unbiased and precise estimates of the known probability distribution when the area in which use locations were generated was small (home-range level) and the area defined to be available was large (study area). Based on AUC analyses, all models estimated the selection distribution better than random chance. Results from AUC analyses, however, often contradicted results of the linear regression method used to evaluate model performance. Discrete-choice modeling was best able to estimate the known selection distribution in our study area regardless of sample size or number of random locations used in the analyses, but we recommend further studies using simulated data over different landscapes and different resource metrics to confirm our results. Our study offers an approach and guidance for others interested in assessing the utility of techniques for modeling resource selection in their study area.

12.

Joint modelling of breeding and survival in the kittiwake using frailty models

《Ecological modelling》2005,181(2-3):203-213

Assessment of population dynamics is central to population dynamics and conservation. In structured populations, matrix population models based on demographic data have been widely used to assess such dynamics. Although highlighted in several studies, the influence of heterogeneity among individuals in demographic parameters and of the possible correlation among these parameters has usually been ignored, mostly because of difficulties in estimating such individual-specific parameters. In the kittiwake (Rissa tridactyla), a long-lived seabird species, differences in survival and breeding probabilities among individual birds are well documented. Several approaches have been used in the animal ecology literature to establish the association between survival and breeding rates. However, most are based on observed heterogeneity between groups of individuals, an approach that seldom accounts for individual heterogeneity. Few attempts have been made to build models permitting estimation of the correlation between vital rates. For example, survival and breeding probability of individual birds were jointly modelled using logistic random effects models by [Cam, E., Link, W.A., Cooch, E.G., Monnat, J., Danchin, E., 2002. Individual covariation in life-history traits: seeing the trees despite the forest. Am. Naturalist, 159, in press]. This is the only example in wildlife animal populations we are aware of. Here we adopt the survival analysis approaches from epidemiology. We model the survival and the breeding probability jointly using a normally distributed random effect (frailty). Conditionally on this random effect, the survival time is modelled assuming a lognormal distribution, and breeding is modelled with a logistic model. Since the deaths are observed in year-intervals, we also take into account that the data are interval censored. The joint model is estimated using classic frequentist methods and also MCMC techniques in WinBUGS. The association between survival and breeding attempt is quantified using the standard deviation of the random frailty parameters. We apply our joint model to a large data set of 862 birds that were followed from 1984 to 1995 in Brittany (France). Survival is positively correlated with breeding, indicating that birds with greater inclination to breed also had higher survival.

13.

Comparing species abundance models

Joanne M. Potts Jane Elith 《Ecological modelling》2006

Five regression models (Poisson, negative binomial, quasi-Poisson, the hurdle model and the zero-inflated Poisson) were used to assess the relationship between the abundance of a vulnerable plant species, Leionema ralstonii, and the environment. The methods differed in their capacity to deal with common properties of ecological data. They were assessed theoretically, and their predictive performance was evaluated with correlation, calibration and error statistics calculated within a bootstrap evaluation procedure that simulated performance for independent data.

14.

Improving inferences about private land conservation by accounting for incomplete reporting

Matthew A. Williamson Brett G. Dickson Mevin B. Hooten Rose A. Graves Mark N. Lubell Mark W. Schwartz 《Conservation biology》2021,35(4):1174-1185

Private lands provide key habitat for imperiled species and are core components of functional protected area networks; yet, their incorporation into national and regional conservation planning has been challenging. Identifying locations where private landowners are likely to participate in conservation initiatives can help avoid conflict and clarify trade-offs between ecological benefits and sociopolitical costs. Empirical, spatially explicit assessment of the factors associated with conservation on private land is an emerging tool for identifying future conservation opportunities. However, most data on private land conservation are voluntarily reported and incomplete, which complicates these assessments. We used a novel application of occupancy models to analyze the occurrence of conservation easements on private land. We compared multiple formulations of occupancy models with a logistic regression model to predict the locations of conservation easements based on a spatially explicit social–ecological systems framework. We combined a simulation experiment with a case study of easement data in Idaho and Montana (United States) to illustrate the utility of the occupancy framework for modeling conservation on private land. Occupancy models that explicitly accounted for variation in reporting produced estimates of predictors that were substantially less biased than estimates produced by logistic regression under all simulated conditions. Occupancy models produced estimates for the 6 predictors we evaluated in our case study that were larger in magnitude, but less certain than those produced by logistic regression. These results suggest that occupancy models result in qualitatively different inferences regarding the effects of predictors on conservation easement occurrence than logistic regression and highlight the importance of integrating variable and incomplete reporting of participation in empirical analysis of conservation initiatives. Failure to do so can lead to emphasizing the wrong social, institutional, and environmental factors that enable conservation and underestimating conservation opportunities in landscapes where social norms or institutional constraints inhibit reporting.

15.

Modeling species co-occurrence by multivariate logistic regression generates new hypotheses on fungal interactions (Total citations: 2; self-citations: 0; citations by others: 2)

Ovaskainen O Hottola J Siitonen J 《Ecology》2010,91(9):2514-2521

Signals of species interactions can be inferred from survey data by asking if some species occur more or less often together than what would be expected by random, or more generally, if any structural aspect of the community deviates from that expected from a set of independent species. However, a positive (or negative) association between two species does not necessarily signify a direct or indirect interaction, as it can result simply from the species having similar (or dissimilar) habitat requirements. We show how these two factors can be separated by multivariate logistic regression, with the regression part accounting for species-specific habitat requirements, and a correlation matrix for the positive or negative residual associations. We parameterize the model using Bayesian inference with data on 22 species of wood-decaying fungi acquired in 14 dissimilar forest sites. Our analyses reveal that some of the species commonly found to occur together in the same logs are likely to do so merely by similar habitat requirements, whereas other species combinations are systematically either over- or underrepresented also or only after accounting for the habitat requirements. We use our results to derive hypotheses on species interactions that can be tested in future experimental work.

16.

Modelling skewed data with many zeros: A simple approach combining ordinary and logistic regression (Total citations: 1; self-citations: 0; citations by others: 1)

David Fletcher Darryl MacKenzie Eduardo Villouta 《Environmental and Ecological Statistics》2005,12(1):45-54

We discuss a method for analyzing data that are positively skewed and contain a substantial proportion of zeros. Such data commonly arise in ecological applications, when the focus is on the abundance of a species. The form of the distribution is then due to the patchy nature of the environment and/or the inherent heterogeneity of the species. The method can be used whenever we wish to model the data as a response variable in terms of one or more explanatory variables. The analysis consists of three stages. The first involves creating two sets of data from the original: one shows whether or not the species is present; the other indicates the logarithm of the abundance when it is present. These are referred to as the presence data and the log-abundance data, respectively. The second stage involves modelling the presence data using logistic regression, and separately modelling the log-abundance data using ordinary regression. Finally, the third stage involves combining the two models in order to estimate the expected abundance for a specific set of values of the explanatory variables. A common approach to analyzing this sort of data is to use a ln (y+c) transformation, where c is some constant (usually one). The method we use here avoids the need for an arbitrary choice of the value of c, and allows the modelling to be carried out in a natural and straightforward manner, using well-known regression techniques. The approach we put forward is not original, having been used in both conservation biology and fisheries. Our objectives in this paper are to (a) promote the application of this approach in a wide range of settings and (b) suggest that parametric bootstrapping be used to provide confidence limits for the estimate of expected abundance.
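
The three stages described in this abstract can be sketched for the simplest case of a single categorical explanatory variable, where logistic regression reduces to the per-group proportion of non-zero values and ordinary regression reduces to the per-group mean of the log-abundances. The abstract does not give the combining formula; this sketch assumes positive abundances are lognormal, so the expected abundance is p * exp(mu + sigma^2 / 2). The habitat names and counts are made up.

```python
import math
from statistics import mean, variance


def two_part_expected_abundance(abundances):
    """Estimate expected abundance from data with many zeros.

    Stage 1: split into presence data and log-abundance data.
    Stage 2: estimate P(present) and the mean/variance of log-abundance.
    Stage 3: combine, assuming lognormal positive abundances (an assumption).
    """
    presence = [1 if y > 0 else 0 for y in abundances]
    log_abundance = [math.log(y) for y in abundances if y > 0]
    p_hat = mean(presence)
    mu_hat = mean(log_abundance)
    sigma2_hat = variance(log_abundance)  # needs at least two positive values
    return p_hat * math.exp(mu_hat + sigma2_hat / 2.0)


# Hypothetical abundances at sites in two habitats.
sites = {
    "reef": [0, 0, 3, 12, 0, 45, 7, 0],
    "sand": [0, 0, 0, 1, 0, 0, 2, 0],
}

for habitat, abundances in sites.items():
    estimate = two_part_expected_abundance(abundances)
    print(f"{habitat}: estimated expected abundance = {estimate:.2f}")
```

No constant c has to be added before taking logarithms, because zeros are handled by the presence component rather than by the transformation.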

17.

Estimating equations for separable spatial-temporal binary data

Pei-Sheng Lin 《Environmental and Ecological Statistics》2010,17(4):543-557

For binary data with correlation across space and over time, the literature concerning the estimation of fixed effects in marginal models is limited. In this paper, we model the marginal probability of binary responses in terms of parameters of interest by a logistic function. An estimating equation based on the quasi-likelihood concept is developed to estimate parameters. Under separable correlation models, we show that the quasi-likelihood estimate is asymptotically optimal. A series of simulations is conducted to evaluate how the efficiency varies with the regression coefficients. We also compare the relative efficiency with another estimating equation by simulations. The proposed method is applied to an ecological study of forest decline to test independence of two spatial-temporal binary outcomes.

18.

CircSiZer: an exploratory tool for circular data

María Oliveira Rosa M. Crujeiras Alberto Rodríguez-Casal 《Environmental and Ecological Statistics》2014,21(1):143-159

Smoothing methods and SiZer (SIgnificant ZERo crossing of the derivatives) are useful tools for exploring significant underlying structures in data samples. An extension of SiZer to circular data, namely CircSiZer, is introduced. Based on scale-space ideas, CircSiZer presents a graphical device to assess which observed features are statistically significant, both for density and regression analysis with circular data. The method is intended for analyzing the behavior of wind direction on the Atlantic coast of Galicia (NW Spain) and how it influences wind speed. The performance of CircSiZer is also checked with some simulated examples.

19.

Modeling spatial aggregation of finite populations

Zillio T He F 《Ecology》2010,91(12):3698-3706


20.

Using the negative binomial distribution to model overdispersion in ecological count data

Lindén A Mäntyniemi S 《Ecology》2011,92(7):1414-1421

A Poisson process is a commonly used starting point for modeling stochastic variation of ecological count data around a theoretical expectation. However, data typically show more variation than implied by the Poisson distribution. Such overdispersion is often accounted for by using models with different assumptions about how the variance changes with the expectation. The choice of these assumptions can naturally have apparent consequences for statistical inference. We propose a parameterization of the negative binomial distribution, where two overdispersion parameters are introduced to allow for various quadratic mean-variance relationships, including the ones assumed in the most commonly used approaches. Using bird migration as an example, we present hypothetical scenarios on how overdispersion can arise due to sampling, flocking behavior or aggregation, environmental variability, or combinations of these factors. For all considered scenarios, mean-variance relationships can be appropriately described by the negative binomial distribution with two overdispersion parameters. To illustrate, we apply the model to empirical migration data with a high level of overdispersion, gaining clearly different model fits with different assumptions about mean-variance relationships. The proposed framework can be a useful approximation for modeling marginal distributions of independent count data in likelihood-based analyses.
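
One way to read "two overdispersion parameters" is sketched below. The abstract does not state the formula, so the form Var(Y) = omega * mu + theta * mu^2 is an assumption made here for illustration, as are the names omega, theta, and all numeric values. The sketch converts that variance into the size parameter k of a standard negative binomial (variance mu + mu^2 / k) and checks the resulting moments numerically.

```python
import math


def nb_size(mu, omega, theta):
    """Size parameter k of a negative binomial whose variance equals the
    assumed two-parameter form omega * mu + theta * mu**2."""
    variance = omega * mu + theta * mu ** 2
    if variance <= mu:
        raise ValueError("variance must exceed the mean for a negative binomial")
    return mu ** 2 / (variance - mu)


def nb_pmf(y, mu, k):
    """Negative binomial probability of count y, parameterised by mean and size."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def nb_moments(mu, k, y_max=5000):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [nb_pmf(y, mu, k) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical values: mean count 10, linear term 1.5, quadratic term 0.2.
mu, omega, theta = 10.0, 1.5, 0.2
k = nb_size(mu, omega, theta)
mean, var = nb_moments(mu, k)
print(f"k = {k:.3f}, mean = {mean:.3f}, variance = {var:.3f}")
```

Under this assumed form, setting theta = 0 gives a variance proportional to the mean (the quasi-Poisson relationship), while omega = 1 gives the usual negative binomial relationship with theta = 1 / k.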