Similar articles
 Found 20 similar articles (search time: 62 ms)
1.
Zero-inflated models with application to spatial count data   Total citations: 1 (self-citations: 2, citations by others: 1)
Count data arise in many contexts. Here our concern is with spatial count data which exhibit an excessive number of zeros. Using the class of zero-inflated count models provides a flexible way to address this problem. Available covariate information suggests formulation of such modeling within a regression framework. We employ zero-inflated Poisson regression models. Spatial association is introduced through suitable random effects yielding a hierarchical model. We propose fitting this model within a Bayesian framework considering issues of posterior propriety, informative prior specification and well-behaved simulation based model fitting. Finally, we illustrate the model fitting with a data set involving counts of isopod nest burrows for 1649 pixels over a portion of the Negev desert in Israel.
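The zero-inflated Poisson (ZIP) regression named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the single covariate `x`, the coefficients `beta0` and `beta1`, and the site-level random effect `phi` are all assumptions made for the example, and no Bayesian fitting is attempted.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_rate(x, beta0, beta1, phi):
    """Poisson mean under a log link (hypothetical parameterization).

    phi stands in for a spatial random effect at the site.
    """
    return math.exp(beta0 + beta1 * x + phi)


def zip_loglik(counts, rates, pi):
    """Log-likelihood of observed counts given per-site Poisson rates."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y, lam in zip(counts, rates))
```

For example, `zip_pmf(0, 2.0, 0.3)` adds the structural-zero mass 0.3 to the Poisson zero mass scaled by 0.7, which is how the model accommodates more zeros than a plain Poisson would.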

2.
At the time of European settlement, land surveys were conducted progressively westward throughout the United States. Outside of the original 13 colonies, surveys generally followed the Public Land Survey system in which trees, called witness trees, were regularly recorded at 1 mi by 1 mi grid intersections. This unintentional sampling provides insight into the composition and structure of pre-European settlement forests, which is used as baseline data to assess forest change following settlement. In this paper, a model for the Public Land Surveys of east central Alabama is developed. Assuming that the locations of trees of each species are realized from independent Poisson processes whose respective log intensities are linear functions of environmental covariates (i.e., elevation, landform, and physiographic province), the species observed at the survey grid intersections are independently sampled from a generalized logistic regression model. If all 68 species found in the survey were included, the model would be highly over-parameterized, so only the distribution of the most common taxon, pines, will be considered at this time. To assess the impact of environmental factors not included in the model, a hidden Gaussian random field shall be added as a random effect. A Markov Chain Monte Carlo algorithm is developed for Bayesian inference on model parameters, and for Bayes posterior prediction of the spatial distribution of pines in east central Alabama. Received: June 2004 / Revised: November 2004

3.
We propose a method for a Bayesian hierarchical analysis of count data that are observed at irregular locations in a bounded domain of R^2. We model the data as having been observed on a fine regular lattice, where we do not have observations at all the sites. The counts are assumed to be independent Poisson random variables whose means are given by a log Gaussian process. In this article, the Gaussian process is assumed to be either a Markov random field (MRF) or a geostatistical model, and we compare the two models on an environmental data set. To make the comparison, we calibrate priors for the parameters in the geostatistical model to priors for the parameters in the MRF. The calibration is obtained empirically. The main goal is to predict the hidden Poisson-mean process at all sites on the lattice, given the spatially irregular count data; to do this we use an efficient MCMC. The spatial Bayesian methods are illustrated on radioactivity counts analyzed by Diggle et al. (1998).

4.
The scan statistic is widely used in spatial cluster detection applications of inhomogeneous Poisson processes. However, real data may present substantial departure from the underlying Poisson process. One of the possible departures has to do with zero excess. Some studies point out that when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection of spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations revealed that the Scan-Poisson statistic steadily deteriorates as the number of zeros increases, producing biased inferences. On the other hand, our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.

5.
6.
Ecological theory and current evidence support the validity of various species response curves according to a variety of environmental gradients. Various methods have been developed for building species distribution models but it is not well known how these methods perform under various assumptions about the form of the underlying species response. It is also not well known how spatial correlation in species occurrence affects model performance. These effects were investigated by applying an environmental envelope method (BIOCLIM) and three regression-based methods: logistic regression (LR), generalized additive modelling (GAM), and classification and regression tree (CART) to simulated species occurrence data. Each simulated species was constructed as a sum of responses with varying weights. Three basic species response curves were assumed: Gaussian (bell-shaped), Beta (skew) and linear. The two non-linear responses conform to standard ecological niche theory. All three responses were applied in turn to three simulated environmental variables, each with varying degrees of spatial autocorrelation. GAM produced the most consistent model performance over all forms of simulated species response. BIOCLIM and CART were inclined to underrate the performance of variables with a linear response. BIOCLIM was less sensitive to data density. LR was susceptible to model misspecification. The use of a linear function in LR underestimated the performance of variables with non-linear species response and contributed to increased spatial autocorrelation in model residuals. Omission of important environmental variables with non-linear species response also contributed to increased spatial autocorrelation in model residuals. Adding a spatial autocovariate term to the LR model (autologistic model) reduced the spatial autocorrelation and improved model performance, but did not correct the misidentification of the dominant environmental determinant. 
This is to be expected since the autologistic approach was designed primarily for prediction and not for inference. Given that various forms of species response to environmental determinants arise commonly in nature: (1) higher order functions should always be tested when applying LR in modelling species distribution; (2) spatial autocorrelation in species distribution model residuals can indicate that environmental determinants with non-linear response are missing from the model; and (3) deficiencies in LR model performance due to model misspecification can be addressed by adding a spatial autocovariate to the model, but care should be taken when interpreting the coefficients of the model parameters.

7.
Ecologists wish to understand the role of traits of species in determining where each species occurs in the environment. For this, they wish to detect associations between species traits and environmental variables from three data tables, species count data from sites with associated environmental data and species trait data from data bases. These three tables leave a missing part, the fourth-corner. The fourth-corner correlations between quantitative traits and environmental variables, heuristically proposed 20 years ago, fill this corner. Generalized linear (mixed) models have been proposed more recently as a model-based alternative. This paper shows that the squared fourth-corner correlation times the total count is precisely the score test statistic for testing the linear-by-linear interaction in a Poisson log-linear model that also contains species and sites as main effects. For multiple traits and environmental variables, the score test statistic is proportional to the total inertia of a doubly constrained correspondence analysis. When the count data are over-dispersed compared to the Poisson or when there are other deviations from the model such as unobserved traits or environmental variables that interact with the observed ones, the score test statistic does not have the usual chi-square distribution. For these types of deviations, row- and column-based permutation methods (and their sequential combination) are proposed to control the type I error without undue loss of power (unless no deviation is present), as illustrated in a small simulation study. The issues for valid statistical testing are illustrated using the well-known Dutch Dune Meadow data set.
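The relationship stated in this abstract (squared fourth-corner correlation times the total count) can be illustrated with a small sketch. It assumes the usual reading of the fourth-corner correlation as a Pearson correlation between a site-level environmental variable and a species-level trait, weighted by the counts in the sites-by-species table. The function name and the toy layout of `counts`, `env`, and `trait` are hypothetical, and the permutation tests the paper proposes are not shown.

```python
import math


def fourth_corner(counts, env, trait):
    """Return (r, total * r**2) for a sites-by-species count table.

    counts[i][j] is the count of species j at site i, env[i] is the
    environmental value of site i, and trait[j] is the trait of species j.
    r is the count-weighted correlation between env and trait.
    """
    cells = [(counts[i][j], env[i], trait[j])
             for i in range(len(env)) for j in range(len(trait))]
    total = sum(y for y, _, _ in cells)
    e_mean = sum(y * e for y, e, _ in cells) / total
    t_mean = sum(y * t for y, _, t in cells) / total
    cov = sum(y * (e - e_mean) * (t - t_mean) for y, e, t in cells) / total
    var_e = sum(y * (e - e_mean) ** 2 for y, e, _ in cells) / total
    var_t = sum(y * (t - t_mean) ** 2 for y, _, t in cells) / total
    if var_e == 0.0 or var_t == 0.0:
        raise ValueError("env and trait must both vary across the counts")
    r = cov / math.sqrt(var_e * var_t)
    return r, total * r * r
```

As the abstract notes, the second returned value follows the usual chi-square reference distribution only when the Poisson log-linear model holds; under over-dispersion a permutation approach is needed instead.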

8.
GIS-based niche modeling for mapping species' habitat   Total citations: 3 (self-citations: 0, citations by others: 3)
Rotenberry JT, Preston KL, Knick ST. Ecology, 2006, 87(6): 1458-1464
Ecological "niche modeling" using presence-only locality data and large-scale environmental variables provides a powerful tool for identifying and mapping suitable habitat for species over large spatial extents. We describe a niche modeling approach that identifies a minimum (rather than an optimum) set of basic habitat requirements for a species, based on the assumption that constant environmental relationships in a species' distribution (i.e., variables that maintain a consistent value where the species occurs) are most likely to be associated with limiting factors. Environmental variables that take on a wide range of values where a species occurs are less informative because they do not limit a species' distribution, at least over the range of variation sampled. This approach is operationalized by partitioning Mahalanobis D^2 (standardized difference between values of a set of environmental variables for any point and mean values for those same variables calculated from all points at which a species was detected) into independent components. The smallest of these components represents the linear combination of variables with minimum variance; increasingly larger components represent larger variances and are increasingly less limiting. We illustrate this approach using the California Gnatcatcher (Polioptila californica Brewster) and provide SAS code to implement it.

9.
The statistical analysis of continuous data that is non-negative is a common task in quantitative ecology. An example, and our motivation, is the weight of a given fish species in a fish trawl. The analysis task is complicated by the occurrence of exactly zero observations. It makes many statistical methods for continuous data inappropriate. In this paper we propose a model that extends a Tweedie generalised linear model. The proposed model exploits the fact that a Tweedie distribution is equivalent to the distribution obtained by summing a Poisson number of gamma random variables. In the proposed model, both the number of gamma variates, and their average size, are modelled separately. The model has a composite link and has a flexible mean-variance relationship that can vary with covariates. We illustrate the model, and compare it to other models, using data from a fish trawl survey in south-east Australia.
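The compound Poisson-gamma construction mentioned in this abstract can be simulated directly. The sketch below only shows why such a variable is continuous yet exactly zero with positive probability. The parameter names `lam`, `shape`, and `scale` are assumptions for the example, and the covariate-dependent links of the proposed model are not reproduced.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        k += 1
        product *= rng.random()
        if product <= limit:
            return k - 1


def sample_compound_poisson_gamma(lam, shape, scale, rng):
    """Sum a Poisson(lam) number of Gamma(shape, scale) variates.

    When the Poisson count is zero the sum is exactly zero.
    """
    n = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


def compound_mean(lam, shape, scale):
    """E[Y] = expected number of variates times the mean variate size."""
    return lam * shape * scale


def zero_probability(lam):
    """P(Y = 0) = P(no gamma variates) = exp(-lam)."""
    return math.exp(-lam)
```

In the fish trawl setting one might read `lam` as the expected number of fish caught and `shape * scale` as the mean weight per fish; that reading is an interpretation for illustration, not a statement of the paper's parameterization.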

10.
We derive some statistical properties of the distribution of two Negative Binomial random variables conditional on their total. This type of model can be appropriate for paired count data with Poisson over-dispersion such that the variance is a quadratic function of the mean. This statistical model is appropriate in many ecological applications including comparative fishing studies of two vessels and/or gears. The parameter of interest is the ratio of pair means. We show that the conditional means and variances are different from the more commonly used Binomial model with variance adjusted for over-dispersion, or the Beta-Binomial model. The conditional Negative Binomial model is complicated because it does not eliminate nuisance parameters like in the Poisson case. Maximum likelihood estimation with the unconditional Negative Binomial model can result in biased estimates of the over-dispersion parameter and poor confidence intervals for the ratio of means when there are many nuisance parameters. We propose three approaches to deal with nuisance parameters in the conditional Negative Binomial model. We also study a random effects Binomial model for this type of data, and we develop an adjustment to the full-sample Negative Binomial profile likelihood to reduce the bias caused by nuisance parameters. We use simulations with these methods to examine bias, precision, and accuracy of estimators and confidence intervals. We conclude that the maximum likelihood method based on the full-sample Negative Binomial adjusted profile likelihood produces the best statistical inferences for the ratio of means when paired counts have Negative Binomial distributions. However, when there is uncertainty about the type of Poisson over-dispersion then a Binomial random effects model is a good choice.

11.
Traditional occupancy–abundance and abundance–variance–occupancy models do not take into account zero-inflation, which occurs when sampling rare species or in correlated counts arising from repeated measures. In this paper we propose a novel approach extending occupancy–abundance relationships to zero-inflated count data. This approach involves three steps: (1) selecting distributional assumptions and parsimonious models for the count data, (2) estimating abundance, occupancy and variance parameters as functions of site- and/or time-specific covariates, and (3) modelling the occupancy–abundance relationship using the parameters estimated in step 2. Five count datasets were used for comparing standard Poisson and negative binomial distribution (NBD) occupancy–abundance models. Zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) occupancy–abundance models were introduced for the first time, and these were compared with the Poisson, NBD, He and Gaston's and Wilson and Room's abundance–variance–occupancy models. The percentage of zero counts ranged from 45 to 80% in the datasets analysed. For most of the datasets, the ZINB occupancy–abundance model performed better than the traditional Poisson, NBD and Wilson and Room's model. He and Gaston's model performed better than the ZINB in two out of the five datasets. However, the occupancy predicted by all models increased faster than the observed occupancy as density increased, resulting in significant mismatch at the highest densities. Limitations of the various models are discussed, and the need for careful choice of count distributions and predictors in estimating abundance and occupancy parameters is indicated.

12.
Forecasting the temporal trend of a focal species, its range expansion or retraction, provides crucial information regarding population viability. To this end, we require the accumulation of temporal records which is evidently time consuming. Progress in spatial data capturing has enabled rapid and accurate assessment of species distribution across large scales. Therefore, it would be appealing to infer the temporal trends of populations from the spatial structure of their distributions. Based on a combination of models from the fields of range dynamics, occupancy scaling and spatial autocorrelation, here I present a model for forecasting the population trend solely from its spatial distribution. Numerical tests using cellular automata confirm a positive correlation, as inferred from the model, between the temporal change in species range sizes and the exponent of the power-law scaling pattern of occupancy. The model is thus recommended for rapid estimation of species range dynamics from a single snapshot of its current distribution. Further applications in biodiversity conservation could provide a swift risk assessment, especially, for endangered and invasive species.

13.
An important decision in presence-only species distribution modeling is how to select background (or pseudo-absence) localities for model parameterization. The selection of such localities may influence model parameterization and thus can influence the appropriateness and accuracy of the model prediction when extrapolating the species distribution across time and space. We used 12 species from the Australian Wet Tropics (AWT) to evaluate the relationship between the geographic extent from which pseudo-absences are taken and model performance, and shape and importance of predictor variables using the MAXENT modeling method. Model performance is lower when pseudo-absence points are taken from either a restricted or broad region with respect to species occurrence data than from an intermediate region. Furthermore, variable importance (i.e., contribution to the model) changed such that models became increasingly simplified, dominated by just two variables, as the area from which pseudo-absence points were drawn increased. Our results suggest that it is important to consider the spatial extent from which pseudo-absence data are taken. We suggest species distribution modeling exercises should begin with exploratory analyses evaluating what extent might provide both the most accurate results and biologically meaningful fit between species occurrence and predictor variables. This is especially important when modeling across space or time, a growing application for species distributional modeling.

14.
The spatial pattern of organisms may be used to characterize their dispersal, quantify spread or estimate the point of introduction of an alien species. Their distribution may be represented by maps of individuals, or by counts or by presence/absence at known positions within a sampled area. The problems and relative merits of these different forms of data for spatial inference are discussed. Three datasets concerning dispersal from a single focus are analyzed: counts of aphids, Rhopalosiphum padi and Sitobion avenae, on barley plants, Hordeum vulgare, grown in experimental trays; mapped locations of couch grass, Elymus repens, tillers within plots of a field experiment; locations of sightings of the lupin aphid, Macrosiphum albifrons, as it invaded Great Britain between 1981 and 1984. A method for generating maps from counts is proposed to overcome problems caused by recording imprecision. Several statistics are used to quantify dispersal and spatial pattern in the experimental data and together provide a clear picture of the spatial pattern observed; they enabled several effects of the experimental treatments to be identified. The values of the statistics are compared. Estimates of the source of the lupin aphid invasion are obtained using the backtracking methods of Perry (1995b) and do not contradict previous suggestions.

15.
Forest stand management often depends on data from a single fixed area inventory plot located at random in a forest stand. The plot provides detailed information about tree size distribution but not about per unit area tree frequency distribution unless one assumes a Poisson (POI) distribution. The POI assumption ignores any relationship between a tree's size and its demand for growing space. This study argues for the Inverse Gaussian (IG) distribution as a more realistic model. Maximum likelihood estimates of the IG parameters are obtained from a transformation of tree size data (diameter) to proxies of tree counts. Data from two stands indicated that an IG model was better at predicting the tree frequency distribution than a POI model.

16.
Abstract: Biologists who develop and apply habitat models are often familiar with the statistical challenges posed by their data's spatial structure but are unsure of whether the use of complex spatial models will increase the utility of model results in planning. We compared the relative performance of nonspatial and hierarchical Bayesian spatial models for three vertebrate and invertebrate taxa of conservation concern (Church's sideband snails [Monadenia churchi], red tree voles [Arborimus longicaudus], and Pacific fishers [Martes pennanti pacifica]) that provide examples of a range of distributional extents and dispersal abilities. We used presence–absence data derived from regional monitoring programs to develop models with both landscape and site‐level environmental covariates. We used Markov chain Monte Carlo algorithms and a conditional autoregressive or intrinsic conditional autoregressive model framework to fit spatial models. The fit of Bayesian spatial models was between 35 and 55% better than the fit of nonspatial analogue models. Bayesian spatial models outperformed analogous models developed with maximum entropy (Maxent) methods. Although the best spatial and nonspatial models included similar environmental variables, spatial models provided estimates of residual spatial effects that suggested how ecological processes might structure distribution patterns. Spatial models built from presence–absence data improved fit most for localized endemic species with ranges constrained by poorly known biogeographic factors and for widely distributed species suspected to be strongly affected by unmeasured environmental variables or population processes. 
By treating spatial effects as a variable of interest rather than a nuisance, hierarchical Bayesian spatial models, especially when they are based on a common broad-scale spatial lattice (here the national Forest Inventory and Analysis grid of 24 km^2 hexagons), can increase the relevance of habitat models to multispecies conservation planning.

17.
To make a macrofaunal (crustacean) habitat potential map, the spatial distribution of ecological variables in the Hwangdo tidal flat, Korea, was explored. Spatial variables were mapped using remote sensing and a geographic information system (GIS) combined with field observations. A frequency ratio (FR) and logistic regression (LR) model were employed to map the macrofauna potential area for Ilyoplax dentimerosa, a crustacean species. Spatial variables affecting the tidal macrofauna distribution were selected based on abundance and biomass and used within a spatial database derived from remotely sensed data of various types of sensors. The spatial variables included the intertidal digital elevation model (DEM), slope, distance from a tidal channel, tidal channel density, surface sediment facies, spectral reflectance of the near infrared (NIR) bands and the tidal exposure duration. The relation between I. dentimerosa and each spatial variable was calculated using the FR and LR. The species occurrence data were randomly divided into a training set (70%) to analyse habitat potential using FR and LR and a test set (30%) to validate the predicted habitat potential map. The relations were overlaid to produce a habitat potential map with the species potential index (SPI) value for each pixel. The potential habitat maps were compared with the surveyed habitat locations used as the validation data set. The comparison showed that the LR model (accuracy 85.28%) predicts better than the FR model (accuracy 78.96%). Both models gave satisfactory accuracies. The LR provides the quantitative influence of variables on a potential habitat of species, whereas the FR shows the quantitative influence of a class in each variable. The combination of GIS-based frequency ratio and logistic regression models and remote sensing with field observations is an effective method to determine locations favorable for macrofaunal species occurrences in a tidal flat.
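The frequency ratio used in the tidal flat study above can be sketched under its commonly used definition: for each class of a spatial variable, the share of species occurrences falling in that class divided by the share of all pixels in that class. The function names and the dictionary layout are hypothetical. Summing ratios across variables to obtain a per-pixel index is likewise a common convention assumed here, not something the abstract specifies.

```python
def frequency_ratio(occurrences, pixels):
    """Frequency ratio per class of one spatial variable.

    occurrences[c] is the number of species occurrence points in class c and
    pixels[c] is the number of map pixels in class c. A ratio above 1 means
    the species occurs in the class more often than its area alone suggests.
    """
    total_occ = sum(occurrences.values())
    total_pix = sum(pixels.values())
    return {
        c: (occurrences.get(c, 0) / total_occ) / (pixels[c] / total_pix)
        for c in pixels
    }


def potential_index(pixel_classes, ratios_by_variable):
    """Sum the frequency ratios of the classes a pixel falls in.

    pixel_classes maps variable name to the pixel's class, and
    ratios_by_variable maps variable name to frequency_ratio output.
    Summation is an assumed aggregation rule for this sketch.
    """
    return sum(ratios_by_variable[v][c] for v, c in pixel_classes.items())
```

With 30 of 40 occurrences on mud and mud covering half the pixels, the mud class gets a ratio of 1.5, which illustrates the abstract's point that the FR quantifies the influence of a class within each variable.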

18.
We model the points of the detection along the transect line by a Markov modulated Poisson process (MMPP). The MMPP can accommodate the spatial cluster structure typical of many line transect surveys. The basic idea is that animal density switches between a low and a high level according to a latent Markov process. The MMPP is attractive from a mathematical point of view, as it provides an explicit expression for the likelihood function and other important quantities. We focus on estimating the level of overdispersion in the number of detected animals, as this is important for quantifying the precision of the line transect estimator of animal abundance. The approach is illustrated using both simulated data and data from a minke whale sighting survey conducted in the North Atlantic. Received: August 2004 / Revised: August 2005

19.
Spatial autocorrelation in wildlife observation data arises when extrinsic environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, or when species are subject to intrinsic population processes, causing contagion or dispersion effects. Territoriality, Allee effects, dispersal limitations, and social clustering are examples of intrinsic processes. Both forms of autocorrelation can violate the assumptions of generalized linear regression models, resulting in biased estimation of model coefficients and diminished predictive performance. Such consequences may be avoided for extrinsic autocorrelation when autocorrelated environmental variables are available for use as model covariates, whereas intrinsic spatial autocorrelation requires an alternative modeling approach. The autologistic model provides an approach suited to the binary observations often obtained in wildlife surveys, but its performance has not been tested across widely varying sampling intensities or strengths of intrinsic spatial structure. Here we use simulated data to test the autologistic model under a range of sampling conditions. The autologistic model obtains better fits and substantially better predictive performance than the standard logistic regression model over the full range of sampling designs and intensities tested. We provide a simple Bayesian implementation of the autologistic model, which until now has not been achieved with standard statistical software alone. A step-by-step procedure is given for characterizing and modeling spatial autocorrelation in binary observation data, along with computer code for fitting autologistic models in WinBUGS, a freeware Bayesian analysis package. This approach avoids normal approximations to the pseudo-likelihood, in contrast to previous Bayesian applications of the autologistic model. 
We provide a sample application of the autologistic model, fitted to survey data for a gliding marsupial in southeastern Australia.

20.
Bayesian hierarchical models were used to assess trends of harbor seals, Phoca vitulina richardsi, in Prince William Sound, Alaska, following the 1989 Exxon Valdez oil spill. Data consisted of 4–10 replicate observations per year at 25 sites over 10 years. We had multiple objectives, including estimating the effects of covariates on seal counts, and estimating trend and abundance, both per site and overall. We considered a Bayesian hierarchical model to meet our objectives. The model consists of a Poisson regression model for each site. For each observation the logarithm of the mean of the Poisson distribution was a linear model with the following factors: (1) intercept for each site and year, (2) time of year, (3) time of day, (4) time relative to low tide, and (5) tide height. The intercept for each site was then given a linear trend model for year. As part of the hierarchical model, parameters for each site were given a prior distribution to summarize overall effects. Results showed that at most sites, (1) trend is down; counts decreased yearly, (2) counts decrease throughout August, (3) counts decrease throughout the day, (4) counts are at a maximum very near to low tide, and (5) counts decrease as the height of the low tide increases; however, there was considerable variation among sites. To get overall trend we used a weighted average of the trend at each site, where the weights depended on the overall abundance of a site. Results indicate a 3.3% decrease per year over the time period.
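The overall trend calculation described in this abstract can be sketched as a weighted average of site-level slopes on the log scale, converted to a percent change per year. The function names are hypothetical, and using site abundance directly as the weight is an assumption: the abstract says only that the weights depended on abundance.

```python
import math


def weighted_trend(slopes, abundances):
    """Abundance-weighted average of per-site log-scale yearly slopes."""
    total = sum(abundances)
    return sum(b * w for b, w in zip(slopes, abundances)) / total


def percent_change_per_year(slope):
    """Convert a log-scale yearly slope to a percent change per year."""
    return 100.0 * (math.exp(slope) - 1.0)
```

Under a log link, a yearly slope of about -0.0336 corresponds to the 3.3% annual decrease reported, since exp(-0.0336) is approximately 0.967.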
