Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Modelling species distributions with presence data from atlases, museum collections and databases is challenging. In this paper, we compare seven procedures to generate pseudo-absence data, which in turn are used to fit logistic regression (GLM) models when reliable absence data are not available. We use pseudo-absences selected randomly or by means of presence-only methods (ENFA and MDE) to model the distribution of a threatened endemic Iberian moth species (Graellsia isabelae). The results show that the pseudo-absence selection method greatly influences the percentage of explained variability, the scores of the accuracy measures and, most importantly, the degree of constraint in the estimated distribution. As we extract pseudo-absences from environmental regions further from the optimum established by presence data, the models generated obtain better accuracy scores, and over-prediction increases. When variables other than environmental ones influence the distribution of the species (i.e., non-equilibrium state) and precise information on absences is non-existent, the random selection of pseudo-absences or their selection from environmental localities similar to those of species presence data generates the most constrained predictive distribution maps, because pseudo-absences can be located within environmentally suitable areas. This study shows that if we do not have reliable absence data, the method of pseudo-absence selection strongly conditions the obtained model, generating different model predictions in the gradient between potential and realized distributions.
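The simplest of the procedures named in the abstract above, random selection of pseudo-absences, can be sketched as follows. This is a minimal illustration and not the authors' code: the grid, the presence cells, the sample size and the function name `sample_pseudo_absences` are all hypothetical.

```python
import random


def sample_pseudo_absences(all_cells, presence_cells, n, seed=0):
    """Draw n pseudo-absence cells at random from cells with no presence record."""
    presence = set(presence_cells)
    # Candidate cells are every grid cell without a recorded presence.
    candidates = [cell for cell in all_cells if cell not in presence]
    if n > len(candidates):
        raise ValueError("not enough candidate cells")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, n)  # sampling without replacement


# Hypothetical 10 x 10 grid with three presence cells.
grid = [(row, col) for row in range(10) for col in range(10)]
presences = [(2, 3), (2, 4), (7, 7)]
pseudo_absences = sample_pseudo_absences(grid, presences, n=20)
```

Because the draw ignores environmental distance from the presences, some pseudo-absences may fall in suitable habitat, which is the situation the abstract associates with more constrained predictions.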

2.
Species distribution models (SDMs) based on statistical relationships between occurrence data and underlying environmental conditions are increasingly used to predict spatial patterns of biological invasions and prioritize locations for early detection and control of invasion outbreaks. However, invasive species distribution models (iSDMs) face special challenges because (i) they typically violate SDM's assumption that the organism is in equilibrium with its environment, and (ii) species absence data are often unavailable or believed to be too difficult to interpret. This often leads researchers to generate pseudo-absences for model training or utilize presence-only methods, and to confuse the distinction between predictions of potential vs. actual distribution. We examined the hypothesis that true-absence data, when accompanied by dispersal constraints, improve prediction accuracy and ecological understanding of iSDMs that aim to predict the actual distribution of biological invasions. We evaluated the impact of presence-only, true-absence and pseudo-absence data on model accuracy using an extensive dataset on the distribution of the invasive forest pathogen Phytophthora ramorum in California. Two traditional presence/absence models (generalized linear model and classification trees) and two alternative presence-only models (ecological niche factor analysis and maximum entropy) were developed based on 890 field plots of pathogen occurrence and several climatic, topographic, host vegetation and dispersal variables. The effects of all three possible types of occurrence data on model performance were evaluated with receiver operating characteristic (ROC) and omission/commission error rates. Results show that prediction of actual distribution was less accurate when we ignored true-absences and dispersal constraints. Presence-only models and models without dispersal information tended to over-predict the actual range of invasions. 
Models based on pseudo-absence data exhibited similar accuracies as presence-only models but produced spatially less feasible predictions. We suggest that true-absence data are a critical ingredient not only for accurate calibration but also for ecologically meaningful assessment of iSDMs that focus on predictions of actual distributions.

3.
Habitat classification models (HCMs) are invaluable tools for species conservation, land-use planning, reserve design, and metapopulation assessments, particularly at broad spatial scales. However, species occurrence data are often lacking and typically limited to presence points at broad scales. This lack of absence data precludes the use of many statistical techniques for HCMs. One option is to generate pseudo-absence points so that the many available statistical modeling tools can be used. Traditional techniques generate pseudo-absence points at random across broadly defined species ranges, often failing to include biological knowledge concerning the species-habitat relationship. We incorporated biological knowledge of the species-habitat relationship into pseudo-absence points by creating habitat envelopes that constrain the region from which points were randomly selected. We define a habitat envelope as an ecological representation of a species, or species feature's (e.g., nest) observed distribution (i.e., realized niche) based on a single attribute, or the spatial intersection of multiple attributes. We created HCMs for Northern Goshawk (Accipiter gentilis atricapillus) nest habitat during the breeding season across Utah forests with extant nest presence points and ecologically based pseudo-absence points using logistic regression. Predictor variables were derived from 30-m USDA Landfire and 250-m Forest Inventory and Analysis (FIA) map products. These habitat-envelope-based models were then compared to null envelope models which use traditional practices for generating pseudo-absences. Models were assessed for fit and predictive capability using metrics such as kappa, threshold-independent receiver operating characteristic (ROC) plots, adjusted deviance (D²_adj), and cross-validation, and were also assessed for ecological relevance. For all cases, habitat envelope-based models outperformed null envelope models and were more ecologically relevant, suggesting that incorporating biological knowledge into pseudo-absence point generation is a powerful tool for species habitat assessments. Furthermore, given some a priori knowledge of the species-habitat relationship, ecologically based pseudo-absence points can be applied to any species, ecosystem, data resolution, and spatial extent.

4.
5.
An important decision in presence-only species distribution modeling is how to select background (or pseudo-absence) localities for model parameterization. The selection of such localities may influence model parameterization and thus can influence the appropriateness and accuracy of the model prediction when extrapolating the species distribution across time and space. We used 12 species from the Australian Wet Tropics (AWT) to evaluate the relationship between the geographic extent from which pseudo-absences are taken and model performance, and shape and importance of predictor variables using the MAXENT modeling method. Model performance is lower when pseudo-absence points are taken from either a restricted or broad region with respect to species occurrence data than from an intermediate region. Furthermore, variable importance (i.e., contribution to the model) changed such that models became increasingly simplified, dominated by just two variables, as the area from which pseudo-absence points were drawn increased. Our results suggest that it is important to consider the spatial extent from which pseudo-absence data are taken. We suggest species distribution modeling exercises should begin with exploratory analyses evaluating what extent might provide both the most accurate results and biologically meaningful fit between species occurrence and predictor variables. This is especially important when modeling across space or time, a growing application for species distributional modeling.

6.
Abstract: Distribution models are used increasingly for species conservation assessments over extensive areas, but the spatial resolution of the modeled data and, consequently, of the predictions generated directly from these models are usually too coarse for local conservation applications. Comprehensive distribution data at finer spatial resolution, however, require a level of sampling that is impractical for most species and regions. Models can be downscaled to predict distribution at finer resolutions, but this increases uncertainty because the predictive ability of models is not necessarily consistent beyond their original scale. We analyzed the performance of downscaled, previously published models of environmental favorability (a generalized linear modeling technique) for a restricted endemic insectivore, the Iberian desman (Galemys pyrenaicus), and a more widespread carnivore, the Eurasian otter (Lutra lutra), in the Iberian Peninsula. The models, built from presence–absence data at 10 × 10 km resolution, were extrapolated to a resolution 100 times finer (1 × 1 km). We compared downscaled predictions of environmental quality for the two species with published data on local observations and on important conservation sites proposed by experts. Predictions were significantly related to observed presence or absence of species and to expert selection of sampling sites and important conservation sites. Our results suggest the potential usefulness of downscaled projections of environmental quality as a proxy for expensive and time-consuming field studies when the field studies are not feasible. This method may be valid for other similar species if coarse-resolution distribution data are available to define high-quality areas at a scale that is practical for the application of concrete conservation measures.

7.
An important aspect of species distribution modelling is the choice of the modelling method because a suboptimal method may have poor predictive performance. Previous comparisons have found that novel methods, such as Maxent models, outperform well-established modelling methods, such as the standard logistic regression. These comparisons used training samples with small numbers of occurrences per estimated model parameter, and this limited sample size may have caused poorer predictive performance due to overfitting. Our hypothesis is that Maxent models would outperform a standard logistic regression because Maxent models avoid overfitting by using regularisation techniques and a standard logistic regression does not. Regularisation can be applied to logistic regression models using penalised maximum likelihood estimation. This estimation procedure shrinks the regression coefficients towards zero, causing biased predictions if applied to the training sample but improving the accuracy of new predictions. We used Maxent and logistic regression (standard and penalised) to analyse presence/pseudo-absence data for 13 tree species and evaluated the predictive performance (discrimination) using presence-absence data. The penalised logistic regression outperformed standard logistic regression and equalled the performance of Maxent. The penalised logistic regression may be considered one of the best methods to develop species distribution models trained with presence/pseudo-absence data, as it is comparable to Maxent. Our results encourage further use of the penalised logistic regression for species distribution modelling, especially in those cases in which a complex model must be fitted to a sample with a limited size.
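The shrinkage described in the abstract above can be illustrated with a small sketch. It assumes a quadratic (ridge) penalty on a single slope, fitted by plain gradient descent; the abstract does not state which penalty or fitting routine the authors used, and the data and the function name `fit_logistic` are made up for illustration.

```python
import math


def fit_logistic(xs, ys, penalty=0.0, lr=0.1, steps=5000):
    """Fit y ~ intercept + slope * x by gradient descent on the mean
    negative log-likelihood plus (penalty / 2) * slope**2."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_b, grad_w = 0.0, 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            grad_b += (p - y) / n
            grad_w += (p - y) * x / n
        grad_w += penalty * slope  # ridge term; the intercept is not penalised
        intercept -= lr * grad_b
        slope -= lr * grad_w
    return intercept, slope


# Hypothetical presence (1) / pseudo-absence (0) data along one gradient.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1]
_, slope_standard = fit_logistic(xs, ys, penalty=0.0)
_, slope_penalised = fit_logistic(xs, ys, penalty=0.5)
```

With these inputs the penalised slope is smaller in magnitude than the standard one, which is the shrinkage towards zero the abstract refers to. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.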

8.
Empirical models for predicting the distribution of organisms from environmental data have often focused on principles of ecological niche theory. However, even at large scales, there is little agreement over how to represent the dimensions of a species’ niche. The performance of such models is greatly affected by the nature of species distributional and environmental data. Regional scale distribution models were developed for 30 willow species in Ontario to examine (i) the predictive ability of logistic regression analysis, and (ii) the effects of using different distributional and environmental data sets. Two original measures of model accuracy and over-prediction were employed and evaluated using independent data. Models based on unique combinations of monthly climate data predicted distributions most accurately for all species. Models based on a fixed set of variables, while generating the highest average probabilities of occurrence for certain species with limited ranges, resulted in the greatest under- and over-estimates of willow distributions. Comparisons of models demonstrated climatic patterns among willows of differing habit and habitat. The distribution of dwarf willow species, present only in the Ontario arctic, followed gradients of summer maximum temperatures. The distribution of the tree species in the southerly portions of the province followed gradients of fall and winter minimum temperatures. Regardless of distributional and environmental data input, no algorithm maximized model performance for all species. Individual species models require individual approaches; i.e., the variable selection technique, the set of environmental factors used as predictors, and the nature of species distributional data must be carefully matched to the intended application. An understanding of evolutionary processes enhances the meaningful interpretation of individual species models. 
Unless sampling bias and species prevalence can be accounted for, models based on collection point data are best used to guide field surveys. While inferred range data may be better suited to determine potential ecological niches, overestimation of species prevalence and environmental tolerance must be recognized. A combination of available distributional data types is recommended to best determine species niches, an important step in developing conservation strategies.

9.
Spatial autocorrelation in wildlife observation data arises when extrinsic environmental processes and patterns that influence the spatial distribution of wildlife are themselves spatially structured, or when species are subject to intrinsic population processes, causing contagion or dispersion effects. Territoriality, Allee effects, dispersal limitations, and social clustering are examples of intrinsic processes. Both forms of autocorrelation can violate the assumptions of generalized linear regression models, resulting in biased estimation of model coefficients and diminished predictive performance. Such consequences may be avoided for extrinsic autocorrelation when autocorrelated environmental variables are available for use as model covariates, whereas intrinsic spatial autocorrelation requires an alternative modeling approach. The autologistic model provides an approach suited to the binary observations often obtained in wildlife surveys, but its performance has not been tested across widely varying sampling intensities or strengths of intrinsic spatial structure. Here we use simulated data to test the autologistic model under a range of sampling conditions. The autologistic model obtains better fits and substantially better predictive performance than the standard logistic regression model over the full range of sampling designs and intensities tested. We provide a simple Bayesian implementation of the autologistic model, which until now has not been achieved with standard statistical software alone. A step-by-step procedure is given for characterizing and modeling spatial autocorrelation in binary observation data, along with computer code for fitting autologistic models in WinBUGS, a freeware Bayesian analysis package. This approach avoids normal approximations to the pseudo-likelihood, in contrast to previous Bayesian applications of the autologistic model. 
We provide a sample application of the autologistic model, fitted to survey data for a gliding marsupial in southeastern Australia.
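One common way to express intrinsic spatial structure in an autologistic model is an autocovariate that summarises the observed state of neighbouring sites. The sketch below computes such a term on a small grid. It assumes rook (four-cell) neighbourhoods and an unweighted mean, neither of which is stated in the abstract, and it is unrelated to the authors' WinBUGS code; the function name `autocovariate` and the survey grid are hypothetical.

```python
def autocovariate(grid):
    """Mean occupancy (0/1) of the rook neighbours of each grid cell."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect the up, down, left and right neighbours inside the grid.
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
            row.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
        result.append(row)
    return result


# Hypothetical binary survey grid (1 = species detected).
survey = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
autocov = autocovariate(survey)
```

The resulting values would enter the logistic model as an extra covariate alongside the environmental predictors.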

10.
Predicting species distributions from samples collected along roadsides   Total citations: 1, self-citations: 0, citations by others: 1
Predictive models of species distributions are typically developed with data collected along roads. Roadside sampling may provide a biased (nonrandom) sample; however, it is currently unknown whether roadside sampling limits the accuracy of predictions generated by species distribution models. We tested whether roadside sampling affects the accuracy of predictions generated by species distribution models by using a prospective sampling strategy designed specifically to address this issue. We built models from roadside data and validated model predictions at paired locations on unpaved roads and 200 m away from roads (off road), spatially and temporally independent from the data used for model building. We predicted species distributions of 15 bird species on the basis of point-count data from a landbird monitoring program in Montana and Idaho (U.S.A.). We used hierarchical occupancy models to account for imperfect detection. We expected that predictions of species distributions derived from roadside-sampling data would be less accurate when validated with data from off-road sampling than when validated with data from roadside sampling, and that model accuracy would be differentially affected by whether species were generalists, associated with edges, or associated with interior forest. Model performance measures (kappa, area under the curve of a receiver operating characteristic plot, and true skill statistic) did not differ between model predictions of roadside and off-road distributions of species. Furthermore, performance measures did not differ among edge, generalist, and interior species, even though vegetation structure differed between roadsides and off-road locations and 2 of the 15 species were more likely to occur along roadsides. If the range of environmental gradients is surveyed in roadside-sampling efforts, our results suggest that surveys along unpaved roads can be a valuable, unbiased source of information for species distribution models.

11.
12.
We introduce a methodology to infer zones of high potential for the habitat of a species, useful for management of biodiversity, conservation, biogeography, ecology, or sustainable use. Inference is based on a set of sites where the presence of the species has been reported. Each site is associated with covariate values, measured on discrete scales. We compute the predictive probability that the species is present at each node of a regular grid. Possible spatial bias for sites of presence is accounted for. Since the resulting posterior distribution does not have a closed form, a Markov chain Monte Carlo (MCMC) algorithm is implemented. However, we also describe an approximation to the posterior distribution, which avoids MCMC. Relevant features of the approach are that specific notions of data acquisition such as sampling intensity and detectability are accounted for, and that available a priori information regarding areas of distribution of the species is incorporated in a clear-cut way. These concepts, arising in the presence-only context, are not addressed in alternative methods. We also consider an uncertainty map, which measures the variability for the predictive probability at each node on the grid. A simulation study is carried out to test and compare our approach with other standard methods. Two case studies are also presented.

13.
Abstract: Species’ assessments must frequently be derived from opportunistic observations made by volunteers (i.e., citizen scientists). Interpretation of the resulting data to estimate population trends is plagued with problems, including teasing apart genuine population trends from variations in observation effort. We devised a way to correct for annual variation in effort when estimating trends in occupancy (species distribution) from faunal or floral databases of opportunistic observations. First, for all surveyed sites, detection histories (i.e., strings of detection–nondetection records) are generated. Within-season replicate surveys provide information on the detectability of an occupied site. Detectability directly represents observation effort; hence, estimating detectability means correcting for observation effort. Second, site-occupancy models are applied directly to the detection-history data set (i.e., without aggregation by site and year) to estimate detectability and species distribution (occupancy, i.e., the true proportion of sites where a species occurs). Site-occupancy models also provide unbiased estimators of components of distributional change (i.e., colonization and extinction rates). We illustrate our method with data from a large citizen-science project in Switzerland in which field ornithologists record opportunistic observations. We analyzed data collected on four species: the widespread Kingfisher (Alcedo atthis) and Sparrowhawk (Accipiter nisus) and the scarce Rock Thrush (Monticola saxatilis) and Wallcreeper (Tichodroma muraria). Our method requires that all observed species are recorded. Detectability was <1 and varied over the years. Simulations suggested some robustness, but we advocate recording complete species lists (checklists), rather than recording individual records of single species. The representation of observation effort with its effect on detectability provides a solution to the problem of differences in effort encountered when extracting trend information from haphazard observations. We expect our method is widely applicable for global biodiversity monitoring and modeling of species distributions.
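How a detection history separates occupancy from detectability can be shown with the likelihood of the simplest single-season site-occupancy model. The sketch assumes a constant occupancy probability `psi` and a constant per-visit detection probability `p`, which is a simplification of the multi-year models with colonization and extinction described in the abstract; the function name `site_likelihood` and the numbers are illustrative.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 per visit)
    under constant occupancy probability psi and detection probability p."""
    detections = sum(history)
    visits = len(history)
    # Site occupied: each visit detects the species with probability p.
    occupied = psi * p ** detections * (1 - p) ** (visits - detections)
    if detections == 0:
        # No detections: the site may also simply be unoccupied.
        return occupied + (1 - psi)
    return occupied


never_detected = site_likelihood([0, 0], psi=0.5, p=0.5)
detected_once = site_likelihood([1, 0], psi=0.5, p=0.5)
```

An all-zero history contributes both an "occupied but missed" term and an "unoccupied" term, which is why replicate visits within a season are needed to estimate detectability.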

14.
We explored the effects of prevalence, latitudinal range and clumping (spatial autocorrelation) of species distribution patterns on the predictive accuracy of eight state-of-the-art modelling techniques: Generalized Linear Models (GLMs), Generalized Boosting Method (GBM), Generalized Additive Models (GAMs), Classification Tree Analysis (CTA), Artificial Neural Network (ANN), Multivariate Adaptive Regression Splines (MARS), Mixture Discriminant Analysis (MDA) and Random Forest (RF). One hundred species of Lepidoptera, selected from the Distribution Atlas of European Butterflies, and three climate variables were used to determine the bioclimatic envelope for each butterfly species. The data set consisting of 2620 grid squares 30′ × 60′ in size all over Europe was randomly split into the calibration and the evaluation data sets. The performance of different models was assessed using the area under the curve (AUC) of a receiver operating characteristic (ROC) plot. Observed differences in modelling accuracy among species were then related to the geographical attributes of the species using GAM. The modelling performance was negatively related to the latitudinal range and prevalence, whereas the effect of spatial autocorrelation on prediction accuracy depended on the modelling technique. These three geographical attributes accounted for 19–61% of the variation in the modelling accuracy. Predictive accuracy of GAM, GLM and MDA was highly influenced by the three geographical attributes, whereas RF, ANN and GBM were moderately, and MARS and CTA only slightly affected. The contrasting effects of geographical distribution of species on predictive performance of different modelling techniques represent one source of uncertainty in species spatial distribution models. This should be taken into account in biogeographical modelling studies and assessments of climate change impacts.
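The AUC used as the accuracy measure in the abstract above equals the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site. A minimal sketch of that pairwise definition follows; the scores are invented and the function name `auc` is only illustrative.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison of scores.
    Ties between a presence and an absence count as half a win."""
    wins = 0.0
    for s_pres in presence_scores:
        for s_abs in absence_scores:
            if s_pres > s_abs:
                wins += 1.0
            elif s_pres == s_abs:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitabilities at evaluation sites.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

In this example eight of the nine presence-absence pairs are ranked correctly, so the AUC is 8/9. The double loop is quadratic in the number of sites; rank-based formulas give the same value faster for large evaluation sets.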

15.
16.
Species distribution models have often been developed based on ecological data. To develop reliable data-driven models, however, sound model training and evaluation procedures are needed. A crucial step in these procedures is the assessment of model performance, whose key component is the applied performance criterion. Therefore, we reviewed seven performance criteria commonly applied in presence-absence modelling (the correctly classified instances, Kappa, sensitivity, specificity, the normalised mutual information statistic, the true skill statistic and the odds ratio) and analysed their application in both the model training and evaluation process. Although estimates of predictive performance have been used widely to assess final model quality, a systematic overview was missing because most analyses of performance criteria have been empirical and only focused on specific aspects of the performance criteria. This paper provides such an overview showing that different performance criteria evaluate a model differently and that this difference may be explained by the dependency of these criteria on the prevalence of the validation set. We showed theoretically that these prevalence effects only occur if the data are inseparable by an n-dimensional hyperplane, n being the number of input variables. Given this inseparability, different performance criteria focus on different aspects of model performance during model training, such as sensitivity, specificity or predictive accuracy. These findings have important consequences for ecological modelling because ecological data are mostly inseparable due to data noise and the complexity of the studied system. Consequently, it should be very clear which aspect of the model performance is evaluated, and models should be evaluated consistently, that is, independent of, or taking into account, species prevalence. The practical implications of these findings are clear. They provide further insight into the evaluation of ecological presence/absence models and attempt to assist modellers in their choice of suitable performance criteria.
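Six of the seven criteria listed in the abstract above follow directly from the four cells of a confusion matrix; the sketch below uses their standard textbook definitions and omits the normalised mutual information statistic for brevity. The counts and the function name `performance_criteria` are hypothetical, not taken from the paper.

```python
def performance_criteria(tp, fp, fn, tn):
    """Threshold-dependent criteria from a 2 x 2 confusion matrix.
    tp/fn: presences predicted present/absent;
    fp/tn: absences predicted present/absent."""
    n = tp + fp + fn + tn
    cci = (tp + tn) / n  # correctly classified instances
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (cci - expected) / (1 - expected)
    tss = sensitivity + specificity - 1  # true skill statistic
    odds_ratio = (tp * tn) / (fp * fn)  # undefined if fp or fn is zero
    return {
        "cci": cci,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "tss": tss,
        "odds_ratio": odds_ratio,
    }


# Hypothetical validation set: 50 presences and 50 absences.
scores = performance_criteria(tp=40, fp=10, fn=10, tn=40)
```

Kappa depends on the row and column totals through the chance-agreement term, while sensitivity and specificity are each computed within a single class; this is one way to see why the criteria can react differently to the prevalence of the validation set.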

17.
Abstract: Numerous models for predicting species distribution have been developed for conservation purposes. Most of them make use of environmental data (e.g., climate, topography, land use) at a coarse grid resolution (often kilometres). Such approaches are useful for conservation policy issues including reserve-network selection. The efficiency of predictive models for species distribution is usually tested on the area for which they were developed. Although highly interesting from the point of view of conservation efficiency, transferability of such models to independent areas is still under debate. We tested the transferability of habitat-based predictive distribution models for two regionally threatened butterflies, the green hairstreak (Callophrys rubi) and the grayling (Hipparchia semele), within and among three nature reserves in northeastern Belgium. We built predictive models based on spatially detailed maps of area-wide distribution and density of ecological resources. We used resources directly related to ecological functions (host plants, nectar sources, shelter, microclimate) rather than environmental surrogate variables. We obtained models that performed well with few resource variables. All models were transferable, although to different degrees, among the independent areas within the same broad geographical region. We argue that habitat models based on essential functional resources could transfer better in space than models that use indirect environmental variables. Because functional variables can easily be interpreted and even be directly affected by terrain managers, these models can be useful tools to guide species-adapted reserve management.

18.
Similarity-based mapping of the expected distribution of 10 orchid species was conducted in a study area covering 300 km2 in south-eastern Estonia. The observation track and species finds were recorded during fieldwork. Absence locations were generated on the line of the observation track. Both presence and absence sites at least 100 m apart were used as training data. Expected presence/absence of a species was calculated according to similarity between the location to be predicted and selected observations (examples) of presence and absence sites. For each species, the machine learning system identified the best predictive sets by selecting the most useful variables out of 136 map and remote sensing features. Similarity-based estimations were evaluated both by training fit and by independent verification data. Reliability of the predictive maps was also expressed by usefulness ratios: the densities of validation find sites (1) in the predicted presence area relative to the density of those in the predicted absence area, and (2) relative to the share of the observation track in the predicted presence area and in the predicted absence area. The predictive mapping was most efficient for Dactylorhiza incarnata, D. russowii, Epipactis palustris, and Goodyera repens. We conclude that thorough coverage of observations over any larger area is unrealistic and the reliability of similarity-based predictive maps depends on the representativity of existing records relative to the diversity of the study area. The investigation showed that the studied species are much more common in nature than the records in the national database indicate.

19.
Eradication and control of invasive species are often possible only if populations are detected when they are small and localized. To be efficient, detection surveys should be targeted at locations where there is the greatest risk of incursions. We examine the utility of habitat suitability index (HSI) and particle dispersion models for targeting sampling for marine pests. Habitat suitability index models are a simple way to identify suitable habitat when species distribution data are lacking. We compared the performance of HSI models with statistical models derived from independent data from New Zealand on the distribution of two nonindigenous bivalves: Theora lubrica and Musculista senhousia. Logistic regression models developed using the HSI scores as predictors of the presence/absence of Theora and Musculista explained 26.7% and 6.2% of the deviance in the data, respectively. Odds ratios for the HSI scores were greater than unity, indicating that they were genuine predictors of the presence/absence of each species. The fit and predictive accuracy of each logistic model were improved when simulated patterns of dispersion from the nearest port were added as a predictor variable. Nevertheless, the combined model explained, at best, 46.5% of the deviance in the distribution of Theora and correctly predicted 56% of true presences and 50% of all cases. Omission errors were between 6% and 16%. Although statistical distribution models built directly from environmental predictors always outperformed the equivalent HSI models, the gain in model fit and accuracy was modest. High residual deviance in both types of model suggests that the distributions realized by Theora and Musculista in the field data were influenced by factors not explicitly modeled as explanatory variables and by error in the environmental data used to project suitable habitat for the species. Our results highlight the difficulty of accurately predicting the distribution of invasive marine species that exhibit low habitat occupancy and patchy distributions in time and space. Although the HSI and statistical models had utility as predictors of the likely distribution of nonindigenous marine species, the level of spatial accuracy achieved with them may be well below expectations for sensitive surveillance programs.

20.
Various methods exist to model a species’ niche and geographic distribution using environmental data for the study region and occurrence localities documenting the species’ presence (typically from museums and herbaria). In presence-only modelling, geographic sampling bias and small sample sizes represent challenges for many species. Overfitting to the bias and/or noise characteristic of such datasets can seriously compromise model generality and transferability, which are critical to many current applications - including studies of invasive species, the effects of climatic change, and niche evolution. Even when transferability is not necessary, applications to many areas, including conservation biology, macroecology, and zoonotic diseases, require models that are not overfit. We evaluated these issues using a maximum entropy approach (Maxent) for the shrew Cryptotis meridensis, which is endemic to the Cordillera de Mérida in Venezuela. To simulate strong sampling bias, we divided localities into two datasets: those from a portion of the species’ range that has seen high sampling effort (for model calibration) and those from other areas of the species’ range, where less sampling has occurred (for model evaluation). Before modelling, we assessed the climatic values of localities in the two datasets to determine whether any environmental bias accompanies the geographic bias. Then, to identify optimal levels of model complexity (and minimize overfitting), we made models and tuned model settings, comparing performance with that achieved using default settings. We randomly selected localities for model calibration (sets of 5, 10, 15, and 20 localities) and varied the level of model complexity considered (linear versus both linear and quadratic features) and two aspects of the strength of protection against overfitting (regularization). 
Environmental bias indeed corresponded to the geographic bias between datasets, with differences in median and observed range (minima and/or maxima) for some variables. Model performance varied greatly according to the level of regularization. Intermediate regularization consistently led to the best models, with decreased performance at low and generally at high regularization. Optimal levels of regularization differed between sample-size-dependent and sample-size-independent approaches, but both reached similar levels of maximal performance. In several cases, the optimal regularization value was different from (usually higher than) the default one. Models calibrated with both linear and quadratic features outperformed those made with just linear features. Results were remarkably consistent across the examined sample sizes. Models made with few and biased localities achieved high predictive ability when appropriate regularization was employed and optimal model complexity was identified. Species-specific tuning of model settings can have great benefits over the use of default settings.
