Most performance criteria that have been used to train ecological models focus on the accuracy of the model predictions. However, these criteria depend on the prevalence of the training set and often do not take into account ecological issues such as the distinction between omission and commission errors. Moreover, a previous study indicated that model training based on different performance criteria results in different optimised models. Therefore, model developers should train models based on different performance criteria and select the most appropriate model depending on the modelling objective. This paper presents a new approach to train fuzzy models based on an adjustable performance criterion, called the adjusted average deviation (aAD). This criterion was applied to develop a species distribution model for spawning grayling in the Aare River near Thun, Switzerland. To analyse the strengths and weaknesses of this approach, it was compared with model training based on other performance criteria. The results suggest that model training based on accuracy-based performance criteria may produce unrealistic models at extreme prevalences of the training set, whereas the aAD allows the identification of more accurate and more reliable models. Moreover, the adjustable parameter in this criterion enables modellers to situate the optimised models in the search space and thus provides an indication of the ecological relevance of the model. Consequently, the approach may support modellers and river managers in decision making by improving model reliability and insight into the modelling process. Owing to its universality and flexibility, the approach could be applied to other ecosystems or species and may therefore be valuable to ecological modelling and ecosystem management in general.

Species distribution models have often been developed from ecological data. To develop reliable data-driven models, however, sound model training and evaluation procedures are needed. A crucial step in these procedures is the assessment of model performance, whose key component is the performance criterion applied. We therefore reviewed seven performance criteria commonly applied in presence-absence modelling (the correctly classified instances, Kappa, sensitivity, specificity, the normalised mutual information statistic, the true skill statistic and the odds ratio) and analysed their application in both model training and model evaluation. Although estimates of predictive performance have been widely used to assess final model quality, a systematic overview was missing, because most analyses of performance criteria have been empirical and focused only on specific aspects of the criteria. This paper provides such an overview, showing that different performance criteria evaluate a model differently and that this difference may be explained by the dependency of these criteria on the prevalence of the validation set. We showed theoretically that these prevalence effects occur only if the data cannot be separated by a hyperplane in the n-dimensional input space, n being the number of input variables. Given this inseparability, different performance criteria focus on different aspects of model performance during model training, such as sensitivity, specificity or predictive accuracy. These findings have important consequences for ecological modelling, because ecological data are mostly inseparable owing to data noise and the complexity of the studied system. Consequently, it should be made explicit which aspect of model performance is evaluated, and models should be evaluated consistently, that is, independently of species prevalence or taking it into account. The practical implications of these findings are clear.
They provide further insight into the evaluation of ecological presence/absence models and aim to assist modellers in choosing suitable performance criteria.
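The prevalence dependency described above can be illustrated with a small sketch, assuming the usual 2x2 confusion-matrix definitions of the criteria (the normalised mutual information statistic is left out to keep the example short). The function name and the example counts are hypothetical: the same model, with sensitivity 0.8 and specificity 0.7, is scored on one validation set with prevalence 0.1 and another with prevalence 0.5.

```python
def presence_absence_criteria(tp, fp, fn, tn):
    """Common presence-absence criteria from a 2x2 confusion matrix.

    tp: presences predicted present    fp: absences predicted present
    fn: presences predicted absent     tn: absences predicted absent
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # share of presences predicted correctly
    specificity = tn / (tn + fp)          # share of absences predicted correctly
    cci = (tp + tn) / n                   # correctly classified instances
    # Cohen's Kappa: observed agreement corrected for agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (cci - p_chance) / (1 - p_chance)
    tss = sensitivity + specificity - 1   # true skill statistic
    odds_ratio = (tp * tn) / (fp * fn)
    return {"CCI": cci, "Kappa": kappa, "sensitivity": sensitivity,
            "specificity": specificity, "TSS": tss, "odds_ratio": odds_ratio}


# Hypothetical example: one model (sensitivity 0.8, specificity 0.7), 1000 sites each
rare = presence_absence_criteria(tp=80, fp=270, fn=20, tn=630)      # prevalence 0.1
common = presence_absence_criteria(tp=400, fp=150, fn=100, tn=350)  # prevalence 0.5
```

In this example CCI (0.71 vs 0.75) and Kappa (about 0.24 vs 0.5) change with prevalence, whereas sensitivity, specificity, TSS and the odds ratio stay the same.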

Large, fine-grained samples are ideal for predictive species distribution models used for management purposes, but such datasets are not available for most species and conducting such surveys is costly. We attempted to overcome this obstacle by updating previously available coarse-grained logistic regression models with small fine-grained samples using a recalibration approach. Recalibration involves re-estimation of the intercept or slope of the linear predictor and may improve calibration (the level of agreement between predicted and actual probabilities). If reliable estimates of occurrence likelihood are required (e.g., for species selection in ecological restoration), calibration should be preferred over other model performance measures. This updating approach is not expected to improve discrimination (the ability of the model to rank sites according to species suitability), because the rank order of predictions is not altered. We tested different updating methods and sample sizes with tree distribution data from Spain. Updated models were compared with models fitted using only fine-grained data (refitted models). Updated models performed reasonably well at fine scales and outperformed refitted models with small samples (10–100 occurrences). If a coarse-grained model is available (or could easily be developed) and fine-grained predictions are to be generated from a limited sample, updating previous models may be a more accurate option than fitting a new model. Our results encourage further studies on model updating in other situations where species distribution models are used under conditions different from those of their training (e.g., different time periods, different regions).
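The recalibration step described above can be sketched as a logistic regression of the new fine-grained observations on the old model's linear predictor. This is a minimal sketch, assuming the coarse model's logit-scale predictions `eta` are available for the new sites; the function name and the Newton-Raphson fitting loop are illustrative choices, not the authors' code. Because the slope stays positive, the rank order of sites (and hence discrimination) is unchanged.

```python
import numpy as np

def recalibrate(eta, y, slope=True, n_iter=50):
    """Re-estimate the intercept (and optionally the slope) of an existing linear predictor.

    eta: linear predictor (logit scale) of the previously fitted coarse model
    y:   0/1 observations from the new fine-grained sample
    Returns (a, b) with logit(p_new) = a + b * eta; with slope=False, b stays 1
    (eta is used as an offset) and only the intercept is updated.
    """
    eta = np.asarray(eta, float)
    y = np.asarray(y, float)
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a + b * eta)))
        w = p * (1.0 - p)                         # binomial variance weights
        if slope:
            X = np.column_stack([np.ones_like(eta), eta])
            grad = X.T @ (y - p)                  # score vector
            hess = X.T @ (X * w[:, None])         # observed information
            step = np.linalg.solve(hess, grad)    # Newton-Raphson step
            a, b = a + step[0], b + step[1]
        else:
            a += (y - p).sum() / w.sum()          # Newton step for the intercept only
    return a, b
```

Recalibrated probabilities are then `1 / (1 + np.exp(-(a + b * eta)))`; intercept-only updating (`slope=False`) corrects overall over- or under-prediction, while the slope also corrects predictions that are too extreme or too flat.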

Species distribution models (SDMs) are increasingly used in conservation and land-use planning as inputs to describe biodiversity patterns. These models can be built in different ways, and decisions about data preparation, selection of predictor variables, model fitting, and evaluation all alter the resulting predictions. Commonly, the true distribution of species is unknown and independent data to verify which SDM variant to choose are lacking. Such model uncertainty is of concern to planners. We analyzed how 11 routine decisions about model complexity, predictors, bias treatment, and setting thresholds for predicted values altered conservation priority patterns across 25 species. Models were created with MaxEnt and run through Zonation to determine the priority rank of sites. Although all SDM variants performed well (area under the curve, AUC > 0.7), they produced spatially different predictions for species and different conservation priority solutions. Priorities were most strongly altered by decisions not to address bias or to apply binary thresholds to predicted values; on average 40% and 35%, respectively, of all grid cells received an opposite priority ranking. Forcing high model complexity altered conservation solutions less than forcing simplicity (14% and 24% of cells with opposite rank values, respectively). Using fewer species records to build models or choosing alternative bias treatments had intermediate effects (25% and 23%, respectively). Depending on modeling choices, priority areas overlapped as little as 10–20% with the baseline solution, affecting top and bottom priorities differently. Our results demonstrate the extent of model-based uncertainty and quantify the relative impacts of SDM building decisions. When the best SDM approach and conservation plan are uncertain, resolving that uncertainty or considering alternative options matters most for the decisions that change plans the most.
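One way to express the overlap between priority areas, as reported above, is the share of the top-ranked cells of one solution that are also top-ranked in another. The sketch below assumes that both solutions are available as per-cell priority ranks (higher = higher priority), e.g. Zonation rank maps flattened to 1-D arrays; the function name and the 10% top fraction are hypothetical choices, and the study's exact overlap definition may differ.

```python
import numpy as np

def top_fraction_overlap(rank_a, rank_b, fraction=0.1):
    """Share of the top-`fraction` cells of solution A that are also top cells in B.

    rank_a, rank_b: priority ranks per grid cell (higher value = higher priority).
    """
    rank_a = np.asarray(rank_a)
    rank_b = np.asarray(rank_b)
    k = max(1, int(round(fraction * rank_a.size)))   # number of cells in the top fraction
    top_a = set(np.argsort(rank_a)[-k:].tolist())    # indices of the k highest ranks in A
    top_b = set(np.argsort(rank_b)[-k:].tolist())
    return len(top_a & top_b) / k
```

An overlap of 1.0 means the two SDM variants select exactly the same top cells; values around 0.1 to 0.2 correspond to the low overlaps reported for some modeling choices.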

As large carnivores recover throughout Europe, their distribution needs to be studied to determine their conservation status and assess the potential for human-carnivore conflicts. However, efficient monitoring of many large carnivore species is challenging due to their rarity, elusive behavior, and large home ranges. Their monitoring can include opportunistic sightings from citizens in addition to designed surveys. Two types of detection errors may occur in such monitoring schemes: false negatives and false positives. False-negative detections can be accounted for in species distribution models (SDMs) that deal with imperfect detection. False-positive detections, due to species misidentification, have rarely been accounted for in SDMs. Generally, researchers use ad hoc data-filtering methods to discard ambiguous observations prior to analysis. These practices may discard valuable ecological information on the distribution of a species. We investigated the costs and benefits of including, rather than discarding, data types that may contain false positives in SDMs of large carnivores. We used a dynamic occupancy model that simultaneously accounts for false negatives and false positives to jointly analyze data that included both unambiguous and ambiguous detections. We used simulations to compare the performance of our model with that of a model fitted on unambiguous data only. We tested the two models in four scenarios in which the parameters that control false-positive detections and true detections varied. We applied our model to data from the monitoring of the Eurasian lynx (Lynx lynx) in the European Alps. The addition of ambiguous detections increased the precision of parameter estimates. For the Eurasian lynx, incorporating ambiguous detections produced more precise estimates of the ecological parameters and revealed additional occupied sites in areas where the species is likely expanding.
Overall, we found that ambiguous data should be considered when studying the distribution of large carnivores through the use of dynamic occupancy models that account for misidentification.
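The core idea of combining unambiguous and ambiguous detections can be shown with a single-season (static) simplification of such a model; the dynamic model described above adds colonization and extinction between seasons, which is not shown here. In this sketch, which is an assumption for illustration only, unambiguous detections occur with probability `r` at occupied sites and never at unoccupied ones, while ambiguous detections occur with probability `p11` at occupied sites and with false-positive probability `p10` at unoccupied sites. All function and parameter names are hypothetical.

```python
import numpy as np

def site_components(y_unamb, y_amb, psi, r, p11, p10):
    """Occupied and unoccupied likelihood components for one site (sketch).

    y_unamb: 0/1 unambiguous detections per survey (no false positives)
    y_amb:   0/1 ambiguous detections per survey (false positives possible)
    """
    y_u = np.asarray(y_unamb)
    y_a = np.asarray(y_amb)
    # Occupied: both data types contain only true detections
    occupied = psi * np.prod(r ** y_u * (1 - r) ** (1 - y_u)) \
        * np.prod(p11 ** y_a * (1 - p11) ** (1 - y_a))
    # Unoccupied: any unambiguous detection is impossible; ambiguous ones are false positives
    unoccupied = (1 - psi) * float(np.all(y_u == 0)) \
        * np.prod(p10 ** y_a * (1 - p10) ** (1 - y_a))
    return occupied, unoccupied

def log_likelihood(sites, psi, r, p11, p10):
    """Sum of site log-likelihoods; `sites` is a list of (y_unamb, y_amb) pairs."""
    return sum(np.log(sum(site_components(y_u, y_a, psi, r, p11, p10)))
               for y_u, y_a in sites)

def prob_occupied(y_unamb, y_amb, psi, r, p11, p10):
    """Posterior probability that a site is occupied given its detection history."""
    occ, unocc = site_components(y_unamb, y_amb, psi, r, p11, p10)
    return occ / (occ + unocc)
```

A single unambiguous detection makes the posterior occupancy exactly 1, while repeated ambiguous detections raise it without fixing it, which is how ambiguous data add information instead of being discarded.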

Developing robust species distribution models is important, as model outputs are increasingly being incorporated into conservation policy and management decisions. A largely overlooked component of model assessment and refinement is whether to include historic species occurrence data in distribution models to increase the data sample size. Data of different temporal provenance often differ in spatial accuracy and precision. We tested the effect of including historic, coarse-resolution occurrence data on distribution model outputs for 187 species of birds in Australian tropical savannas. Models using only recent (after 1990), fine-resolution data had significantly higher performance scores, measured as the area under the receiver operating characteristic curve (AUC), than models incorporating both fine- and coarse-resolution data. The drop in AUC is positively correlated with the total area predicted to be suitable for the species (R2 = 0.163–0.187, depending on the environmental predictors in the model), as coarser data generally lead to larger predicted areas. The remaining unexplained variation is likely due to covariate errors resulting from the resolution mismatch between species records and environmental predictors. We conclude that decisions regarding data use in species distribution models must take into account the variation in predictions that mixed-scale datasets might cause.
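AUC, the performance score used above, has a simple rank-based interpretation: the probability that a randomly chosen presence site receives a higher predicted value than a randomly chosen absence (or background) site. The sketch below computes it directly from that definition (the Mann-Whitney formulation), counting ties as one half; the function name is hypothetical and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(scores_presence, scores_absence):
    """AUC as P(score of a random presence > score of a random absence); ties count 1/2."""
    wins = 0.0
    for sp in scores_presence:
        for sa in scores_absence:
            if sp > sa:
                wins += 1.0
            elif sp == sa:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))
```

Because AUC depends only on ranks, a model that predicts a large suitable area also assigns high values to many absence sites, which lowers the score; this is consistent with the association between predicted area and the drop in AUC reported above.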

We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on the resulting classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Accuracy as gauged by resubstitution rates was similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracy were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating that the classification trees for the two survey forms were fitted on different sets of predictor variables. The magnitude of these differences in predictor variables casts doubt on ecological interpretations derived from prediction models based on non-probabilistic sample surveys.
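The gap between resubstitution and cross-validation accuracy can be demonstrated with a small sketch. To stay self-contained it uses a 1-nearest-neighbour classifier as a stand-in for a classification tree (an assumption for illustration: like a fully grown tree, it reproduces its training data perfectly). With pure-noise labels, resubstitution accuracy is still 1.0, whereas k-fold cross-validation reveals accuracy near chance. All names and the simulated data are hypothetical.

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbour prediction: label of the closest training point."""
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[1]

def resubstitution_accuracy(data):
    """Accuracy when the model is evaluated on its own training data."""
    return sum(nn_predict(data, x) == y for x, y in data) / len(data)

def cross_val_accuracy(data, k=5, seed=0):
    """k-fold cross-validation: each fold is predicted by a model trained on the others."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, fold in enumerate(folds):
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        correct += sum(nn_predict(train, x) == y for x, y in fold)
    return correct / len(data)

# Hypothetical data: 200 sites with two predictors and presence labels unrelated to them
rng = random.Random(1)
data = [((rng.random(), rng.random()), rng.random() < 0.5) for _ in range(200)]
```

Here `resubstitution_accuracy(data)` returns 1.0 although the labels carry no signal, while `cross_val_accuracy(data)` stays close to 0.5, mirroring the optimism of resubstitution rates noted above.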

Five regression models (Poisson, negative binomial, quasi-Poisson, the hurdle model and the zero-inflated Poisson) were used to assess the relationship between the abundance of a vulnerable plant species, Leionema ralstonii, and the environment. The methods differed in their capacity to deal with common properties of ecological data. They were assessed theoretically, and their predictive performance was evaluated with correlation, calibration and error statistics calculated within a bootstrap evaluation procedure that simulated performance for independent data.
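Two of the count models listed above treat zeros differently, which the following sketch makes explicit through their probability mass functions (standard textbook forms; the function names are hypothetical). The zero-inflated Poisson mixes structural zeros with Poisson counts, so zeros can come from either component; the Poisson hurdle model generates all zeros from one process and all positive counts from a zero-truncated Poisson.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, p_zero):
    """Poisson hurdle: zero with probability p_zero, else a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))
```

For example, the zero-inflated model with `lam=2.0` and `pi=0.3` has mean (1 - 0.3) * 2.0 = 1.4 and a probability of zero of 0.3 + 0.7 * exp(-2), larger than the plain Poisson value exp(-2).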

Elephant seals are among the most sexually dimorphic and polygynous species of all mammals. Their foraging grounds occupy a wide area of the world's oceans, where males and females are spatially segregated. The objective of this paper was to correlate female and male foraging distributions of Mirounga angustirostris with the main climatic variables at a biogeographical scale. We used online and bibliographic sources to obtain information on adult elephant seal distribution and environmental predictors (surface and bottom sea temperatures, productivity and bathymetry), and three species distribution models (the maximum entropy model, environmental niche factor analysis, and the climatic envelope model BIOCLIM) to predict the habitat suitability of ocean regions. BIOCLIM provided the best fit. Sea surface and bottom temperatures were the variables with the highest explanatory power for females, whereas bathymetry was for males. Predictive maps suggest that low temperatures constrain female, but not male, distribution at high latitudes. We suggest that large size increases the foraging efficiency of males because, among other benefits, it augments thermal insulation, improving the use of cold, rich sectors of the ocean. Different thermoregulatory abilities of the sexes due to size dimorphism should be considered a complementary explanation of sexual segregation in elephant seals.
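BIOCLIM, the best-fitting method above, scores a site by how central its environmental values are within the ranges observed at presence sites. The sketch below follows one common formulation (assumed here, and possibly differing in detail from the implementation used in the study): for each variable, the empirical percentile of the new value among presence values is folded so that both tails are treated alike, and the site score is twice the minimum folded percentile across variables, giving 1 at the median and 0 outside the observed range. The function name is hypothetical.

```python
import numpy as np

def bioclim_score(train_env, new_env):
    """BIOCLIM-style envelope score in [0, 1] (sketch).

    train_env: (n_presences, n_vars) environmental values at presence sites
    new_env:   (n_sites, n_vars) environmental values at sites to be scored
    """
    train = np.asarray(train_env, float)
    new = np.asarray(new_env, float)
    # Empirical percentile of each new value within that variable's presence values
    pct = np.stack([
        np.searchsorted(np.sort(train[:, j]), new[:, j], side="right") / train.shape[0]
        for j in range(train.shape[1])
    ], axis=1)
    folded = np.where(pct > 0.5, 1 - pct, pct)   # distance from the nearer tail
    return 2 * folded.min(axis=1)                # the most limiting variable decides
```

Taking the minimum across variables reflects the envelope logic: a site outside the observed range of any single variable (e.g. sea temperature for females) receives a score of 0, however suitable the other variables are.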

Fluvial fishes face increased imperilment from anthropogenic activities, but the specific factors contributing most to range declines are often poorly understood. For example, the range of the fluvial‐specialist shoal bass (Micropterus cataractae) continues to decrease, yet how perceived threats have contributed to range loss is largely unknown. We used species distribution models to determine which factors contributed most to shoal bass range loss. We estimated a potential distribution based on natural abiotic factors and a series of currently occupied distributions that incorporated variables characterizing land cover, non‐native species, and river fragmentation intensity (no fragmentation, dams only, and dams and large impoundments). We allowed interspecific relationships between non‐native congeners and shoal bass to vary across fragmentation intensities. Results from the potential distribution model estimated shoal bass presence throughout much of their native basin, whereas models of currently occupied distribution showed that range loss increased as fragmentation intensified. Response curves from models of currently occupied distribution indicated a potential interaction between fragmentation intensity and the relationship between shoal bass and non‐native congeners, wherein non‐natives may be favored at the highest fragmentation intensity. Response curves also suggested that >100 km of interconnected, free‐flowing stream fragments were necessary to support shoal bass presence. Model evaluation, including an independent validation, suggested that models had favorable predictive and discriminative abilities. Similar approaches that use readily available, diverse, geospatial data sets may deliver insights into the biology and conservation needs of other fluvial species facing similar threats.
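The length of interconnected, free-flowing stream fragments, for which a threshold of about 100 km is reported above, can be computed by grouping stream segments that are not separated by a dam. The sketch below uses a union-find structure on a hypothetical segment network; the data layout (segment lengths, links between adjacent segments, dams as blocked links) is an assumption for illustration, not the study's GIS workflow.

```python
def free_flowing_fragments(lengths, links, dams):
    """Total length of each free-flowing fragment of a stream network (sketch).

    lengths: {segment_id: length_km}
    links:   iterable of (segment_a, segment_b) pairs of adjacent segments
    dams:    set of frozenset({segment_a, segment_b}) links blocked by a dam
    Returns {fragment_root_segment: total_length_km}.
    """
    parent = {s: s for s in lengths}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps trees shallow
            s = parent[s]
        return s

    for a, b in links:
        if frozenset((a, b)) in dams:
            continue                       # a dam breaks connectivity across this link
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                # merge the two fragments

    totals = {}
    for s, length in lengths.items():
        root = find(s)
        totals[root] = totals.get(root, 0.0) + length
    return totals
```

For a hypothetical chain of segments A (40 km), B (70 km), C (30 km) and D (20 km) with a dam between B and C, the function returns fragments of 110 km and 50 km, so only the first would exceed a 100 km threshold.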

Hijmans RJ. Ecology 2012, 93(3): 679–688.
Species distribution models are usually evaluated with cross-validation. In this procedure, evaluation statistics are computed from model predictions for sites of presence and absence that were not used to train (fit) the model. Using data for 226 species from six regions, and two species distribution modeling algorithms (Bioclim and MaxEnt), I show that this procedure is highly sensitive to "spatial sorting bias": the difference between the geographic distance from testing-presence to training-presence sites and the geographic distance from testing-absence (or testing-background) to training-presence sites. I propose the use of pairwise distance sampling to remove this bias, and the use of a null model that only considers the geographic distance to training sites to calibrate cross-validation results for remaining bias. Model evaluation results (AUC) were strongly inflated: the null model performed better than MaxEnt for 45% and better than Bioclim for 67% of the species. Spatial sorting bias and area under the receiver operating characteristic curve (AUC) values increased when using partitioned presence data and random-absence data instead of independently obtained presence-absence testing data from systematic surveys. Pairwise distance sampling removed spatial sorting bias, yielding null models with an AUC close to 0.5, such that AUC was the same as the null-model-calibrated AUC (cAUC). This adjustment strongly decreased AUC values and changed the ranking among species. Cross-validation results for different species are only comparable after removal of spatial sorting bias and/or calibration with an appropriate null model.
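The two remedies proposed above can be sketched as follows, under stated assumptions. The null model scores each testing site by its proximity to the nearest training-presence site (negative distance, which ranks sites like inverse distance). Pairwise distance sampling is implemented here as a greedy matching: each testing-presence site is paired with an unused testing-absence site whose distance to the nearest training presence is most similar, and the pair is kept only if the relative difference is within a tolerance `tol`. The greedy rule and the value of `tol` are illustrative assumptions and may differ from the published implementation; all names are hypothetical.

```python
import math

def nearest_dist(p, refs):
    return min(math.dist(p, q) for q in refs)

def null_model_auc(test_pres, test_abs, train_pres):
    """AUC of a distance-only null model: sites closer to training presences score higher."""
    sp = [-nearest_dist(p, train_pres) for p in test_pres]
    sa = [-nearest_dist(a, train_pres) for a in test_abs]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in sp for y in sa)
    return wins / (len(sp) * len(sa))

def pairwise_distance_sample(test_pres, test_abs, train_pres, tol=0.33):
    """Greedy pairwise distance sampling (sketch): match presence and absence test
    sites with similar distances to the nearest training-presence site."""
    d_pres = [nearest_dist(p, train_pres) for p in test_pres]
    d_abs = [nearest_dist(a, train_pres) for a in test_abs]
    unused = set(range(len(test_abs)))
    kept_pres, kept_abs = [], []
    for i, dp in enumerate(d_pres):
        if not unused:
            break
        j = min(unused, key=lambda k: abs(d_abs[k] - dp))   # most similar distance
        if abs(d_abs[j] - dp) <= tol * max(dp, 1e-12):
            kept_pres.append(test_pres[i])
            kept_abs.append(test_abs[j])
            unused.discard(j)
    return kept_pres, kept_abs
```

After sampling, the null model's AUC on the matched sites should move toward 0.5, so that any remaining AUC of a real model reflects environmental signal rather than mere proximity to training sites.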

Ecological Modelling 2005, 186(3): 280–289.
Increasing use is being made in conservation management of statistical models that couple extensive collections of species and environmental data to make predictions of the geographic distributions of species. While the relationships fitted between a species and its environment are relatively transparent for many of these modeling techniques, others are more ‘black box’ in character, producing only geographic predictions and providing minimal or untraditional summaries of the fitted relationships on which these predictions are based. This in turn prevents robust evaluation of the ecological sensibility of such models, a necessary process if model predictions are to be treated with confidence. Here we propose a new but simple method for visualizing modeled responses that can be implemented with any modeling method, and demonstrate its application using five common methods applied to the prediction of an Australian tree species. This is achieved by insetting an “evaluation strip” into the spatial data layers, which, after predictions have been made, can be clipped out and used for creating plots of the modelled responses. We present results of applying evaluation strips with the algorithms GLM, GAM, CLIM, DOMAIN and MARS. Evaluation strips can be constructed to investigate either univariate responses or the simultaneous variation in predicted values in relation to two variables. The latter option is particularly useful for evaluating responses in models that allow the fitting of complex interaction terms.
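The construction of an evaluation strip can be sketched as a set of synthetic cells appended to the prediction grid. In this sketch, which assumes one common design, the variable of interest sweeps its observed range while all other variables are held at their mean; the bivariate version sweeps two variables on a regular grid. After any model has predicted these cells, the predictions read off the strip trace the fitted response. Function names and the `predict` model are hypothetical.

```python
import numpy as np

def evaluation_strip(env, var_index, n_steps=50):
    """Univariate strip: one variable sweeps its range, the others stay at their mean."""
    env = np.asarray(env, float)
    strip = np.tile(env.mean(axis=0), (n_steps, 1))
    strip[:, var_index] = np.linspace(env[:, var_index].min(),
                                      env[:, var_index].max(), n_steps)
    return strip

def evaluation_strip_2d(env, i, j, n_steps=20):
    """Bivariate strip: variables i and j vary jointly on a grid (for interactions)."""
    env = np.asarray(env, float)
    gi, gj = np.meshgrid(np.linspace(env[:, i].min(), env[:, i].max(), n_steps),
                         np.linspace(env[:, j].min(), env[:, j].max(), n_steps))
    strip = np.tile(env.mean(axis=0), (n_steps * n_steps, 1))
    strip[:, i] = gi.ravel()
    strip[:, j] = gj.ravel()
    return strip

def predict(x):
    """Hypothetical fitted model standing in for any 'black box' method."""
    return 1 / (1 + np.exp(-(-1 + 1.5 * x[:, 0] - 0.3 * x[:, 0] ** 2)))
```

Calling `predict(evaluation_strip(env, 0))` yields the modelled response to the first variable; because the strip is just extra cells in the input layers, the same procedure works for methods that expose no response summaries of their own.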

Knowledge of the relationship between species traits and species distribution in fragmented landscapes is important for understanding current distribution patterns and as background information for predictive models of the effect of future landscape changes. Existing studies on the topic suffer from several drawbacks. First, they usually consider only traits related to dispersal ability and not growth. Furthermore, they do not apply phylogenetic corrections, so we do not know how accounting for phylogenetic relationships can alter the conclusions. Finally, they usually apply only one technique to calculate habitat isolation, and we do not know how other isolation measures would change the results. We addressed these issues using 30 species forming congeneric pairs occurring in fragmented dry grasslands. We measured traits related to dispersal, survival, and growth in the species and recorded the distribution of the species in 215 grassland fragments. We show many strong relationships between species traits related to both dispersal and growth and species distribution in the landscape, such as a positive relationship between habitat occupancy and anemochory and a negative relationship between habitat occupancy and seed dormancy. The directions of these relationships, however, often change after application of phylogenetic correction. For example, more isolated habitats host species with smaller seeds; after phylogenetic correction, however, they turn out to host species with larger seeds. The conclusions also partly change depending on how habitat isolation is calculated. Specifically, habitat isolation calculated from occupied habitats only has the highest predictive power, which indicates slow dynamics of the species. All the results support the expectation that species traits have a high potential to explain patterns of species distribution in the landscape and that they can be used to build predictive models of species distribution.
The specific conclusions, however, depend on the technique used, and this should be carefully considered when comparing different studies. Since different techniques answer slightly different questions, we should use analyses both with and without phylogenetic correction, explore different isolation measures whenever possible, and compare the results.
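The difference between isolation measured from all habitats and from occupied habitats only can be sketched with a Hanski-style connectivity index, assumed here as one widely used isolation measure (the study compared several, and its exact formulas may differ): S_i = sum over j != i of o_j * exp(-alpha * d_ij) * A_j, where d_ij is the distance between fragments, A_j the area of fragment j, and o_j equals 1 for every fragment or only for occupied ones. Larger S_i means lower isolation. The function name and `alpha` are hypothetical.

```python
import math

def connectivity(coords, areas, alpha=1.0, occupied=None):
    """Hanski-style connectivity S_i = sum_{j != i} o_j * exp(-alpha * d_ij) * A_j.

    coords:   (x, y) position of each habitat fragment
    areas:    area of each fragment
    occupied: None to count every fragment (o_j = 1), or a list of booleans so
              that only fragments occupied by the species contribute
    """
    n = len(coords)
    occ = [1] * n if occupied is None else [1 if o else 0 for o in occupied]
    return [sum(occ[j] * math.exp(-alpha * math.dist(coords[i], coords[j])) * areas[j]
                for j in range(n) if j != i)
            for i in range(n)]
```

Restricting the sum to occupied fragments measures isolation from actual sources of propagules rather than from potential habitat, which is one reading of why that variant predicted distribution best.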

The proper management of an ecological population is greatly aided by solid information about its species' abundances. For the general heterogeneous Poisson species abundance setting, we develop the non-parametric maximum likelihood estimator (MLE) for the entire probability model, namely for the total number N of species and the generating distribution F of the expected values of the species' abundances. Solid estimation of the entire probability model allows us to develop generator-based measures of ecological diversity and evenness which have inferences over similar regions. Our methods also produce a solid goodness-of-fit test for our model, as well as a likelihood ratio test to examine whether there is heterogeneity in the expected values of the species' abundances. These estimates and tests are examined in detail in the paper. In particular, we apply our methods to important data from the National Breeding Bird Survey and discuss how they can easily be applied to sweep-net sampling data. To further examine our methods, we provide simulations for several illustrative situations.
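The structure of the heterogeneous Poisson setting can be illustrated with a simplified estimator. This sketch is not the full non-parametric MLE described above: it assumes that F is supported on a fixed grid of candidate expected abundances, estimates the grid weights by EM on the counts of the n observed species (treating the N - n unseen species, which have count 0, as missing data), and then sets N_hat = n / (1 - p0), where p0 is the estimated probability that a species is not observed. The grid, iteration count and function name are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def estimate_richness(counts, grid=None, n_iter=500):
    """Estimate total species number N under a heterogeneous Poisson model (sketch).

    counts: abundances of the n observed species (all > 0)
    grid:   candidate expected abundances, i.e. the assumed support of F
    Returns (N_hat, grid, weights).
    """
    x = np.asarray(counts, float)
    n = x.size
    if grid is None:
        grid = np.geomspace(0.1, x.max() * 2, 40)
    grid = np.asarray(grid, float)
    w = np.full(grid.size, 1.0 / grid.size)
    # Poisson probability of each observed count (rows) under each grid value (cols)
    log_f = (x[:, None] * np.log(grid)[None, :] - grid[None, :]
             - np.array([lgamma(v + 1) for v in x])[:, None])
    f = np.exp(log_f)
    e0 = np.exp(-grid)                              # P(count = 0 | lambda_k)
    for _ in range(n_iter):
        p0 = float(w @ e0)                          # P(a species is unobserved)
        m = n * p0 / (1 - p0)                       # expected number of unseen species
        resp = w * f
        resp /= resp.sum(axis=1, keepdims=True)     # E-step for observed species
        unseen = m * w * e0 / p0                    # E-step for unseen species
        w = (resp.sum(axis=0) + unseen) / (n + m)   # M-step: updated weights of F
    p0 = float(w @ e0)
    return n / (1 - p0), grid, w
```

With abundances simulated from, e.g., a gamma-distributed set of expected values, the estimate exceeds the observed number of species by an amount driven by the weight F places on small expected abundances, which is also where such estimates are least stable.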

Models of the geographic distributions of species have wide application in ecology. But the nonspatial, single-level, regression models that ecologists have often employed do not deal with problems of irregular sampling intensity or spatial dependence, and do not adequately quantify uncertainty. We show here how to build statistical models that can handle these features of spatial prediction and provide richer, more powerful inference about species niche relations, distributions, and the effects of human disturbance. We begin with a familiar generalized linear model and build in additional features, including spatial random effects and hierarchical levels. Since these models are fully specified statistical models, we show that it is possible to add complexity without sacrificing interpretability. This step-by-step approach, together with attached code that implements a simple, spatially explicit, regression model, is structured to facilitate self-teaching. All models are developed in a Bayesian framework. We assess the performance of the models by using them to predict the distributions of two plant species (Proteaceae) from South Africa's Cape Floristic Region. We demonstrate that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results. Adding hierarchical levels to the models has further advantages in allowing human transformation of the landscape to be taken into account, as well as additional features of the sampling process.
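The step from a generalized linear model to a spatially explicit one can be made concrete by simulating the assumed data-generating process: a logistic GLM whose linear predictor includes a spatial random effect w with exponential covariance. This sketch only generates data under assumed parameter values; fitting such a model in a Bayesian framework (e.g. by MCMC) is not shown, and the function name and parameters are hypothetical.

```python
import numpy as np

def simulate_spatial_occurrence(coords, X, beta, sigma2=1.0, phi=2.0, seed=0):
    """Simulate presence/absence from a spatially explicit logistic GLM (sketch).

    logit(p_i) = X_i @ beta + w_i, with w ~ MVN(0, Sigma) and exponential
    covariance Sigma_ij = sigma2 * exp(-d_ij / phi) (spatial random effect).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-d / phi)                      # nearby sites are more similar
    w = rng.multivariate_normal(np.zeros(len(coords)), cov)
    eta = np.asarray(X, float) @ np.asarray(beta, float) + w
    p = 1.0 / (1.0 + np.exp(-eta))                       # probability of occurrence
    y = rng.binomial(1, p)                               # observed presence/absence
    return y, p, w
```

Setting `sigma2=0` recovers the ordinary nonspatial GLM; with `sigma2 > 0`, residual spatial clustering in `y` arises that a nonspatial model would wrongly attribute to the covariates or ignore in its uncertainty.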

Ecological theory and current evidence support the validity of various forms of species response curves along a variety of environmental gradients. Various methods have been developed for building species distribution models, but it is not well known how these methods perform under different assumptions about the form of the underlying species response. It is also not well known how spatial correlation in species occurrence affects model performance. These effects were investigated by applying an environmental envelope method (BIOCLIM) and three regression-based methods, logistic regression (LR), generalized additive modelling (GAM), and classification and regression trees (CART), to simulated species occurrence data. Each simulated species was constructed as a weighted sum of responses. Three basic species response curves were assumed: Gaussian (bell-shaped), Beta (skewed) and linear. The two non-linear responses conform to standard ecological niche theory. All three responses were applied in turn to three simulated environmental variables, each with varying degrees of spatial autocorrelation. GAM produced the most consistent model performance over all forms of simulated species response. BIOCLIM and CART tended to underrate the importance of variables with a linear response. BIOCLIM was less sensitive to data density. LR was susceptible to model misspecification. The use of a linear function in LR underestimated the importance of variables with a non-linear species response and contributed to increased spatial autocorrelation in model residuals. Omission of important environmental variables with a non-linear species response also contributed to increased spatial autocorrelation in model residuals. Adding a spatial autocovariate term to the LR model (autologistic model) reduced the spatial autocorrelation and improved model performance, but did not correct the misidentification of the dominant environmental determinant.
This is to be expected, since the autologistic approach was designed primarily for prediction and not for inference. Given that various forms of species response to environmental determinants arise commonly in nature: (1) higher-order functions should always be tested when applying LR in modelling species distributions; (2) spatial autocorrelation in species distribution model residuals can indicate that environmental determinants with a non-linear response are missing from the model; and (3) deficiencies in LR model performance due to model misspecification can be addressed by adding a spatial autocovariate to the model, but care should be taken when interpreting the coefficients of the model parameters.
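The spatial autocovariate of the autologistic model mentioned above is an extra predictor summarising occurrence at neighbouring sites. The sketch below assumes a common distance-weighted form: for each site, the inverse-distance-weighted mean of the 0/1 occurrences of the other sites within a neighbourhood `radius`. The weighting scheme, the radius and the function name are assumptions for illustration.

```python
import numpy as np

def autocovariate(coords, y, radius):
    """Distance-weighted autocovariate for an autologistic model (sketch).

    For each site i: sum_j w_ij * y_j / sum_j w_ij over neighbours j != i within
    `radius`, with inverse-distance weights w_ij = 1 / d_ij. Sites without
    neighbours get 0.
    """
    coords = np.asarray(coords, float)
    y = np.asarray(y, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        # d == 0 (the site itself) is excluded, so its infinite weight is discarded
        w = np.where((d > 0) & (d <= radius), 1.0 / d, 0.0)
    den = w.sum(axis=1)
    return np.divide(w @ y, den, out=np.zeros_like(den), where=den > 0)
```

The result is added as one more column in the logistic regression. Because it absorbs spatial structure that a missing non-linear environmental term would otherwise explain, it can improve prediction while leaving the environmental coefficients misleading, as point (3) above cautions.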

Coastal environments host plant taxa adapted to a wide range of salinity conditions. Salinity, along with other abiotic variables, constrains the distribution of coastal plants in predictable ways, with relatively few taxa adapted to the most saline conditions. However, few attempts have been made to quantify these relationships to create niche models for coastal plants. Quantifying the effects of salinity and other abiotic variables on coastal plants is essential to predict the responses of coastal ecosystems to external drivers such as sea level rise. We constructed niche models for 132 coastal plant taxa in Great Britain based on eight abiotic variables. Paired measurements of vegetation composition and abiotic variables are rare in coastal habitats, so four of the variables were defined using community mean values of Ellenberg indicators, i.e. scores assigned according to the typical alkalinity, fertility, moisture availability and salinity of sites where a species occurs. The remaining variables were canopy height, annual precipitation, and maximum and minimum temperatures. Salinity and moisture indicator scores were significant terms in over 80% of models, suggesting that the distributions of most coastal species are at least partly determined by these variables. When the models were used to predict species occurrence in an independent dataset, 64% of models gave moderate to good predictions. This indicates that most models had successfully captured the key determinants of the niche. The models could potentially be applied to predict changes to habitats and species-dependent ecosystem services in response to rising sea levels.
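The community mean Ellenberg values used above as proxies for measured abiotic conditions can be computed per plot as the mean indicator score of the species recorded there. In this sketch the species names and scores are hypothetical placeholders, not real Ellenberg values; species without a score are skipped, and an optional cover-weighted variant is included as an assumption, since the source does not state whether weighting was used.

```python
def community_mean(plot_species, indicator, cover=None):
    """Mean indicator value (e.g. Ellenberg salinity) of the species in one plot.

    plot_species: species recorded in the plot
    indicator:    {species: indicator score}
    cover:        optional {species: cover} for a cover-weighted mean
    Returns None if no recorded species has a score.
    """
    scored = [s for s in plot_species if s in indicator]
    if not scored:
        return None
    weights = [1.0 if cover is None else cover[s] for s in scored]
    return sum(w * indicator[s] for w, s in zip(weights, scored)) / sum(weights)
```

For a plot with hypothetical species scored 9 and 3 and a third species without a score, the unweighted community mean is 6.0; a model can then use this value as the plot's salinity variable.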

Ecological studies enable investigation of geographic variations in exposure to environmental variables, across groups, in relation to health outcomes measured on a geographic scale. Such studies are subject to ecological biases, including pure specification bias, which arises when a nonlinear individual exposure-risk model is assumed to apply at the area level. Introducing the within-area variance of exposure should markedly reduce this source of ecological bias. Assuming several measurements of exposure per area and no confounding risk factors, we study the model that includes within-area exposure variability when a Gaussian within-area exposure distribution is assumed. Its robustness is assessed when the within-area exposure distribution is misspecified. Two underlying exposure distributions are studied: the Gamma distribution and a unimodal mixture of two Gaussian distributions. In the case of a strong ecological association, this model can reduce the bias and improve the precision of the individual parameter estimates when the within-area exposure means and variances are correlated. These different models are applied to analyze the ecological association between radon concentration and childhood acute leukemia in France.
Léa Fortunato
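Why the within-area variance matters can be shown with a worked case, assuming a log-linear individual risk model exp(alpha + beta * x) (a common choice, not confirmed by the abstract) and Gaussian within-area exposure x ~ N(mu, sigma2). The area-level mean risk is then exp(alpha + beta * mu + beta**2 * sigma2 / 2), so plugging the area mean exposure into the individual model understates risk by the factor exp(beta**2 * sigma2 / 2), and the error differs between areas whenever sigma2 varies with mu. The function names are hypothetical.

```python
import numpy as np

def area_mean_risk(alpha, beta, mu, sigma2):
    """Mean of exp(alpha + beta * x) over x ~ N(mu, sigma2) (lognormal mean formula)."""
    return np.exp(alpha + beta * mu + 0.5 * beta ** 2 * sigma2)

def naive_area_risk(alpha, beta, mu):
    """Risk obtained by plugging the area mean exposure into the individual model."""
    return np.exp(alpha + beta * mu)
```

With alpha = -2, beta = 0.5, mu = 1 and sigma2 = 1, the naive value exp(-1.5) is about 12% below the correct area mean exp(-1.375); a Monte Carlo average over simulated within-area exposures reproduces the closed form.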

