Similar literature
1.
Random forests for classification in ecology   Total citations: 27 (self-citations: 0, citations by others: 27)
Cutler DR  Edwards TC  Beard KH  Cutler A  Hess KT  Gibson J  Lawler JJ 《Ecology》2007,88(11):2783-2792
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

2.
Large, fine-grained samples are ideal for predictive species distribution models used for management purposes, but such datasets are not available for most species and conducting such surveys is costly. We attempted to overcome this obstacle by updating previously available coarse-grained logistic regression models with small fine-grained samples using a recalibration approach. Recalibration involves re-estimation of the intercept or slope of the linear predictor and may improve calibration (level of agreement between predicted and actual probabilities). If reliable estimates of occurrence likelihood are required (e.g., for species selection in ecological restoration) calibration should be preferred to other model performance measures. This updating approach is not expected to improve discrimination (the ability of the model to rank sites according to species suitability), because the rank order of predictions is not altered. We tested different updating methods and sample sizes with tree distribution data from Spain. Updated models were compared to models fitted using only fine-grained data (refitted models). Updated models performed reasonably well at fine scales and outperformed refitted models with small samples (10-100 occurrences). If a coarse-grained model is available (or could be easily developed) and fine-grained predictions are to be generated from a limited sample size, updating previous models may be a more accurate option than fitting a new model. Our results encourage further studies on model updating in other situations where species distribution models are used under different conditions from their training (e.g., different time periods, different regions).
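
The recalibration idea in the abstract above (re-estimating the intercept and slope of an existing linear predictor from a small new sample) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' procedure: the function names `sigmoid` and `recalibrate`, the Newton-Raphson fitting routine, and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def recalibrate(lp, y, iters=50):
    """Fit logit(p) = a + b * lp by Newton-Raphson and return (a, b).

    lp : linear-predictor values from the existing (coarse-grained) model
    y  : 0/1 presence observations from the new (fine-grained) sample
    Assumes the data are not perfectly separable, so the estimate exists.
    """
    a, b = 0.0, 1.0  # start from the unchanged original model
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(lp, y):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += t - p          # score for the intercept
            g1 += (t - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Toy data (hypothetical): old-model linear predictor and new observations.
old_lp = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
new_y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
intercept, slope = recalibrate(old_lp, new_y)
updated = [sigmoid(intercept + slope * x) for x in old_lp]
print(intercept, slope)
```

With a positive slope the rank order of the sites is unchanged, which is why, as the abstract notes, this kind of updating can improve calibration but not discrimination.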

3.
A variety of statistical techniques that attempt to predict the occurrence of a given community or species with respect to environmental conditions have been used in predictive vegetation modelling (PVM). We compared the performance of three profile models (BIOCLIM, GARP and MAXENT) with three nonparametric group discrimination techniques (MARS, NPMR and LRT). The latter two models are relatively new statistical techniques that have just entered the field of PVM. We ran all models on a local scale for a given grassland community (Teucrio-Seslerietum) using the same input data to examine their performance. Model accuracy was evaluated by both Cohen’s kappa statistic (κ) and the area under the receiver operating characteristic curve, based both on resubstitution of training data and on an independent test data set. Among the profile models MAXENT, and among the group discrimination techniques MARS, achieved the best predictions.
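
The two accuracy measures named in the abstract above, Cohen's kappa and the area under the ROC curve, can be computed as in the following minimal sketch. The function names and the toy presence/absence data are hypothetical, and the AUC is computed by direct pairwise comparison, which is simple but slow for large samples.

```python
def cohen_kappa(observed, predicted):
    """Cohen's kappa for 0/1 observations and 0/1 predictions.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(observed)
    po = sum(1 for o, p in zip(observed, predicted) if o == p) / n
    obs1 = sum(observed) / n
    pred1 = sum(predicted) / n
    pe = obs1 * pred1 + (1 - obs1) * (1 - pred1)  # agreement expected by chance
    return (po - pe) / (1 - pe)

def auc(observed, scores):
    """Probability that a random presence scores higher than a random absence."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): observed presence, thresholded and raw predictions.
obs = [1, 1, 0, 0]
print(cohen_kappa(obs, [1, 0, 0, 0]))
print(auc(obs, [0.9, 0.4, 0.5, 0.1]))
```

Kappa needs thresholded predictions, whereas AUC uses the raw scores, so the two measures can rank models differently.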

4.
Species distribution models (SDMs) based on statistical relationships between occurrence data and underlying environmental conditions are increasingly used to predict spatial patterns of biological invasions and prioritize locations for early detection and control of invasion outbreaks. However, invasive species distribution models (iSDMs) face special challenges because (i) they typically violate SDM's assumption that the organism is in equilibrium with its environment, and (ii) species absence data are often unavailable or believed to be too difficult to interpret. This often leads researchers to generate pseudo-absences for model training or utilize presence-only methods, and to confuse the distinction between predictions of potential vs. actual distribution. We examined the hypothesis that true-absence data, when accompanied by dispersal constraints, improve prediction accuracy and ecological understanding of iSDMs that aim to predict the actual distribution of biological invasions. We evaluated the impact of presence-only, true-absence and pseudo-absence data on model accuracy using an extensive dataset on the distribution of the invasive forest pathogen Phytophthora ramorum in California. Two traditional presence/absence models (generalized linear model and classification trees) and two alternative presence-only models (ecological niche factor analysis and maximum entropy) were developed based on 890 field plots of pathogen occurrence and several climatic, topographic, host vegetation and dispersal variables. The effects of all three possible types of occurrence data on model performance were evaluated with receiver operating characteristic (ROC) and omission/commission error rates. Results show that prediction of actual distribution was less accurate when we ignored true-absences and dispersal constraints. Presence-only models and models without dispersal information tended to over-predict the actual range of invasions. 
Models based on pseudo-absence data achieved accuracies similar to those of presence-only models but produced spatially less feasible predictions. We suggest that true-absence data are a critical ingredient not only for accurate calibration but also for ecologically meaningful assessment of iSDMs that focus on predictions of actual distributions.

5.
Eradication and control of invasive species are often possible only if populations are detected when they are small and localized. To be efficient, detection surveys should be targeted at locations where there is the greatest risk of incursions. We examine the utility of habitat suitability index (HSI) and particle dispersion models for targeting sampling for marine pests. Habitat suitability index models are a simple way to identify suitable habitat when species distribution data are lacking. We compared the performance of HSI models with statistical models derived from independent data from New Zealand on the distribution of two nonindigenous bivalves: Theora lubrica and Musculista senhousia. Logistic regression models developed using the HSI scores as predictors of the presence/absence of Theora and Musculista explained 26.7% and 6.2% of the deviance in the data, respectively. Odds ratios for the HSI scores were greater than unity, indicating that they were genuine predictors of the presence/ absence of each species. The fit and predictive accuracy of each logistic model were improved when simulated patterns of dispersion from the nearest port were added as a predictor variable. Nevertheless, the combined model explained, at best, 46.5% of the deviance in the distribution of Theora and correctly predicted 56% of true presences and 50% of all cases. Omission errors were between 6% and 16%. Although statistical distribution models built directly from environmental predictors always outperformed the equivalent HSI models, the gain in model fit and accuracy was modest. High residual deviance in both types of model suggests that the distributions realized by Theora and Musculista in the field data were influenced by factors not explicitly modeled as explanatory variables and by error in the environmental data used to project suitable habitat for the species. 
Our results highlight the difficulty of accurately predicting the distribution of invasive marine species that exhibit low habitat occupancy and patchy distributions in time and space. Although the HSI and statistical models had utility as predictors of the likely distribution of nonindigenous marine species, the level of spatial accuracy achieved with them may be well below expectations for sensitive surveillance programs.

6.
Little is known about the factors controlling the distribution and abundance of snow petrels in Antarctica. Studying habitat selection through modeling may provide useful information on the relationships between this species and its environment, especially relevant in a climate change context, where habitat availability may change. Validating the predictive capability of habitat selection models with independent data is a vital step in assessing the performance of such models and their potential for predicting species’ distribution in poorly documented areas. From the results of ground surveys conducted in the Casey region (2002–2003, Wilkes Land, East Antarctica), habitat selection models based on a dataset of 4000 nests were created to predict the nesting distribution of snow petrels as a function of topography and substrate. In this study, the Casey models were tested at Mawson, 3800 km away from Casey. The location and characteristics of approximately 7700 snow petrel nests were collected during ground surveys (Summer 2004–2005). Using GIS, predictive maps of nest distribution were produced for the Mawson region with the models derived from the Casey datasets and predictions were compared to the observed data. Model performance was assessed using classification matrices and receiver operating characteristic (ROC) curves. Overall correct classification rates for the Casey models varied from 57% to 90%. However, two geomorphologically different sub-regions (coastal islands and inland mountains) were clearly distinguished in terms of habitat selection by Casey model predictions but also by the specific variations in coefficients of terms in new models, derived from the Mawson data sets. Observed variations in the snow petrel aggregations were found to be related to local habitat availability. We discuss the applicability of various types of models (GLM, CT) and investigate the effect of scale on the prediction of snow petrel habitats. 
While the Casey models created with data collected at the nest scale did not perform well at Mawson due to regional variations in nest micro-characteristics, the predictive performance of models created with data compiled at a coarser scale (habitat units) was satisfactory. Substrate type was the most robust predictor of nest presence between Casey and Mawson. This study demonstrates that it is possible to predict at the large scale the presence of snow petrel nests based on simple predictors such as topography and substrate, which can be obtained from aerial photography. Such methodologies have valuable applications in the management and conservation of this top predator and associated resources and may be applied to other Antarctic, Sub-Antarctic and lower-latitude species and in a variety of habitats.

7.
In this study, we compared tree-growth rates (basal area increment) from recently dead and living Taurus fir (Abies cilicica Carr.) trees in the Kovada lake Forest of Isparta, Turkey. For each dead tree, tree-growth rates were analyzed for the presence of pre-death growth depressions in the study area (number of sample plots = 11) in 2006. We also compared both the magnitude and the rate of growth prior to death with those of a control (living) group of trees. Basal area increment (BAI) averaged substantially less during the last 10 years before death than for control trees. Trees that died started diverging in growth, on average, 50-60 years before death. About 18% of trees that died had chronically slow growth, 46% had pronounced declines in growth, whereas 36% had good growth up to death. In addition, tree-ring-based growth patterns of dead and living Taurus fir trees were compared using 12 mortality models derived by logistic regression, with growth patterns of tree-ring series as predictor variables. The four models with the highest overall performance correctly classified 43.8-56.3% of all dead trees and 75.0-87.5% of all living trees, and they predicted 25.0-43.8% of all dead trees to die within 0-15 years prior to the actual year of death.

8.
A statistical modeling study was performed on the population fluctuations of the 15 commonest fish species frequenting the tidal Scheldt estuary in Belgium. These included marine juvenile and seasonal visitors, estuarine residents and diadromous fish species that were recorded on the filter screens of a power plant cooling-water intake between September 1991 and April 2001. The species population abundance was regressed against a candidate set of 6 environmental variables and 13 instrumental variables, accounting for seasonality and long-term trends present in the data. Population abundances of the different species were, in general, best described by seasonal variables. Seasonal components contributed, on average, up to 63.8% of the variance explained by the models. Ten species were found to show a slightly negative, though significant, trend over the period of the survey. Most models also included at least one environmental variable, and 25.4% of the explained variance could be attributed to environmental fluctuations. Of all physico-chemical variables, dissolved oxygen was the most important predictor of fish abundance, suggesting that the estuary suffered from poor water quality during the survey. Temperature, salinity, freshwater flow, suspended solids and chlorophyll a concentrations were minor determinants of fish abundance. Communicated by O. Kinne, Oldendorf/Luhe

9.
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses, respectively) are the tools of choice in many of these applications. These tools are widely used in large remote sensing applications, but are not easily interpretable, do not have ties with survey estimation methods, and use proprietary unpublished algorithms. Consequently, three alternative modelling techniques were compared for mapping presence and basal area of 13 species located in the mountain ranges of Utah, USA. The modelling techniques compared included the widely used See5/Cubist, generalized additive models (GAMs), and stochastic gradient boosting (SGB). Model performance was evaluated using independent test data sets. Evaluation criteria for mapping species presence included specificity, sensitivity, Kappa, and area under the curve (AUC). Evaluation criteria for the continuous basal area variables included correlation and relative mean squared error. For predicting species presence (setting thresholds to maximize Kappa), SGB had higher values for the majority of the species for specificity and Kappa, while GAMs had higher values for the majority of the species for sensitivity. In evaluating resultant AUC values, GAM and/or SGB models had significantly better results than the See5 models where significant differences could be detected between models. For nine out of 13 species, basal area prediction results for all modelling techniques were poor (correlations less than 0.5 and relative mean squared errors greater than 0.8), but SGB provided the most stable predictions in these instances. 
SGB and Cubist performed equally well for modelling basal area for three species with moderate prediction success, while all three modelling tools produced comparably good predictions (correlation of 0.68 and relative mean squared error of 0.56) for one species.
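
The abstract above reports sensitivity, specificity and Kappa after "setting thresholds to maximize Kappa". A minimal sketch of that threshold search is given below; the helper names, the rule that a score at or above the threshold counts as a predicted presence, and the toy data are assumptions for illustration, not the study's code.

```python
def confusion(observed, scores, threshold):
    """Return (tp, fp, tn, fn), predicting presence when score >= threshold."""
    tp = fp = tn = fn = 0
    for o, s in zip(observed, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and o == 1:
            tp += 1
        elif pred == 1 and o == 0:
            fp += 1
        elif pred == 0 and o == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def kappa_from_counts(tp, fp, tn, fn):
    """Cohen's kappa from a 2x2 confusion matrix (0.0 if chance agreement is 1)."""
    n = tp + fp + tn + fn
    po = (tp + tn) / n
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return 0.0 if pe == 1 else (po - pe) / (1 - pe)

def best_kappa_threshold(observed, scores):
    """Try each distinct score as a threshold; keep the first one with highest kappa."""
    best = None
    for t in sorted(set(scores)):
        k = kappa_from_counts(*confusion(observed, scores, t))
        if best is None or k > best[1]:
            best = (t, k)
    return best

# Toy data (hypothetical): observed presence and predicted probabilities.
obs = [0, 0, 1, 1]
prob = [0.1, 0.4, 0.35, 0.8]
threshold, kappa = best_kappa_threshold(obs, prob)
tp, fp, tn, fn = confusion(obs, prob, threshold)
sensitivity = tp / (tp + fn)  # share of true presences predicted present
specificity = tn / (tn + fp)  # share of true absences predicted absent
print(threshold, kappa, sensitivity, specificity)
```

Because sensitivity and specificity depend on the chosen threshold while AUC does not, a model can lead on one measure and trail on another, as the abstract describes for SGB and GAMs.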

10.
In this study, diameter growth models for three species growing in mixed-stands of Coastal British Columbia (BC), Canada, under a variety of silvicultural treatments were developed. The three species were: Douglas-fir (Pseudotsuga menziesii var. menziesii (Mirb.) Franco), western hemlock (Tsuga heterophylla (Raf.) Sarg.), and western redcedar (Thuja plicata Donn). A Box and Lucas model (1959) was initially fitted to the diameter growth series for each tree, as this model is very flexible and was based on processes reflective of the metabolic processes governing tree growth. Next, a random coefficients modelling approach (i.e., parameter prediction approach) was used to modify the estimated parameters for each species using functions of tree size and stage of development, site productivity, and inter-tree competition variables, while accounting for temporal correlation within trees. Impacts of fertilization on diameter growth were estimated by including the time since fertilization as an additional variable. Since state variables that are changed as a result of thinning were already in the model, accurate results post-thinning were obtained with no changes to the model. For the combined effects of thinning and fertilization, a two-step additive approach was used, where the state variables were changed following thinning and the diameter increment was modified for fertilization using the time since fertilization variable. Results indicated that multiple treatments sustain a change in growth for a longer time period following treatment than thinning or fertilization alone.

11.
《Ecological modelling》2004,175(2):137-149
Bird species are selective on the vegetation types in which they are found but predictive models of bird distribution based on variables derived from land-use/land-cover maps tend to have limited success. It has been suggested that accuracy of existing maps used to derive predictors is in part responsible for the limited success of bird distribution models. In two areas of 4900 km² of Western Andalusia, Spain, we compared the predictive ability of bird distribution models derived from two existing general-purpose land-use/land-cover maps, which differ in their resolution and accuracy: a coarse scale vegetation map of Europe, the CORINE land-cover map, and a detailed regional map, the 1995 land-use/land-cover map of Andalusia from the SINAMBA (Consejería de Medio Ambiente, Junta de Andalucía). We compared the bird distribution models derived from these general-purpose vegetation maps with models derived from two more accurate structural vegetation maps built considering directly variables that influence bird habitat selection, one built from satellite images for this study and another obtained by improving the resolution and accuracy of the SINAMBA map with satellite data. We sampled the presence/absence of bird species at 857 points using 15-min point surveys. Predictive models for 54 bird species were built with generalised additive models (GAMs), using as potential predictors the same set of landscape and vegetation structure variables measured on each map. We compared for each bird species the predictive accuracy of the best model derived from each map. Vegetation structure measured at bird sample points was used as ground-truth for comparing the accuracy of vegetation maps. Although maps differed in their resolution and accuracy, the results show that all of them produced similarly accurate bird distribution models, with a mixed map produced with both thematic and satellite information being the best. 
The models derived from the more accurate vegetation structure maps obtained from satellite data were not more accurate than those derived directly from the SINAMBA or CORINE maps. Our results suggest that some general-purpose land-use/land-cover maps are accurate enough to derive bird distribution models. There appears to be a limit beyond which improving vegetation maps no longer increases their power to predict bird distribution.

12.
Although long-lived tree species experience considerable environmental variation over their life spans, their geographical distributions reflect sensitivity mainly to mean monthly climatic conditions. We introduce an approach that incorporates a physiologically based growth model to illustrate how a half-dozen tree species differ in their responses to monthly variation in four climate-related variables: water availability, deviations from an optimum temperature, atmospheric humidity deficits, and the frequency of frost. Rather than use climatic data directly to correlate with a species’ distribution, we assess the relative constraints of each of the four variables as they affect predicted monthly photosynthesis for Douglas-fir, the most widely distributed species in the region. We apply an automated regression-tree analysis to create a suite of rules, which differentially rank the relative importance of the four climatic modifiers for each species, and provide a basis for predicting a species’ presence or absence on 3737 uniformly distributed U.S. Forest Service Forest Inventory and Analysis (FIA) field survey plots. Results of this generalized rule-based approach were encouraging, with weighted accuracy, which combines the correct prediction of both presence and absence on FIA survey plots, averaging 87%. A wider sampling of climatic conditions throughout the full range of a species’ distribution should improve the basis for creating rules and the possibility of predicting future shifts in the geographic distribution of species.

13.
Gradient forests: calculating importance gradients on physical predictors   Total citations: 2 (self-citations: 0, citations by others: 2)
Ellis N  Smith SJ  Pitcher CR 《Ecology》2012,93(1):156-168
In ecological analyses of species and community distributions there is interest in the nature of their responses to environmental gradients and in identifying the most important environmental variables, which may be used for predicting patterns of biodiversity. Methods such as random forests already exist to assess predictor importance for individual species and to indicate where along gradients abundance changes. However, there is a need to extend these methods to whole assemblages, to establish where along the range of these gradients the important compositional changes occur, and to identify any important thresholds or change points. We develop such a method, called "gradient forest," which is an extension of the random forest approach. By synthesizing the cross-validated R² and accuracy importance measures from univariate random forest analyses across multiple species, sampling devices, and surveys, gradient forest obtains a monotonic function of each predictor that represents the compositional turnover along the gradient of the predictor. When applied to a synthetic data set, the method correctly identified the important predictors and delineated where the compositional change points occurred along these gradients. Application of gradient forest to a real data set from part of the Great Barrier Reef identified mud fraction of the sediment as the most important predictor, with highest compositional turnover occurring at mud fraction values around 25%, and provided similar information for other predictors. Such refined information allows for more accurate capturing of biodiversity patterns for the purposes of bioregionalization, delineation of protected areas, or designing of biodiversity surveys.

14.
The aim of the study is the estimation of decay rates for coarse woody debris in large forest regions. These rates, together with estimations of the amount of deadwood, can be used to calculate the release of carbon from that pool into the atmosphere. The model can be used for predictions of decomposition rate constants in a wide range of forest areas (e.g. in process-based ecological models, reporting of GHG emissions), as only easily available predictor variables were used in the regression. Based on intensive literature research, a meta-analysis of the factors controlling the constant decay rate of coarse woody debris was set up. The included studies differed significantly in their survey methods as well as in their geographical origin. Thirty-nine studies were collected, 30 from North America and nine from Europe. Based on these studies, 291 observations of the remaining fraction of coarse woody debris were collected. To quantify the effects that influence the decomposition rates, a nonlinear mixed effects model was constructed. Only physiologically interpretable variables were included. With this approach, mean temperature in July, annual rainfall (as a quadratic term), diameter of the woody material, grouping into hardwoods or conifers, and measurement as mass or density loss were found to be significant variables. The mixed effects model also allowed an estimation of the species-specific effects on the decomposition process. These random effects are given for 42 tree species. The degrees of freedom were used efficiently. The model explains 79.6% of the variance and is superior to a comparable multiple regression model.
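
The link between a constant decay rate, the remaining fraction of deadwood, and carbon release can be illustrated with a short calculation. The sketch below assumes the single negative exponential form (remaining fraction = exp(-k·t)); the abstract does not state which functional form the study used, and the function names, the example numbers, and the simplification that all lost carbon goes to the atmosphere are hypothetical.

```python
import math

def decay_constant(remaining, years):
    """Decay rate constant k (per year) from a remaining fraction after `years`.

    Assumes remaining = exp(-k * years); `remaining` must lie in (0, 1].
    """
    return -math.log(remaining) / years

def remaining_fraction(k, years):
    """Fraction of the initial deadwood still present after `years`."""
    return math.exp(-k * years)

def carbon_released(initial_carbon, k, years):
    """Carbon lost from the deadwood pool after `years`.

    Simplification: treats all lost carbon as released to the atmosphere.
    """
    return initial_carbon * (1.0 - math.exp(-k * years))

# Hypothetical example: half of a log remains after 10 years.
k = decay_constant(0.5, 10)
print(k)
print(remaining_fraction(k, 20))
print(carbon_released(100.0, k, 10))
```

In a model such as the one described, k itself would be predicted from covariates (temperature, rainfall, diameter, wood type) rather than back-calculated from a single observation as done here.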

15.
Empirical models for predicting the distribution of organisms from environmental data have often focused on principles of ecological niche theory. However, even at large scales, there is little agreement over how to represent the dimensions of a species’ niche. The performance of such models is greatly affected by the nature of species distributional and environmental data. Regional scale distribution models were developed for 30 willow species in Ontario to examine (i) the predictive ability of logistic regression analysis, and (ii) the effects of using different distributional and environmental data sets. Two original measures of model accuracy and over-prediction were employed and evaluated using independent data. Models based on unique combinations of monthly climate data predicted distributions most accurately for all species. Models based on a fixed set of variables, while generating the highest average probabilities of occurrence for certain species with limited ranges, resulted in the greatest under- and over-estimates of willow distributions. Comparisons of models demonstrated climatic patterns among willows of differing habit and habitat. The distribution of dwarf willow species, present only in the Ontario arctic, followed gradients of summer maximum temperatures. The distribution of the tree species in the southerly portions of the province followed gradients of fall and winter minimum temperatures. Regardless of distributional and environmental data input, no algorithm maximized model performance for all species. Individual species models require individual approaches; i.e., the variable selection technique, the set of environmental factors used as predictors, and the nature of species distributional data must be carefully matched to the intended application. An understanding of evolutionary processes enhances the meaningful interpretation of individual species models. 
Unless sampling bias and species prevalence can be accounted for, models based on collection point data are best used to guide field surveys. While inferred range data may be better suited to determine potential ecological niches, overestimation of species prevalence and environmental tolerance must be recognized. A combination of available distributional data types is recommended to best determine species niches, an important step in developing conservation strategies.

16.
To make a macrofaunal (crustacean) habitat potential map, the spatial distribution of ecological variables in the Hwangdo tidal flat, Korea, was explored. Spatial variables were mapped using remote sensing and a geographic information system (GIS) combined with field observations. A frequency ratio (FR) model and a logistic regression (LR) model were employed to map the macrofauna potential area for Ilyoplax dentimerosa, a crustacean species. Spatial variables affecting the tidal macrofauna distribution were selected based on abundance and biomass and used within a spatial database derived from remotely sensed data of various types of sensors. The spatial variables included the intertidal digital elevation model (DEM), slope, distance from a tidal channel, tidal channel density, surface sediment facies, spectral reflectance of the near infrared (NIR) bands and the tidal exposure duration. The relation between I. dentimerosa and each spatial variable was calculated using the FR and LR. The species occurrence data were randomly divided into a training set (70%) to analyse habitat potential using FR and LR and a test set (30%) to validate the predicted habitat potential map. The relations were overlaid to produce a habitat potential map with the species potential index (SPI) value for each pixel. The potential habitat maps were compared with the surveyed habitat locations used as the validation data set. The comparison showed that the LR model (accuracy 85.28%) predicted better than the FR model (accuracy 78.96%). Both models gave satisfactory accuracies. The LR provides the quantitative influence of each variable on the potential habitat of the species, whereas the FR shows the quantitative influence of each class within a variable. The combination of GIS-based frequency ratio and logistic regression models and remote sensing with field observations is an effective method to determine locations favorable for macrofaunal species occurrences in a tidal flat.
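
The frequency ratio (FR) used in the abstract above can be sketched as follows, using the common definition: the share of species occurrences falling in a class divided by the share of the study area in that class. The abstract does not spell out its formula, so this definition, the summing of FR values into a species potential index (SPI), the function names, and the toy classes are all assumptions for illustration.

```python
from collections import Counter

def frequency_ratio(pixel_classes, occurrence_classes):
    """FR per class = (occurrence share in class) / (area share in class).

    pixel_classes      : class label of every pixel in the study area
    occurrence_classes : class label at every species occurrence location
    """
    area = Counter(pixel_classes)
    occ = Counter(occurrence_classes)
    n_area = len(pixel_classes)
    n_occ = len(occurrence_classes)
    return {c: (occ.get(c, 0) / n_occ) / (area[c] / n_area) for c in area}

def species_potential_index(pixel, fr_tables):
    """Sum the FR of the class this pixel falls in for each variable (assumed overlay rule)."""
    return sum(fr_tables[var][cls] for var, cls in pixel.items())

# Toy data (hypothetical): one categorical variable, sediment facies.
facies_pixels = ["mud"] * 6 + ["sand"] * 4
facies_occurrences = ["mud"] * 3 + ["sand"] * 1
fr = {"facies": frequency_ratio(facies_pixels, facies_occurrences)}
print(fr)
print(species_potential_index({"facies": "mud"}, fr))
```

An FR above 1 marks a class where the species occurs more often than its area share alone would suggest, which is the per-class influence the abstract attributes to the FR model.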

17.
An important decision in presence-only species distribution modeling is how to select background (or pseudo-absence) localities for model parameterization. The selection of such localities may influence model parameterization and thus can influence the appropriateness and accuracy of the model prediction when extrapolating the species distribution across time and space. We used 12 species from the Australian Wet Tropics (AWT) to evaluate the relationship between the geographic extent from which pseudo-absences are taken and model performance, and shape and importance of predictor variables using the MAXENT modeling method. Model performance is lower when pseudo-absence points are taken from either a restricted or broad region with respect to species occurrence data than from an intermediate region. Furthermore, variable importance (i.e., contribution to the model) changed such that models became increasingly simplified, dominated by just two variables, as the area from which pseudo-absence points were drawn increased. Our results suggest that it is important to consider the spatial extent from which pseudo-absence data are taken. We suggest species distribution modeling exercises should begin with exploratory analyses evaluating what extent might provide both the most accurate results and biologically meaningful fit between species occurrence and predictor variables. This is especially important when modeling across space or time, a growing application for species distribution modeling.

18.
The effect of digital elevation model (DEM) error on environmental variables, and subsequently on predictive habitat models, has not been explored. Based on an error analysis of a DEM, multiple error realizations of the DEM were created and used to develop both direct and indirect environmental variables for input to predictive habitat models. The study explores the effects of DEM error and the resultant uncertainty of results on typical steps in the modeling procedure for prediction of vegetation species presence/absence. Results indicate that all of these steps and results, including the statistical significance of environmental variables, shapes of species response curves in generalized additive models (GAMs), stepwise model selection, coefficients and standard errors for generalized linear models (GLMs), prediction accuracy (Cohen's kappa and AUC), and spatial extent of predictions, were greatly affected by this type of error. Error in the DEM can affect the reliability of interpretations of model results and level of accuracy in predictions, as well as the spatial extent of the predictions. We suggest that the sensitivity of DEM-derived environmental variables to error in the DEM should be considered before including them in the modeling processes.

19.
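Biodegradability is an important basis for assessing the environmental persistence of pollutants and is also a criterion for deciding whether a chemical is approved for production and entry into the market. Using biodegradability grade data assessed by 17 experts in the field of biodegradation, prediction models for primary and ultimate biodegradation containing 15 molecular structure parameters were built with the functional tree (FT) algorithm. External validation showed that the models have good predictive accuracy. The weighted accuracy (WA) for primary biodegradability was 84.1% for the training set and 78.9% for the validation set; the WA for ultimate biodegradability was 91.0% for the training set and 83.6% for the validation set. Plotting prediction correctness against the leverage values of the compounds characterized the applicability domain of the biodegradability models.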

20.