Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
We propose a novel tool for testing hypotheses concerning the adequacy of environmentally defined factors for local clustering of diseases, through the comparative evaluation of the significance of the most likely clusters detected under maps whose neighborhood structures were modified according to those factors. A multi-objective genetic algorithm scan statistic is employed for finding spatial clusters in a map divided into a finite number of regions, whose adjacency is defined by a graph structure. This cluster finder maximizes two objectives, the spatial scan statistic and the regularity of cluster shape. Instead of specifying locations for the possible clusters a priori, as is currently done for cluster finders based on focused algorithms, we alter the usual adjacency induced by the common geographical boundary between regions. In our approach, the connectivity between regions is reinforced or weakened, according to certain environmental features of interest associated with the map. We build various plausible scenarios, each time modifying the adjacency structure on specific geographic areas in the map, and run the multi-objective genetic algorithm to select the best cluster solutions for each of the selected scenarios. The statistical significances of the most likely clusters are estimated through Monte Carlo simulations. The clusters with the lowest estimated p-values, along with their corresponding maps of enhanced environmental features, are displayed for comparative analysis. Therefore the probability of cluster detection is increased or decreased, according to changes made in the adjacency graph structure, related to the selection of environmental features. The eventual identification of the specific environmental conditions which induce the most significant clusters enables the practitioner to accept or reject different hypotheses concerning the relevance of geographical factors. Numerical simulation studies and an application for malaria clusters in Brazil are presented.
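The Monte Carlo significance step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' multi-objective genetic algorithm: it assumes Kulldorff's Poisson log-likelihood ratio as the scan statistic, takes the candidate zones as a precomputed list (in the paper they would come from the modified adjacency graph), and assumes the expected counts sum to the total number of cases. The function names `poisson_llr`, `best_zone` and `monte_carlo_p_value` are hypothetical.

```python
import math
import random

def poisson_llr(c, e, total):
    """Poisson scan log-likelihood ratio for a zone with c observed and
    e expected cases, out of `total` cases on the whole map."""
    if e <= 0 or c <= e:
        return 0.0  # only zones with an excess of cases score above zero
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

def best_zone(zones, cases, expected):
    """Return (llr, zone) for the candidate zone with the highest LLR."""
    total = sum(cases)
    scored = [
        (poisson_llr(sum(cases[i] for i in z), sum(expected[i] for i in z), total), z)
        for z in zones
    ]
    return max(scored, key=lambda pair: pair[0])

def monte_carlo_p_value(zones, cases, expected, replicates=999, seed=0):
    """Rank-based p-value of the most likely cluster under the null hypothesis,
    simulated by reallocating all cases to regions in proportion to `expected`."""
    rng = random.Random(seed)
    observed, _ = best_zone(zones, cases, expected)
    total = sum(cases)
    regions = range(len(cases))
    exceed = 0
    for _ in range(replicates):
        sim = [0] * len(cases)
        for r in rng.choices(regions, weights=expected, k=total):
            sim[r] += 1
        if best_zone(zones, sim, expected)[0] >= observed:
            exceed += 1
    return (1 + exceed) / (1 + replicates)
```

Under this sketch, comparing scenarios would amount to calling `monte_carlo_p_value` once per candidate zone list, one list per modified adjacency structure, and ranking the resulting p-values.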

2.
The geographic delineation of irregularly shaped spatial clusters is an ill-defined problem. Whenever the spatial scan statistic is used, some kind of penalty correction needs to be applied to avoid clusters’ excessive irregularity and the consequent reduction of power of detection. Geometric compactness and non-connectivity regularity functions have been recently proposed as corrections. We present a novel internal cohesion regularity function based on the graph topology to penalize the presence of weak links in candidate clusters. Weak links are defined as relatively unpopulated regions within a cluster, such that their removal disconnects it. By applying this weak link cohesion function, the most geographically meaningful clusters are sifted through the immense set of possible irregularly shaped candidate cluster solutions. A multi-objective genetic algorithm (MGA) has been proposed recently to compute the Pareto sets of cluster solutions, employing Kulldorff’s spatial scan statistic and the geometric correction as objective functions. We propose novel MGAs to maximize the spatial scan, the cohesion function and the geometric function, or combinations of these functions. Numerical tests show that our proposed MGAs have high power to detect elongated clusters, and present good sensitivity and positive predictive value. The statistical significance of the clusters in the Pareto set is estimated through Monte Carlo simulations. Our method distinguishes clearly those geographically inadequate clusters which are worse from both geometric and internal cohesion viewpoints. Besides, a certain degree of irregularity of shape is allowed provided that it does not impact internal cohesion. Our method has better power of detection for clusters satisfying those requirements. We propose a more robust definition of spatial cluster using these concepts.
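The weak-link definition above (a relatively unpopulated region whose removal disconnects the cluster) can be illustrated with a short sketch. It does not reproduce the paper's cohesion function; the rule "population below a fraction of the cluster mean" and the names `is_connected`, `weak_links` and `fraction` are assumptions made here only for illustration.

```python
def is_connected(nodes, adjacency):
    """Depth-first search check that `nodes` induce a connected subgraph."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def weak_links(cluster, adjacency, population, fraction=0.5):
    """Regions whose removal disconnects the cluster and whose population is
    below `fraction` times the mean population of the cluster (assumed rule)."""
    mean_pop = sum(population[r] for r in cluster) / len(cluster)
    weak = []
    for r in cluster:
        rest = [x for x in cluster if x != r]
        if rest and not is_connected(rest, adjacency) and population[r] < fraction * mean_pop:
            weak.append(r)
    return weak
```

For a chain of three regions A, B, C in which B is sparsely populated, the sketch flags B, since removing it separates A from C.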

3.
The scan statistic is widely used in spatial cluster detection applications of inhomogeneous Poisson processes. However, real data may present substantial departure from the underlying Poisson process. One of the possible departures has to do with zero excess. Some studies point out that when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection of spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations revealed that the Scan-Poisson statistic steadily deteriorates as the number of zeros increases, producing biased inferences. On the other hand, our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.

4.
5.
6.
Whether general environmental exposures to endocrine disrupting chemicals (including pesticides and dioxin) might induce decreased sex ratios (male/female ratio at birth) is discussed. To address this issue, the authors looked for a space-time clustering test which could detect local areas of significantly low risk, assuming a Bernoulli distribution. Indeed, if the endocrine disruptor hypothesis holds true, and if the sex ratio is a sentinel health event indicative of new reproductive hazards ascribed to environmental factors, then in a given region, either a cluster of low male/female ratio among newborn babies would be expected in the vicinity of polluting municipal solid waste incinerators (MSWIs) (supporting the dioxin hypothesis), or local clusters would be expected in some rural areas where large amounts of pesticides are sprayed. Among cluster detection tests, the spatial scan statistic has been widely used in various applications to scan for areas with high rates, and rarely (if ever) with low rates. Therefore, the goal of this paper was to check the properties of the scan statistic under a given scenario (Bernoulli distribution, search for clusters with low rates) and to assess its added value in addressing the sex ratio issue. This study took place in the Franche-Comté region (France), mainly rural, comprising three main MSWIs, among which only one had a high dioxin emission level in the past. The study population consisted of 192,490 boys and 182,588 girls born during the 1975–1999 period. On the whole, the authors conclude that: (i) spatial and space-time scan statistics provide attractive features to address the sex ratio issue; (ii) sex ratio is not markedly affected across space and does not provide a reliable screening measure for detecting reproductive hazards ascribed to environmental factors.
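The scenario named in this abstract (Bernoulli model, search for low rates) can be sketched with the usual Bernoulli scan log-likelihood ratio, restricted to zones whose rate is lower inside than outside. This is an illustrative sketch under that assumption, not the authors' code; `bernoulli_llr_low` and its argument names are hypothetical. Here `c` of `n` births in the zone are "cases" (for example boys), and `C` of `N` on the whole map.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr_low(c, n, C, N):
    """Bernoulli scan log-likelihood ratio for a zone, counted only when the
    rate inside the zone is lower than the rate outside (low-rate cluster)."""
    if n == 0 or n == N:
        return 0.0
    p_in = c / n
    p_out = (C - c) / (N - n)
    if p_in >= p_out:
        return 0.0  # not a low-rate zone
    log_alt = (
        _xlogy(c, p_in) + _xlogy(n - c, 1 - p_in)
        + _xlogy(C - c, p_out) + _xlogy((N - n) - (C - c), 1 - p_out)
    )
    p0 = C / N
    log_null = _xlogy(C, p0) + _xlogy(N - C, 1 - p0)
    return log_alt - log_null
```

The significance of the best-scoring zone would still have to be assessed separately, for instance by Monte Carlo replication, which this sketch leaves out.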

7.
Upper level set scan statistic for detecting arbitrarily shaped hotspots   Total citations: 2, self-citations: 0, citations by others: 2
There is a declared need for geoinformatic surveillance statistical science and software infrastructure for spatial and spatiotemporal hotspot detection. A hotspot means something unusual: an anomaly, aberration, outbreak, elevated cluster, critical resource area, etc. The declared need may be for monitoring, etiology, management, or early warning. The responsible factors may be natural, accidental, or intentional. This proof-of-concept paper suggests methods and tools for hotspot detection across geographic regions and across networks. The investigation proposes development of statistical methods and tools that have immediate potential for use in critical societal areas, such as public health and disease surveillance, ecosystem health, water resources and water services, transportation networks, persistent poverty typologies and trajectories, environmental justice, biosurveillance and biosecurity, among others. We introduce, for multidisciplinary use, an innovation of the circle-based spatial and spatiotemporal scan statistic that is popular in the health area. Our innovation employs the notion of an upper level set, and is accordingly called the upper level set scan statistic, pointing to a sophisticated analytical and computational system as the next generation of the currently popular SaTScan. Success of surveillance rests on the capability to detect potential elevated clusters. But clusters can be of any shape and cannot be captured by circles alone; relying on circles is likely to give more false alarms and a greater false sense of security. What we need is the capability to detect arbitrarily shaped clusters. The proposed upper level set scan statistic innovation is expected to fill this need.

8.
Indoor radon is an important risk factor for human health. Indeed, radon inhalation is considered the second cause of lung cancer after smoking. During the last decades, huge efforts have been made in many countries to measure, map and predict radon levels in dwellings. Much research has been devoted to identifying those areas within a country where high radon concentrations are more likely to be found. Data collected through indoor radon surveys have been analysed adopting various statistical approaches, among which hierarchical Bayesian models and geostatistical tools are worth noting. The main goal of this paper is to identify high radon concentration areas (the so-called radon prone areas) in the Abruzzo Region (Italy). In order to accurately pinpoint zones deserving attention for mitigation purposes, we adopt spatial cluster detection techniques, traditionally employed in epidemiology. As a first step, we assume that indoor radon measurements do not arise from a continuous spatial process; thus the geographic locations of dwellings where the radon measurements have been taken can be viewed as a realization of a spatial point process. Following this perspective, we adopt and compare recent cluster detection techniques: the simulated annealing scan statistic, the case event approach based on distance regression on the selection order and the elliptic spatial scan statistic. The analysis includes data collected during surveys carried out by the Regional Agency for the Environment Protection of Abruzzo (ARTA) in 1,861 randomly sampled dwellings across 277 municipalities of the Abruzzo region. The radon prone areas detected by the selected approaches are provided along with the summary statistics of the methods. Finally, the methodologies considered in this paper are tested on simulated data in order to evaluate their power and the precision of cluster location detection.

9.
Geographical surveillance for hotspot detection and delineation has become an important area of investigation both in geospatial ecosystem health and in geospatial public health. In order to find critical areas based on synoptic cellular data, geospatial ecosystem health investigations apply recently discovered echelon tools. In order to find elevated rate areas based on synoptic cellular data, geospatial public health investigations apply recently discovered spatial scan statistic tools. The purpose of this paper is to conceptualize a joint role for these together, in the spirit of cross-disciplinary cross-fertilization, to accomplish more effective and efficient geographical surveillance for hotspot detection and delineation, and for early warning systems.

10.
This paper extends the spatial local-likelihood model and the spatial mixture model to the space-time (ST) domain. For comparison, a standard random effect space-time (SREST) model is examined to allow evaluation of each model’s ability in relation to cluster detection. To pursue this evaluation, we use the ST counterparts of spatial cluster detection diagnostics. The proposed criteria are based on posterior estimates (e.g., misclassification rate) and some are based on post-hoc analysis of posterior samples (e.g., exceedance probability). In addition, we examine more conventional model fit criteria including mean square error (MSE). We illustrate the methodology with a real ST dataset, Georgia throat cancer mortality data for the years 1994–2005, and a simulated dataset where different levels and shapes of clusters are embedded. Overall, it is found that conventional SREST models fare well in ST cluster detection and in goodness-of-fit, while for extreme risk detection the local likelihood ST model does best.

11.
Recent years have witnessed the growth of new information technologies and their applications to various disciplines. The goal of this paper is to demonstrate how two innovative methods, upper level set scan (ULS) hotspot detection and the multicriteria prioritization scheme, facilitate population health and break new ground in public health surveillance. It is believed that the social environment (i.e. social conditions and social capital) is one of the determinants of human health. Using infant health data and 10 additional indicators of social environment in the 159 counties of Georgia, ULS identified 52 counties that are in double jeopardy (high infant mortality and a high rate of low infant birth weight). The multicriteria ranking scheme suggested that there was no conspicuous spatial cluster of ranking orders, which improved on traditional decision making by visual geographic cluster. Both hotspot detection and ranking methods provided an empirical basis for re-allocating limited resources, and several policy implications could be drawn from these analytic results.

12.
Routine surveillance of a large geographic region for clusters of adverse health events, particularly cancers, often involves small area health data, possibly controlling for exposure information. Many different methods have been proposed to test for the presence of geographical clusters. Two of the most popular methods are the spatial scan method proposed by Kulldorff and that using a fixed number of cases within scanning circles proposed by Besag and Newell. Although the second test is very popular, it has some difficulties. While the scan test controls for the multiple testing problem, the Besag and Newell test does not. Additionally, the latter method requires the setting of several tuning parameters whose values affect the test performance and are subjectively chosen by the user. This makes a fair comparison between the two methods difficult, and it explains why there have been few formal studies evaluating their relative performances. In this paper, we modify the Besag and Newell test to allow control of the type I error probability and compare its power with that of the spatial scan test. We used data sets from a publicly available simulated benchmark. We found that the two methods have similar results, except for clusters located in sparsely populated regions, where the spatial scan method performed better.
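For orientation, the unmodified Besag and Newell test with a fixed number of cases `k` can be sketched as below. This is the textbook per-region version, which does not control for multiple testing; the type I error modification proposed in the paper is not reproduced here. The function names, the use of Euclidean distance between region centroids, and the Poisson tail as reference distribution are assumptions of this sketch.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    cdf = 0.0
    for j in range(k):
        cdf += term
        term *= lam / (j + 1)
    return 1.0 - cdf

def besag_newell(coords, cases, population, k):
    """Per-region p-values: for each region, add nearest neighbours until at
    least k cases are collected, then test that count against the Poisson
    expectation for the accumulated population."""
    rate = sum(cases) / sum(population)
    p_values = []
    for xi, yi in coords:
        order = sorted(
            range(len(coords)),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        acc_cases = 0
        acc_pop = 0
        for j in order:
            acc_cases += cases[j]
            acc_pop += population[j]
            if acc_cases >= k:
                break
        # If the map holds fewer than k cases, the whole map is used.
        p_values.append(poisson_sf(k, rate * acc_pop))
    return p_values
```

The choice of `k` is exactly the kind of user-set tuning parameter the abstract refers to: small values favour compact clusters, large values favour diffuse ones.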

13.
The purpose of this paper is to develop a set of associated statistical tests for spatial clustering. In particular, a set of three associated tests will be developed; these will correspond to the three types of tests set out by Besag and Newell (general tests, focused tests, and tests for the detection of clustering). The associated tests draw primarily, though not exclusively, upon existing tests and results. The principal contributions are based upon the score statistic for focused tests, which has been an important approach to testing for clustering around environmental hazards. The first contribution consists of the formulation of a global statistic for general tests that corresponds to focused score statistics, along with an assessment of the distribution of the statistic under the null hypothesis of no raised incidence. The local score statistics used for focused tests will have the property of summing to the global statistic used for the corresponding general test. Attention is also given to the maximum local score statistic for the “test for the detection of clustering”. The critical values of this statistic which are required for testing the null hypothesis are described. Application of the methods is made to leukemia data for central New York State.

14.
Many statistical tests have been developed to assess the significance of clusters of disease located around known sources of environmental contaminants, also known as focused disease clusters. The majority of focused-cluster tests were designed to detect a particular spatial pattern of clustering, one in which the disease cluster centers around the pollution source and declines in a radial fashion with distance. However, other spatial patterns of environmentally related disease clusters are likely given that the spatial dispersion patterns of environmental contaminants, and thus human exposure, depend on a number of factors (e.g., meteorology and topography). For this study, data were simulated with five different spatial patterns of disease clusters, reflecting potential pollutant dispersion scenarios: (1) a radial effect decreasing with increasing distance, (2) a radial effect with a defined peak and decreasing with distance, (3) a simple angular effect, (4) an angular effect decreasing with increasing distance and (5) an angular effect with a defined peak and decreasing with distance. The power to detect each type of spatially distributed disease cluster was evaluated using Stone’s Maximum Likelihood Ratio Test, Tango’s Focused Test, Bithell’s Linear Risk Score Test, and variations of the Lawson–Waller Score Test. Study findings underscore the importance of considering environmental contaminant dispersion patterns, particularly directional effects, with respect to focused-cluster test selection in cluster investigations. The effect of extra variation in risk is also considered, although it is not substantial in terms of the power of the tests.

15.
Boundary analysis of cancer maps may highlight areas where causative exposures change through geographic space, the presence of local populations with distinct cancer incidences, or the impact of different cancer control methods. Too often, such analysis ignores the spatial pattern of incidence or mortality rates and overlooks the fact that rates computed from sparsely populated geographic entities can be very unreliable. This paper proposes a new methodology that accounts for the uncertainty and spatial correlation of rate data in the detection of significant edges between adjacent entities or polygons. Poisson kriging is first used to estimate the risk value and the associated standard error within each polygon, accounting for the population size and the risk semivariogram computed from raw rates. The boundary statistic is then defined as half the absolute difference between kriged risks. Its reference distribution, under the null hypothesis of no boundary, is derived through the generation of multiple realizations of the spatial distribution of cancer risk values. This paper presents three types of neutral models generated using methods of increasing complexity: the common random shuffle of estimated risk values, a spatial re-ordering of these risks, or p-field simulation that accounts for the population size within each polygon. The approach is illustrated using age-adjusted pancreatic cancer mortality rates for white females in 295 US counties of the Northeast (1970–1994). Simulation studies demonstrate that Poisson kriging yields more accurate estimates of the cancer risk and of how its value changes between polygons (i.e., the boundary statistic), relative to the use of raw rates or a local empirical Bayes smoother. When used in conjunction with spatial neutral models generated by p-field simulation, the boundary analysis based on Poisson kriging estimates minimizes the proportion of type I errors (i.e., edges wrongly declared significant) while the frequency of these errors is predicted well by the p-value of the statistical test.
Pierre Goovaerts
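The boundary statistic and the simplest of the three neutral models (the random shuffle of estimated risks) can be sketched as follows. Poisson kriging itself is not implemented: the per-polygon risk estimates are assumed to be given as input, and the names `boundary_statistic` and `edge_p_values` are hypothetical. The spatial re-ordering and p-field neutral models are not covered.

```python
import random

def boundary_statistic(risk_a, risk_b):
    """Half the absolute difference between the risks of two adjacent polygons."""
    return 0.5 * abs(risk_a - risk_b)

def edge_p_values(risks, edges, replicates=999, seed=0):
    """Rank-based p-value for each edge under a random-shuffle neutral model:
    risk values are permuted across polygons and the statistic is recomputed."""
    rng = random.Random(seed)
    observed = {e: boundary_statistic(risks[e[0]], risks[e[1]]) for e in edges}
    exceed = {e: 0 for e in edges}
    keys = list(risks.keys())
    values = list(risks.values())
    for _ in range(replicates):
        rng.shuffle(values)
        shuffled = dict(zip(keys, values))
        for e in edges:
            if boundary_statistic(shuffled[e[0]], shuffled[e[1]]) >= observed[e]:
                exceed[e] += 1
    return {e: (1 + exceed[e]) / (1 + replicates) for e in edges}
```

The random shuffle ignores both spatial correlation and population size, which is the limitation the abstract attributes to the simpler neutral models.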

16.
Air–water flows at hydraulic structures are commonly observed and called white waters. The free-surface aeration is characterised by some intense exchanges of air and water leading to complex air–water structures including some clustering. The number and properties of clusters may provide some measure of the level of particle-turbulence and particle–particle interactions in the high-velocity air–water flows. Herein a re-analysis of air–water clusters was applied to a highly aerated free-surface flow data set (Chanson and Carosi, Exp Fluids 42:385–401, 2007). A two-dimensional cluster analysis was introduced combining a longitudinal clustering criterion based on near-wake effect and a side-by-side particle detection method. The results highlighted a significant number of clustered particles in the high-velocity free-surface flows. The number of bubble/droplet clusters per second and the percentage of clustered particles were significantly larger using the two-dimensional cluster analysis than those derived from earlier longitudinal detection techniques only. A number of large cluster structures were further detected. The results illustrated the complex interactions between entrained air and turbulent structures in skimming flow on a stepped spillway, and the cluster detection method may apply to other highly aerated free-surface flows.

17.
This paper presents a scan statistic, the progressive upper level set (PULSE) scan statistic, for geospatial hotspot detection and its software implementation. Like ULS, the PULSE scan statistic is based on the arbitrarily shaped scan window and can be adapted for a network setting. PULSE is a refinement of the upper level set (ULS) scan statistic. Like some other likelihood based scanning devices, the ULS scan statistic identifies maximum likelihood estimate (MLE) zones that tend to be ‘stringy’ and sprawling. Its search path increases the possibility of including extraneous cells in its MLE zones and, to a lesser extent, of excluding cells that belong to a true hotspot from its MLE zone. The PULSE scan statistic achieves improvement over the ULS scan statistic in two ways. First, it begins its search for a most likely zone with a large population of candidate zones obtained by modifying the ULS tree structure and continues its search using a genetic algorithm. Secondly, to reduce chances of generating an MLE that is excessively stringy and that includes extraneous cells in the MLE zone, PULSE uses cardinality and compactness of zones along with their likelihoods as the fitness function in the genetic algorithm and uses several pertinent criteria including evenness of intra-zone cellular response ratios to determine the MLE zone. To reduce computation, the Gumbel distribution of extreme values is used to determine the p-value of the MLE zone. Better results come at the cost of increased processing time. An evaluative performance study is presented.
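The Gumbel step mentioned above can be illustrated with a small sketch. The abstract does not say how the Gumbel parameters are obtained; the version below assumes a method-of-moments fit to a modest number of null replicates of the maximum statistic, which is one common way to avoid running thousands of Monte Carlo replicates. The name `gumbel_p_value` is hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_p_value(observed, null_maxima):
    """Upper-tail p-value of `observed` under a Gumbel distribution fitted by
    the method of moments to the null replicates of the maximum statistic."""
    mean = statistics.mean(null_maxima)
    sd = statistics.stdev(null_maxima)
    beta = sd * math.sqrt(6) / math.pi   # scale parameter
    mu = mean - EULER_GAMMA * beta       # location parameter
    z = (observed - mu) / beta
    if z < -700:
        return 1.0  # avoids overflow in exp for values far below the null range
    return 1.0 - math.exp(-math.exp(-z))
```

Unlike a rank-based Monte Carlo p-value, the fitted tail can return values smaller than one over the number of replicates, at the price of relying on the Gumbel assumption.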

18.
We formulate and simulation-test a spatial surplus production model that provides a basis with which to undertake multispecies, multi-area, stock assessment. Movement between areas is parameterized using a simple gravity model that includes a "residency" parameter that determines the degree of stock mixing among areas. The model is deliberately simple in order to (1) accommodate nontarget species that typically have fewer available data and (2) minimize computational demand to enable simulation evaluation of spatial management strategies. Using this model, we demonstrate that careful consideration of spatial catch and effort data can provide the basis for simple yet reliable spatial stock assessments. If simple spatial dynamics can be assumed, tagging data are not required to reliably estimate spatial distribution and movement. When applied to eight stocks of Atlantic tuna and billfish, the model tracks regional catch data relatively well by approximating local depletions and exchange among high-abundance areas. We use these results to investigate and discuss the implications of using spatially aggregated stock assessment for fisheries in which the distribution of both the population and fishing vary over time.

19.
To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster analysis in step 1 many hard, seemingly arbitrary choices have to be made that nevertheless influence the solution (similarity measure, clustering strategy, number of clusters). The stability of the solution is often of concern, e.g. in clustering by the program. In the discriminant analysis of step 2 it can occur that a water sample is misclassified on the basis of the environmental data but on further inspection happens to be a borderline case in the cluster analysis. One would then rather reclassify such a sample and iterate the two steps. Bayesian latent class analysis is a flexible, extendable model-based cluster analysis approach that recently has gained popularity in the statistical literature and that has the potential to address these problems. It allows the macrofauna and environmental data to be modelled and analyzed in a single integrated analysis. An exciting extension is to incorporate in the analysis prior information on the habitat preferences of the macrofauna taxa such as is available in lists of indicator values. The output of the analysis is not a hard assignment of water samples to clusters but a probabilistic (fuzzy) assignment. The number of clusters is determined on the basis of the Bayes factor. A standard feature of the Bayesian method is to make predictions and to assess their uncertainty. We applied this approach to a data set consisting of 70 water samples, 484 macrofauna taxa and four environmental variables for which previously a five cluster solution had been proposed. The standard for Bayesian estimation, the Gibbs sampler, worked fine on a subset with only 12 selected taxa but did not converge on the full set with 484 taxa. This is due to many configurations in which the assignment probabilities are all very close to either 0 or 1. This convergence problem is comparable with the local optima problem in classical cluster optimization algorithms, including the EM algorithm used in Latent Gold, a Windows program for latent class analysis. The convergence problem needs to be solved before the benefits of Bayesian latent class analysis can come to fruition in this application. We discuss possible solutions.

20.
The statistical analysis of environmental data from remote sensing and Earth system simulations often entails the analysis of gridded spatio-temporal data, with a hypothesis test being performed for each grid cell. When the whole image or a set of grid cells is analyzed for a global effect, the problem of multiple testing arises. When no global effect is present, we expect $\alpha$% of all grid cells to be false positives, and spatially autocorrelated data can give rise to clustered spurious rejections that can be misleading in an analysis of spatial patterns. In this work, we review standard solutions for the multiple testing problem and apply them to spatio-temporal environmental data. These solutions are independent of the test statistic, and any test statistic can be used (e.g., tests for trends or change points in time series). Additionally, we introduce permutation methods and show that they have more statistical power. Real-world data are used to provide examples of the analysis, and the performance of each method is assessed in a simulation study. Unlike other simulation studies, ours compares the statistical power of the presented methods comprehensively. In conclusion, we present several statistically rigorous methods for analyzing spatio-temporal environmental data and controlling the false positives. These methods allow the use of any test statistic in a wide range of applications in environmental sciences and remote sensing.
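The abstract does not name the "standard solutions" it reviews. As an illustration only, the sketch below shows two widely used corrections that work on per-cell p-values regardless of the test statistic: the Bonferroni correction and the Benjamini-Hochberg step-up procedure. Whether these are among the methods in the paper is an assumption, and the permutation methods it introduces are not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject cell i if its p-value is at most alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            reject[i] = True
    return reject
```

On a grid, `p_values` would be the flattened list of per-cell p-values, and the returned mask can be reshaped back to the grid to map the cells that remain significant.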

