Found 20 similar articles (search time: 31 ms)
1.
Luiz Duczmal Ricardo Tavares Ganapati Patil André L. F. Cançado 《Environmental and Ecological Statistics》2010,17(2):183-202
We propose a novel tool for testing hypotheses concerning the adequacy of environmentally defined factors for local clustering
of diseases, through the comparative evaluation of the significance of the most likely clusters detected under maps whose
neighborhood structures were modified according to those factors. A multi-objective genetic algorithm scan statistic is employed
for finding spatial clusters in a map divided into a finite number of regions, whose adjacency is defined by a graph structure.
This cluster finder maximizes two objectives, the spatial scan statistic and the regularity of cluster shape. Instead of specifying
locations for the possible clusters a priori, as is currently done for cluster finders based on focused algorithms, we alter
the usual adjacency induced by the common geographical boundary between regions. In our approach, the connectivity between
regions is reinforced or weakened, according to certain environmental features of interest associated with the map. We build
various plausible scenarios, each time modifying the adjacency structure on specific geographic areas in the map, and run
the multi-objective genetic algorithm for selecting the best cluster solutions for each one of the selected scenarios. The
statistical significances of the most likely clusters are estimated through Monte Carlo simulations. The clusters with the
lowest estimated p-values, along with their corresponding maps of enhanced environmental features, are displayed for comparative analysis. Therefore
the probability of cluster detection is increased or decreased, according to changes made in the adjacency graph structure,
related to the selection of environmental features. The eventual identification of the specific environmental conditions which
induce the most significant clusters enables the practitioner to accept or reject different hypotheses concerning the relevance
of geographical factors. Numerical simulation studies and an application to malaria clusters in Brazil are presented.
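The Monte Carlo significance step described in this abstract can be illustrated with a minimal Python sketch. It assumes the scan statistic of the most likely cluster has already been computed for the observed map and for each replicate simulated under the null hypothesis. The function name `monte_carlo_p_value`, the rank-based estimator and the sample values are illustrative assumptions, not details taken from the paper.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank-based Monte Carlo p-value: (1 + #{sim >= obs}) / (1 + B)."""
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    # Count null replicates at least as extreme as the observed statistic.
    exceed = sum(1 for s in simulated if s >= observed)
    return (1 + exceed) / (1 + len(simulated))


# Hypothetical values: one observed statistic and nine null replicates.
null_stats = [1.2, 0.7, 3.4, 2.1, 0.9, 1.8, 2.6, 0.4, 1.1]
print(monte_carlo_p_value(5.0, null_stats))  # 0.1
```

Under this estimator the smallest attainable p-value is 1 / (1 + B), so the number of replicates B bounds the resolution of the comparison between scenarios.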
2.
Anderson Ribeiro Duarte Luiz Duczmal Sabino José Ferreira André Luiz F. Cançado 《Environmental and Ecological Statistics》2010,17(2):203-229
The geographic delineation of irregularly shaped spatial clusters is an ill-defined problem. Whenever the spatial scan statistic
is used, some kind of penalty correction needs to be used to avoid clusters’ excessive irregularity and consequent reduction
of power of detection. Geometric compactness and non-connectivity regularity functions have been recently proposed as corrections.
We present a novel internal cohesion regularity function based on the graph topology to penalize the presence of weak links
in candidate clusters. Weak links are defined as relatively unpopulated regions within a cluster, such that their removal
disconnects it. By applying this weak link cohesion function, the most geographically meaningful clusters are sifted through
the immense set of possible irregularly shaped candidate cluster solutions. A multi-objective genetic algorithm (MGA) has
been proposed recently to compute the Pareto-sets of cluster solutions, employing Kulldorff’s spatial scan statistic and
the geometric correction as objective functions. We propose novel MGAs to maximize the spatial scan, the cohesion function
and the geometric function, or combinations of these functions. Numerical tests show that our proposed MGAs have high power
to detect elongated clusters, and present good sensitivity and positive predictive value. The statistical significance of
the clusters in the Pareto-set is estimated through Monte Carlo simulations. Our method distinguishes clearly those geographically
inadequate clusters which are worse from both geometric and internal cohesion viewpoints. Besides, a certain degree of irregularity
of shape is allowed provided that it does not impact internal cohesion. Our method has better power of detection for clusters
satisfying those requirements. We propose a more robust definition of spatial cluster using these concepts.
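As a rough illustration of the scan objective named in this abstract, the sketch below evaluates the commonly quoted log-likelihood ratio of Kulldorff's spatial scan statistic under the Poisson model for a single candidate zone. The function name `poisson_scan_llr` and its argument names are assumptions made for illustration. The cohesion and geometric penalty functions proposed in the paper are not reproduced here.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one candidate zone under the Poisson model.

    expected_in must be scaled so that expectations sum to total_cases
    over the whole map. Returns 0.0 unless the zone has an excess of cases.
    """
    c, e, C = cases_in, expected_in, total_cases
    if not (0 < e < C) or c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # When every case falls inside the zone the outside term vanishes.
    outside = (C - c) * math.log((C - c) / (C - e)) if c < C else 0.0
    return inside + outside


# Hypothetical zone: 30 observed cases against 20 expected, 100 cases overall.
print(poisson_scan_llr(30, 20, 100))
```

A cluster finder would evaluate this quantity over many candidate zones and keep the maximum; how the zones are generated and penalized is where the methods in this listing differ.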
3.
André L. F. Cançado Cibele Q. da-Silva Michel F. da Silva 《Environmental and Ecological Statistics》2014,21(4):627-650
The scan statistic is widely used in spatial cluster detection applications of inhomogeneous Poisson processes. However, real data may present substantial departure from the underlying Poisson process. One of the possible departures has to do with zero excess. Some studies point out that when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection of spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations revealed that the Scan-Poisson statistic steadily deteriorates as the number of zeros increases, producing biased inferences. On the other hand, our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.
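To make the notion of zero excess concrete, here is a minimal sketch of the standard zero-inflated Poisson (ZIP) probability mass function. It shows only the count model. The Scan-ZIP and Scan-ZIP+EM statistics themselves are not given in the abstract and are not reproduced. The names `zip_pmf` and `p_zero` are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, p_zero):
    """P(Y = k) for a zero-inflated Poisson.

    With probability p_zero the count is a structural zero; otherwise it
    is drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return p_zero + (1.0 - p_zero) * poisson
    return (1.0 - p_zero) * poisson


# Hypothetical parameters: the ZIP puts more mass on zero than a plain Poisson.
print(zip_pmf(0, 2.0, 0.3), math.exp(-2.0))
```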
4.
5.
6.
Whether general environmental exposures to endocrine disrupting chemicals (including pesticides and dioxin) might induce decreased
sex ratios (male/female ratio at birth) is discussed. To address this issue, the authors looked for a space-time clustering
test which could detect local areas of significantly low risk, assuming a Bernoulli distribution. As a matter of fact, if the endocrine disruptor hypothesis holds true, and if the
sex ratio is a sentinel health event indicative of new reproductive hazards ascribed to environmental factors, then in a given
region, either a cluster of low male/female ratio among newborn babies would be expected in the vicinity of polluting municipal
solid waste incinerators (MSWIs) (supporting the dioxin hypothesis), or local clusters would be expected in some rural areas
where large amounts of pesticides are sprayed.
Among cluster detection tests, the spatial scan statistic has been widely used in various applications to scan for areas
with high rates, and rarely (if ever) with low rates. Therefore, the goal of this paper was to check the properties of the
scan statistic under a given scenario (Bernoulli distribution, search for clusters with low rates) and to assess its added
value in addressing the sex ratio issue.
This study took place in the Franche-Comté region (France), mainly rural, comprising three main MSWIs, among which only one
had a high dioxin emission level in the past. The study population consisted of 192,490 boys and 182,588 girls born during
the 1975–1999 period.
On the whole, the authors conclude that: (i) spatial and space-time scan statistics provide attractive features to address
the sex ratio issue; (ii) sex ratio is not markedly affected across space and does not provide a reliable screening measure
for detecting reproductive hazards ascribed to environmental factors.
7.
There is a declared need for geoinformatic surveillance statistical science and software infrastructure for spatial and spatiotemporal hotspot detection. A hotspot means something unusual: an anomaly, aberration, outbreak, elevated cluster, critical resource area, etc. The declared need may be for monitoring, etiology, management, or early warning. The responsible factors may be natural, accidental, or intentional. This proof-of-concept paper suggests methods and tools for hotspot detection across geographic regions and across networks. The investigation proposes development of statistical methods and tools that have immediate potential for use in critical societal areas, such as public health and disease surveillance, ecosystem health, water resources and water services, transportation networks, persistent poverty typologies and trajectories, environmental justice, biosurveillance and biosecurity, among others. We introduce, for multidisciplinary use, an innovation of the circle-based spatial and spatiotemporal scan statistic that is popular in the health area. Our innovation employs the notion of an upper level set, and is accordingly called the upper level set scan statistic, pointing to a sophisticated analytical and computational system as the next generation of the present-day popular SaTScan. Success of surveillance rests on the capability to detect potential elevated clusters. But the clusters can be of any shape, and cannot be captured only by circles. Relying on circles alone is likely to give more false alarms and a greater false sense of security. What we need is the capability to detect arbitrarily shaped clusters. The proposed upper level set scan statistic innovation is expected to fill this need.
8.
Annalina Sarra Eugenia Nissi Sergio Palermi 《Environmental and Ecological Statistics》2012,19(2):219-247
Indoor radon is an important risk factor for human health. Indeed, radon inhalation is considered the second leading cause of lung cancer after smoking. During the last decades, huge efforts have been made in many countries to measure, map and predict radon levels in dwellings. Various studies have been devoted to identifying those areas within a country where high radon concentrations are more likely to be found. Data collected through indoor radon surveys have been analysed adopting various statistical approaches, among which hierarchical Bayesian models and geostatistical tools are worth noting. The essential goal of this paper is the identification of high radon concentration areas (the so-called radon prone areas) in the Abruzzo Region (Italy). In order to accurately pinpoint zones deserving attention for mitigation purposes, we adopt spatial cluster detection techniques, traditionally employed in epidemiology. As a first step, we assume that indoor radon measurements do not arise from a continuous spatial process; thus the geographic locations of dwellings where the radon measurements have been taken can be viewed as a realization of a spatial point process. Following this perspective, we adopt and compare recent cluster detection techniques: the simulated annealing scan statistic, the case event approach based on distance regression on the selection order and the elliptic spatial scan statistic. The analysis includes data collected during surveys carried out by the Regional Agency for the Environment Protection of Abruzzo (ARTA) in 1,861 randomly sampled dwellings across 277 municipalities of the Abruzzo region. The radon prone areas detected by the selected approaches are provided along with the summary statistics of the methods. Finally, the methodologies considered in this paper are tested on simulated data in order to evaluate their power and the precision of cluster location detection.
9.
G. P. Patil J. A. Bishop W. L. Myers C. Taillie R. Vraney Denice Wardrop 《Environmental and Ecological Statistics》2004,11(2):139-164
Geographical surveillance for hotspot detection and delineation has become an important area of investigation both in geospatial ecosystem health and in geospatial public health. In order to find critical areas based on synoptic cellular data, geospatial ecosystem health investigations apply recently discovered echelon tools. In order to find elevated rate areas based on synoptic cellular data, geospatial public health investigations apply recently discovered spatial scan statistic tools. The purpose of this paper is to conceptualize a joint role for these together in the spirit of a cross-disciplinary cross-fertilization to accomplish more effective and efficient geographical surveillance for hotspot detection and delineation, and early warning system.
10.
This paper extends the spatial local-likelihood model and the spatial mixture model to the space-time (ST) domain. For comparison,
a standard random effect space-time (SREST) model is examined to allow evaluation of each model’s ability in relation to cluster
detection. To pursue this evaluation, we use the ST counterparts of spatial cluster detection diagnostics. The proposed criteria
are based on posterior estimates (e.g., misclassification rate) and some are based on post-hoc analysis of posterior samples
(e.g., exceedance probability). In addition, we examine more conventional model fit criteria including mean square error (MSE).
We illustrate the methodology with a real ST dataset, Georgia throat cancer mortality data for the years 1994–2005, and a
simulated dataset where different levels and shapes of clusters are embedded. Overall, it is found that conventional SREST
models fare well in ST cluster detection and in goodness-of-fit, while for extreme risk detection the local likelihood ST
model does best.
11.
Recent years have witnessed the growth of new information technologies and their applications to various disciplines. The
goal of this paper is to demonstrate how the two innovative methods, upper level set scan (ULS) hotspot detection and the
multicriteria prioritization scheme, facilitate population health and break new ground in public health surveillance. It is
believed that the social environment (i.e. social conditions and social capital) is one of the determinants of human health.
Using infant health data and 10 additional indicators of social environment in the 159 counties of Georgia, ULS identified
52 counties that are in double jeopardy (high infant mortality and a high rate of low infant birth weight). The multicriteria
ranking scheme suggested that there was no conspicuous spatial cluster of ranking orders, which improved the traditional decision
making by visual geographic cluster. Both hotspot detection and ranking methods provided an empirical basis for re-allocating
limited resources and several policy implications could be drawn from these analytic results.
12.
Routine surveillance of a large geographic region for clusters of adverse health events, particularly cancers, often involves
small area health data, possibly controlling for exposure information. Many different methods have been proposed to test for
the presence of geographical clusters. Two of the most popular methods are the spatial scan method proposed by Kulldorff and
that using a fixed number of cases within scanning circles proposed by Besag and Newell. Although the second test is very
popular, it has some difficulties. While the scan test controls for the multiple testing problem, the Besag and Newell test
does not. Additionally, the latter method requires the setting of several tuning parameters whose values affect the test performance
and are subjectively chosen by the user. This makes a fair comparison between the two methods difficult and
explains why there have been few formal studies evaluating their relative performance. In this paper, we modify the Besag
and Newell test to allow control of the type I error probability and compare its power with that of the
spatial scan test. We used data sets from a publicly available simulated benchmark. We found that the two methods have similar
results, except for clusters located in sparsely populated regions, where the spatial scan method performed better.
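The fixed-number-of-cases idea behind the Besag and Newell test can be sketched as follows. This is the unmodified textbook form for a single centre region, written under several assumptions: regions are listed nearest-first from the centre, the centre region itself is included, the lists cover the whole map, and case counts are Poisson with a common rate under the null hypothesis. The function names are hypothetical, and the type I error control proposed in the paper is not implemented.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for j in range(k):
        cdf += term
        term *= lam / (j + 1)  # P(X = j + 1) from P(X = j)
    return 1.0 - cdf


def besag_newell_p_value(cases, populations, k):
    """Local p-value for one centre; inputs are ordered nearest-first."""
    rate = sum(cases) / sum(populations)  # overall rate under the null
    cum_cases = cum_pop = 0
    for c, n in zip(cases, populations):
        cum_cases += c
        cum_pop += n
        if cum_cases >= k:
            # Chance of k or more cases in the population needed to reach k.
            return poisson_upper_tail(k, rate * cum_pop)
    raise ValueError("fewer than k cases in the whole map")


# Hypothetical data: four regions of 100 people each, cluster size k = 4.
print(besag_newell_p_value([3, 1, 0, 2], [100, 100, 100, 100], 4))
```

The cluster size `k` is one of the user-chosen tuning parameters the abstract refers to, and one such p-value is produced per centre, which is why the unmodified test does not control for multiple testing.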
13.
The purpose of this paper is to develop a set of associated statistical tests for spatial clustering. In particular, a set
of three associated tests will be developed; these will correspond to the three types of tests set out by Besag and Newell
(general tests, focused tests, and tests for the detection of clustering). The associated tests draw primarily, though not
exclusively, upon existing tests and results. The principal contributions are based upon the score statistic for focused tests,
which has been an important approach to testing for clustering around environmental hazards. The first contribution consists
of the formulation of a global statistic for general tests that corresponds to focused score statistics, along with an assessment
of the distribution of the statistic under the null hypothesis of no raised incidence. The local score statistics used for
focused tests will have the property of summing to the global statistic used for the corresponding general test. Attention
is also given to the maximum local score statistic for the “test for the detection of clustering”. The critical values of
this statistic which are required for testing the null hypothesis are described. Application of the methods is made to leukemia
data for central New York State.
14.
Robin C. Puett Andrew B. Lawson Allan B. Clark James R. Hebert Martin Kulldorff 《Environmental and Ecological Statistics》2010,17(3):303-316
Many statistical tests have been developed to assess the significance of clusters of disease located around known sources
of environmental contaminants, also known as focused disease clusters. The majority of focused-cluster tests were designed
to detect a particular spatial pattern of clustering, one in which the disease cluster centers around the pollution source
and declines in a radial fashion with distance. However, other spatial patterns of environmentally related disease clusters
are likely given that the spatial dispersion patterns of environmental contaminants, and thus human exposure, depend on a
number of factors (e.g., meteorology and topography). For this study, data were simulated with five different spatial patterns
of disease clusters, reflecting potential pollutant dispersion scenarios: (1) a radial effect decreasing with increasing distance,
(2) a radial effect with a defined peak and decreasing with distance, (3) a simple angular effect, (4) an angular effect decreasing
with increasing distance and (5) an angular effect with a defined peak and decreasing with distance. The power to detect each
type of spatially distributed disease cluster was evaluated using Stone’s Maximum Likelihood Ratio Test, Tango’s Focused Test,
Bithell’s Linear Risk Score Test, and variations of the Lawson–Waller Score Test. Study findings underscore the importance
of considering environmental contaminant dispersion patterns, particularly directional effects, with respect to focused-cluster
test selection in cluster investigations. The effect of extra variation in risk also is considered, although its effect is
not substantial in terms of the power of tests.
15.
Accounting for rate instability and spatial patterns in the boundary analysis of cancer mortality maps (total citations: 1; self-citations: 0; citations by others: 1)
Pierre Goovaerts 《Environmental and Ecological Statistics》2008,15(4):421-446
Boundary analysis of cancer maps may highlight areas where causative exposures change through geographic space, the presence
of local populations with distinct cancer incidences, or the impact of different cancer control methods. Too often, such analysis
ignores the spatial pattern of incidence or mortality rates and overlooks the fact that rates computed from sparsely populated
geographic entities can be very unreliable. This paper proposes a new methodology that accounts for the uncertainty and spatial
correlation of rate data in the detection of significant edges between adjacent entities or polygons. Poisson kriging is first
used to estimate the risk value and the associated standard error within each polygon, accounting for the population size
and the risk semivariogram computed from raw rates. The boundary statistic is then defined as half the absolute difference
between kriged risks. Its reference distribution, under the null hypothesis of no boundary, is derived through the generation
of multiple realizations of the spatial distribution of cancer risk values. This paper presents three types of neutral models
generated using methods of increasing complexity: the common random shuffle of estimated risk values, a spatial re-ordering
of these risks, or p-field simulation that accounts for the population size within each polygon. The approach is illustrated
using age-adjusted pancreatic cancer mortality rates for white females in 295 US counties of the Northeast (1970–1994). Simulation
studies demonstrate that Poisson kriging yields more accurate estimates of the cancer risk and how its value changes between
polygons (i.e., boundary statistic), relative to the use of raw rates or a local empirical Bayes smoother. When used in conjunction
with spatial neutral models generated by p-field simulation, the boundary analysis based on Poisson kriging estimates minimizes
the proportion of type I errors (i.e., edges wrongly declared significant) while the frequency of these errors is predicted
well by the p-value of the statistical test.
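The boundary statistic and the simplest of the three neutral models mentioned in this abstract (the random shuffle of estimated risk values) can be sketched as below. The sketch assumes the risks have already been estimated, for example by Poisson kriging, which is not implemented here. The function names, the edge encoding as an index pair and the rank-based p-value are illustrative assumptions.

```python
import random


def boundary_statistic(risk_a, risk_b):
    """Half the absolute difference between the risks of two adjacent polygons."""
    return 0.5 * abs(risk_a - risk_b)


def shuffle_p_value(risks, edge, n_sim=999, seed=0):
    """P-value of one edge under a random shuffle of the risk values."""
    i, j = edge
    observed = boundary_statistic(risks[i], risks[j])
    rng = random.Random(seed)
    shuffled = list(risks)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # neutral model: risks reassigned at random
        if boundary_statistic(shuffled[i], shuffled[j]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_sim)


# Hypothetical risks for six polygons; polygons 0 and 1 share an edge.
risks = [0.90, 0.10, 0.45, 0.50, 0.55, 0.48]
print(shuffle_p_value(risks, (0, 1)))
```

The random shuffle ignores spatial correlation and population size, which is the limitation the spatial re-ordering and p-field simulation models in the abstract are meant to address.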
16.
Air–water flows at hydraulic structures are commonly observed and called white waters. The free-surface aeration is characterised by some intense exchanges of air and water leading to complex air–water structures including some clustering. The number and properties of clusters may provide some measure of the level of particle-turbulence and particle–particle interactions in the high-velocity air–water flows. Herein a re-analysis of air–water clusters was applied to a highly aerated free-surface flow data set (Chanson and Carosi, Exp Fluids 42:385–401, 2007). A two-dimensional cluster analysis was introduced combining a longitudinal clustering criterion based on near-wake effect and a side-by-side particle detection method. The results highlighted a significant number of clustered particles in the high-velocity free-surface flows. The number of bubble/droplet clusters per second and the percentage of clustered particles were significantly larger using the two-dimensional cluster analysis than those derived from earlier longitudinal detection techniques only. A number of large cluster structures were further detected. The results illustrated the complex interactions between entrained air and turbulent structures in skimming flow on a stepped spillway, and the cluster detection method may apply to other highly aerated free-surface flows.
17.
This paper presents a scan statistic, progressive upper level set (PULSE) scan statistic, for geospatial hotspot detection
and its software implementation. Like ULS, the PULSE scan statistic is based on the arbitrarily shaped scan window and can
be adapted for a network setting. PULSE is a refinement of the upper level set (ULS) scan statistic. Like some other likelihood
based scanning devices, the ULS scan statistic identifies maximum likelihood estimate (MLE) zones that tend to be ‘stringy’
and sprawling. Its search path increases the possibility of inclusion of extraneous cells in its MLE zones and, to a smaller extent,
of exclusion of cells that belong to a true hotspot from its MLE zone. The PULSE scan statistic achieves improvement over
the ULS scan statistic in two ways. First, it begins its search for a most likely zone with a large population of candidate
zones obtained by modifying the ULS tree structure and continues its search using a genetic algorithm. Secondly, to reduce
chances of generating an MLE that is excessively stringy and that includes extraneous cells in the MLE zone, PULSE uses cardinality
and compactness of zones along with their likelihoods as the fitness function in the genetic algorithm and uses several pertinent
criteria including evenness of intra-zone cellular response ratios to determine the MLE zone. To reduce computation, a Gumbel
distribution of extreme values is used to determine the p-value of the MLE zone. Better results come at the cost of increased processing time. An evaluative performance study is presented.
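One way a Gumbel approximation can replace a large Monte Carlo run is sketched below: a Gumbel distribution is fitted to a modest number of null replicates of the maximum statistic, and the p-value is read from its upper tail. The abstract does not say how PULSE fits the distribution, so the method-of-moments fit and the name `gumbel_p_value` are assumptions made for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_p_value(observed, null_maxima):
    """Upper-tail p-value from a Gumbel fitted by the method of moments."""
    mean = statistics.fmean(null_maxima)
    sd = statistics.stdev(null_maxima)          # needs at least two replicates
    if sd <= 0:
        raise ValueError("null replicates must not all be equal")
    beta = sd * math.sqrt(6.0) / math.pi        # scale
    mu = mean - EULER_GAMMA * beta              # location
    z = (observed - mu) / beta
    if z < -700:                                # avoid overflow in exp(-z)
        return 1.0
    return 1.0 - math.exp(-math.exp(-z))


# Hypothetical null replicates of the maximum statistic.
null_maxima = [2.0, 3.0, 4.0, 5.0, 6.0]
print(gumbel_p_value(10.0, null_maxima))
```

Unlike a rank-based Monte Carlo estimate, the fitted tail can return p-values smaller than one over the number of replicates, which is where the saving in computation comes from.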
18.
We formulate and simulation-test a spatial surplus production model that provides a basis with which to undertake multispecies, multi-area, stock assessment. Movement between areas is parameterized using a simple gravity model that includes a "residency" parameter that determines the degree of stock mixing among areas. The model is deliberately simple in order to (1) accommodate nontarget species that typically have fewer available data and (2) minimize computational demand to enable simulation evaluation of spatial management strategies. Using this model, we demonstrate that careful consideration of spatial catch and effort data can provide the basis for simple yet reliable spatial stock assessments. If simple spatial dynamics can be assumed, tagging data are not required to reliably estimate spatial distribution and movement. When applied to eight stocks of Atlantic tuna and billfish, the model tracks regional catch data relatively well by approximating local depletions and exchange among high-abundance areas. We use these results to investigate and discuss the implications of using spatially aggregated stock assessment for fisheries in which the distribution of both the population and fishing vary over time.
19.
Cajo J. F. Ter Braak Herbert Hoijtink Wies Akkermans Piet F. M. Verdonschot 《Ecological modelling》2003,160(3):235
To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster analysis in step 1 many hard, seemingly arbitrary choices have to be made that nevertheless influence the solution (similarity measure, clustering strategy, number of clusters). The stability of the solution is often of concern, e.g. in clustering by the
program. In the discriminant analysis of step 2 it can occur that a water sample is misclassified on the basis of the environmental data but on further inspection happens to be a borderline case in the cluster analysis. One would then rather reclassify such a sample and iterate the two steps. Bayesian latent class analysis is a flexible, extendable model-based cluster analysis approach that recently has gained popularity in the statistical literature and that has the potential to address these problems. It allows the macrofauna and environmental data to be modelled and analyzed in a single integrated analysis. An exciting extension is to incorporate in the analysis prior information on the habitat preferences of the macrofauna taxa such as is available in lists of indicator values. The output of the analysis is not a hard assignment of water samples to clusters but a probabilistic (fuzzy) assignment. The number of clusters is determined on the basis of the Bayes factor. A standard feature of the Bayesian method is to make predictions and to assess their uncertainty. We applied this approach to a data set consisting of 70 water samples, 484 macrofauna taxa and four environmental variables for which previously a five cluster solution had been proposed. The standard for Bayesian estimation, the Gibbs sampler, worked fine on a subset with only 12 selected taxa but did not converge on the full set with 484 taxa. This is due to many configurations in which the assignment probabilities are all very close to either 0 or 1. This convergence problem is comparable with the local optima problem in classical cluster optimization algorithms, including the EM algorithm used in Latent Gold, a Windows program for latent class analysis. The convergence problem needs to be solved before the benefits of Bayesian latent class analysis can come to fruition in this application. We discuss possible solutions.
20.
Jos Corts Miguel Mahecha Markus Reichstein Alexander Brenning 《Environmental and Ecological Statistics》2020,27(2):293-318
The statistical analysis of environmental data from remote sensing and Earth system simulations often entails the analysis of gridded spatio-temporal data, with a hypothesis test being performed for each grid cell. When the whole image or a set of grid cells is analyzed for a global effect, the problem of multiple testing arises. When no global effect is present, we expect $\alpha$% of all grid cells to be false positives, and spatially autocorrelated data can give rise to clustered spurious rejections that can be misleading in an analysis of spatial patterns. In this work, we review standard solutions for the multiple testing problem and apply them to spatio-temporal environmental data. These solutions are independent of the test statistic, and any test statistic can be used (e.g., tests for trends or change points in time series). Additionally, we introduce permutation methods and show that they have more statistical power. Real-world data are used to provide examples of the analysis, and the performance of each method is assessed in a simulation study. Unlike other simulation studies, ours compares the statistical power of the presented methods comprehensively. In conclusion, we present several statistically rigorous methods for analyzing spatio-temporal environmental data and controlling the false positives. These methods allow the use of any test statistic in a wide range of applications in environmental sciences and remote sensing.
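As an example of a standard multiple-testing correction that works with any test statistic, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-cell p-values. The abstract does not name the specific corrections it reviews, so the choice of this procedure is an assumption, as are the function name and the sample p-values. The permutation methods introduced in the paper are not implemented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per grid cell.
cell_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(benjamini_hochberg(cell_p)))  # 2
```

Without correction, five of these ten cells would be declared significant at the 0.05 level. The procedure treats cells as exchangeable and does not use their spatial arrangement.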