Similar Articles
20 similar articles found (search time: 390 ms).
1.
This paper develops statistical inference for the population mean and total using stratified judgment post-stratified (SJPS) samples. The SJPS design selects a judgment post-stratified sample from each stratum. Hence, in addition to the stratum structure, it induces an additional ranking structure within stratum samples. SJPS is constructed from a finite population using either a with-replacement or a without-replacement sampling design. Inference is constructed under both randomization theory and a superpopulation model. In both approaches, the paper shows that the estimators of the population mean and total are unbiased. The paper also constructs unbiased estimators for the variance (mean square prediction error) of the sample mean (predictor of the population mean), and develops confidence and prediction intervals for the population mean. The empirical evidence shows that the proposed estimators perform better than their competitors in the literature.

2.
Analysis of brood sex ratios: implications of offspring clustering (total citations: 13; self-citations: 0; citations by others: 13)
Generalized linear models (GLMs) are increasingly used in modern statistical analyses of sex ratio variation because they can assess the effects of design variables on binary response data. However, in applying GLMs, authors frequently neglect the hierarchical structure of sex ratio data, thereby increasing the likelihood of committing a type I error. Here, we argue that whenever clustered (e.g., brood) sex ratios represent the desired level of statistical inference, the clustered data structure ought to be taken into account to avoid invalid conclusions. Neglecting the between-cluster variation and the finite number of clusters in determining test statistics, as implied by using likelihood-ratio-based L²-statistics in conventional GLMs, results in biased (usually overestimated) test statistics and pseudoreplication of the sample. Random variation in the sex ratio between clusters (broods) can often be accommodated by scaling the residual binomial (error) variance for overdispersion and using F-tests instead of L²-tests. More complex situations, however, require the use of generalized linear mixed models (GLMMs). By introducing higher-level random effects in addition to the residual error term, GLMMs allow estimation of fixed-effect and interaction parameters while accounting for random effects at different levels of the data. First, GLMMs are required in sex ratio analyses whenever there are covariates at the offspring level of the data but inferences are to be drawn at the brood level. Second, when interactions of effects at different levels of the data are to be estimated, random fluctuation of parameters can be taken into account only in GLMMs. Data structures requiring the use of GLMMs to avoid erroneous inferences are often encountered in ecological sex ratio studies.
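The overdispersion correction described in this abstract can be sketched in a few lines. The following Python snippet (using statsmodels, on made-up brood data) fits a brood-level binomial GLM, estimates the dispersion factor from the Pearson statistic, and rescales the standard errors accordingly; it is a minimal illustration of the quasi-binomial scaling step only, not of the full GLMM analysis the paper recommends.

```python
# Minimal sketch with made-up data: quasi-binomial scaling for brood sex ratios.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_broods = 60
brood_size = 6
treatment = rng.integers(0, 2, n_broods)          # brood-level covariate (hypothetical)
males = rng.binomial(brood_size, 0.5, n_broods)   # males per brood (toy data)
females = brood_size - males

# Response = (successes, failures) per brood; conventional binomial GLM.
endog = np.column_stack([males, females])
exog = sm.add_constant(pd.DataFrame({"treatment": treatment}))
fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()

# Overdispersion factor estimated from the Pearson chi-square statistic.
phi = fit.pearson_chi2 / fit.df_resid

# Standard errors scaled for overdispersion (quasi-binomial);
# the corresponding tests should then be referred to F rather than chi-square.
print("dispersion estimate:", round(phi, 2))
print("naive SEs: ", fit.bse.values)
print("scaled SEs:", (fit.bse * np.sqrt(phi)).values)
```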

3.
The objective of a long-term soil survey is to determine the mean concentrations of several chemical parameters for the pre-defined soil layers and to compare them with the corresponding values in the past. A two-stage random sampling procedure is used to achieve this goal. In the first stage, n subplots are selected from N subplots by simple random sampling without replacement; in the second stage, m sampling sites are chosen within each of the n selected subplots. Thus n · m soil samples are collected for each soil layer. The idea of the composite sample design comes from the challenge of reducing very expensive laboratory analyses: the m laboratory samples from one subplot and one soil layer are physically mixed to form a composite sample. From each of the n selected subplots, one composite sample per soil layer is analyzed in the laboratory, thus n per soil layer in total. In this paper we show that the cost is reduced by the factor m − 1 when the composite sample alternative is used instead of two-stage sampling; however, the variance of the composite sample mean is increased. In the case of positive intraclass correlation the increase is less than 12.5%; in the case of negative intraclass correlation the increase also depends on the properties of the variable. For the univariate case we derive the optimal number of subplots and sampling sites. A case study is discussed at the end.

4.
When sample observations are expensive or difficult to obtain, ranked set sampling is known to be an efficient method for estimating the population mean, and in particular to improve on the sample mean estimator. Using best linear unbiased estimators, this paper considers the simple linear regression model with replicated observations. Use of a form of ranked set sampling is shown to be markedly more efficient for normal data when compared with the traditional simple linear regression estimators.

5.
A dynamic and heterogeneous species abundance model generating the lognormal species abundance distribution is fitted to time series of species data from an assemblage of stoneflies and mayflies (Plecoptera and Ephemeroptera) of an aquatic insect community collected over a period of 15 years. In each year except one, we analyze 5 parallel samples taken at the same time of the season, giving information about the over-dispersion in the sampling relative to the Poisson distribution. Results are derived from a correlation analysis, where the correlation in the bivariate normal distribution of log abundance is used as a measure of similarity between communities. The analysis enables decomposition of the variance of the lognormal species abundance distribution into three components due to heterogeneity among species, stochastic dynamics driven by environmental noise, and over-dispersion in sampling, accounting for 62.9%, 30.6% and 6.5% of the total variance, respectively. Corrected for sampling, the heterogeneity and stochastic components account for 67.3% and 32.7% of the among-species variance in log abundance, respectively. By using this method, it is possible to disentangle the effects of heterogeneity and stochastic dynamics by quantifying these components and to correctly remove sampling effects from the observed species abundance distribution.

6.
Ver Hoef JM  Boveng PL 《Ecology》2007,88(11):2766-2772
Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in the estimated effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively weighted least-squares algorithm used to fit the models to data. Because the variance is a function of the mean, large and small counts are weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results for a data set that showed a dramatic difference in the estimated abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
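To make the variance relationships concrete, the following is a sketch in standard notation (θ for the quasi-Poisson dispersion parameter and κ for the negative binomial overdispersion parameter; the authors' own notation may differ), together with the IWLS weights they imply under a log link:

```latex
\[
\operatorname{Var}_{\mathrm{QP}}(y_i) = \theta\,\mu_i,
\qquad
\operatorname{Var}_{\mathrm{NB}}(y_i) = \mu_i + \frac{\mu_i^{2}}{\kappa}.
\]
% Under a log link, g(mu) = log(mu) and g'(mu) = 1/mu, so the IWLS weight
% w_i = [Var(y_i) g'(mu_i)^2]^{-1} becomes
\[
w_i^{\mathrm{QP}} = \frac{\mu_i}{\theta},
\qquad
w_i^{\mathrm{NB}} = \frac{\mu_i}{1 + \mu_i/\kappa},
\]
% so quasi-Poisson weights keep growing with the mean, whereas negative
% binomial weights level off at kappa for large counts.
```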

7.
Adjusted two-stage adaptive cluster sampling (total citations: 1; self-citations: 0; citations by others: 1)
An adjusted two-stage sampling procedure is discussed for adaptive cluster sampling where some networks are large and others are small. A two-stage sample is drawn from the large networks and a single-stage sample is drawn from the rest. The simple random sampling (SRS) procedure without replacement is used at the initial stage. An estimator for the population mean along with its properties is discussed.

8.
An important aspect of species distribution modelling is the choice of the modelling method, because a suboptimal method may have poor predictive performance. Previous comparisons have found that novel methods, such as Maxent models, outperform well-established modelling methods, such as the standard logistic regression. These comparisons used training samples with small numbers of occurrences per estimated model parameter, and this limited sample size may have caused poorer predictive performance due to overfitting. Our hypothesis is that Maxent models would outperform a standard logistic regression because Maxent models avoid overfitting by using regularisation techniques, whereas a standard logistic regression does not. Regularisation can be applied to logistic regression models using penalised maximum likelihood estimation. This estimation procedure shrinks the regression coefficients towards zero, causing biased predictions if applied to the training sample but improving the accuracy of new predictions. We used Maxent and logistic regression (standard and penalised) to analyse presence/pseudo-absence data for 13 tree species and evaluated the predictive performance (discrimination) using presence-absence data. The penalised logistic regression outperformed standard logistic regression and equalled the performance of Maxent. The penalised logistic regression may be considered one of the best methods to develop species distribution models trained with presence/pseudo-absence data, as it is comparable to Maxent. Our results encourage further use of the penalised logistic regression for species distribution modelling, especially in those cases in which a complex model must be fitted to a sample of limited size.
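As a concrete (if simplified) illustration of the penalisation idea, the sketch below fits an essentially unpenalised logistic regression and a ridge-penalised one (penalty strength chosen by cross-validation) to synthetic presence/absence data and compares their discrimination by AUC. It uses scikit-learn's L2 penalty; the penalised maximum likelihood method in the paper may differ in the exact form of the penalty, and all data and settings here are illustrative.

```python
# Minimal sketch (synthetic data): weakly penalised vs ridge-penalised
# logistic regression for presence/absence data, evaluated by AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 300, 12                                   # small sample, fairly many predictors
X = rng.normal(size=(n, p))                      # hypothetical environmental covariates
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # presence (1) / absence (0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# "Standard" logistic regression approximated by a very weak penalty.
standard = LogisticRegression(penalty="l2", C=1e6, max_iter=5000).fit(X_tr, y_tr)

# Penalised logistic regression: ridge penalty, strength chosen by cross-validation.
penalised = LogisticRegressionCV(Cs=10, cv=5, penalty="l2",
                                 max_iter=5000).fit(X_tr, y_tr)

for name, model in [("standard", standard), ("penalised", penalised)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:9s} AUC = {auc:.3f}")
```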

9.
Gray BR  Burlew MM 《Ecology》2007,88(9):2364-2372
Ecologists commonly use grouped or clustered count data to estimate temporal trends in counts, abundance indices, or abundance. For example, the U.S. Breeding Bird Survey data represent multiple counts of birds from within each of multiple, spatially defined routes. Despite this reliance on grouped counts, analytical methods for prospectively estimating the precision of trend estimates, or the statistical power to detect trends, that explicitly acknowledge the characteristics of grouped count data are undescribed. These characteristics include the fact that the sampling variance is an increasing function of the mean, and that sampling and group-level variance estimates are generally estimated on different scales (the sampling and log scales, respectively). We address these issues for repeated sampling of a single population using an analytical approach that has the flavor of a generalized linear mixed model, specifically that of a negative binomial-distributed count variable with random group effects. The count mean, including grand intercept, trend, and random group effects, is modeled linearly on the log scale, while the sampling variance of the mean is estimated on the log scale via the delta method. Results compared favorably with those derived using Monte Carlo simulations. For example, at trend = 5% per temporal unit, differences in standard errors and in power were modest relative to those estimated by simulation (≤ |11|% and ≤ |16|%, respectively), with relative differences among power estimates decreasing to ≤ |7|% when the power estimated by simulation was ≥ 0.50. Similar findings were obtained using data from nine surveys of fingernail clams in the Mississippi River. The proposed method is suggested (1) where simulations are not practical and relative precision or power is desired, or (2) when multiple precision or power calculations are required and the accuracy of a fraction of those calculations will be confirmed using simulations.
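The delta-method step mentioned above can be sketched as follows, writing μ for the count mean, k for the negative binomial dispersion parameter, and n for the number of counts per group. This is a generic sketch of the log-scale conversion, not the paper's full derivation:

```latex
\[
\operatorname{Var}(y) = \mu + \frac{\mu^{2}}{k},
\qquad
\operatorname{Var}(\bar{y}) = \frac{1}{n}\left(\mu + \frac{\mu^{2}}{k}\right).
\]
% By the delta method, Var(f(X)) ~ f'(E[X])^2 Var(X); with f = log this gives
\[
\operatorname{Var}\bigl(\log \bar{y}\bigr)
\;\approx\; \frac{\operatorname{Var}(\bar{y})}{\mu^{2}}
\;=\; \frac{1}{n}\left(\frac{1}{\mu} + \frac{1}{k}\right),
\]
% which places the sampling variance on the same (log) scale as the
% group-level variance component.
```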

10.
We compare the performance of a number of estimators of the cumulative distribution function (CDF) for the following scenario: imperfect measurements are taken on an initial sample from a finite population and perfect measurements are obtained on a small calibration subset of the initial sample. The estimators we considered include two naive estimators using perfect and imperfect measurements; the ratio, difference and regression estimators for a two-phase sample; a minimum MSE estimator; Stefanski and Bay's (1996) SIMEX estimator; and two proposed estimators. The proposed estimators take the form of a weighted average of perfect and imperfect measurements. They are constructed by minimizing variance among the class of weighted averages subject to an unbiasedness constraint. They differ in the manner of estimating the weight parameters. The first one uses direct sample estimates. The second one tunes the unknown parameters to an underlying normal distribution. We compare the root mean square error (RMSE) of the proposed estimators against other potential competitors through computer simulations. Our simulations show that our second estimator has the smallest RMSE among the nine compared and that the reduction in RMSE is substantial when the calibration sample is small and the error is medium or large.
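The variance-minimisation step behind such weighted-average estimators has a familiar textbook form; for two unbiased estimators T₁ (from the perfect calibration measurements) and T₂ (from the imperfect measurements, suitably corrected), a sketch is given below. The paper's two estimators differ in how the variances and covariance entering this weight are estimated, so this is only the generic form, not their exact construction.

```latex
\[
T(\lambda) = \lambda T_1 + (1-\lambda) T_2,
\qquad
E[T_1] = E[T_2] = \theta \;\Rightarrow\; E[T(\lambda)] = \theta \ \text{for all } \lambda.
\]
% Minimising Var(T(lambda)) over lambda gives the optimal weight:
\[
\lambda^{*} =
\frac{\operatorname{Var}(T_2) - \operatorname{Cov}(T_1, T_2)}
     {\operatorname{Var}(T_1) + \operatorname{Var}(T_2) - 2\operatorname{Cov}(T_1, T_2)}.
\]
```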

11.
This study presents a classification method combining logistic regression and fuzzy logic in the determination of sampling sites for feral fish, Nile Tilapia (Tilapia rendalli). The method statistically analyzes the variable domains involved in the problem using a logistic regression model. This in turn generates the knowledge necessary to construct the rule base and fuzzy clusters of the fuzzy inference system (FIS) variables. The proposed hybrid method was validated using three fish stress indices, the Fulton Condition Factor (FCF) and the gonadosomatic and hepatosomatic indices (GSI and HSI, respectively), from fish sampled at three different locations in Rio de Janeiro State. A multinomial logistic regression allowed for the construction of the FIS of the proposed method, and the two statistical approaches, when combined, complemented each other satisfactorily, allowing for the construction of an efficient classification method for feral fish sampling sites that, in turn, has great value for fish captures and fishery resource management.

12.
Randomized graph sampling (RGS) is an approach for sampling populations associated with or describable as graphs, when the structure of the graph is known and the parameter of interest is the total weight of the graph. RGS is related to, but distinct from, other graph-based approaches such as snowball and network sampling. Graph elements are clustered into walks that reflect the structure of the graph, as well as operational constraints on sampling. The basic estimator in RGS can be constructed as a Horvitz-Thompson estimator. I prove it to be design-unbiased, and also show design-unbiasedness of an estimator of the sample variance when walks are sampled with replacement. Covariates can be employed for variance reduction either through improved assignment of selection probabilities to walks in the design step, or through the use of alternative estimators during analysis. The approach is illustrated with a trail maintenance example, which demonstrates that complicated approaches to assignment of selection probabilities can be counterproductive. I describe conditions under which RGS may be efficient in practice, and suggest possible applications.
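For orientation, the Horvitz-Thompson form referred to above is, in its generic version (writing w_k for the total weight of walk k, π_k for its inclusion probability, s for the set of distinct sampled walks, and, in the with-replacement case, p_k for the per-draw selection probability over n draws), as follows. These are the textbook forms; the paper's RGS estimators are constructed analogously, and their exact definitions should be taken from the paper itself.

```latex
\[
\hat{Y}_{\mathrm{HT}} \;=\; \sum_{k \in s} \frac{w_k}{\pi_k}.
\]
% When walks are drawn with replacement (draws i = 1..n with probabilities p_k),
% the usual unbiased variance estimator of the pps-with-replacement estimator is
\[
\hat{Y}_{\mathrm{pwr}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\frac{w_{k_i}}{p_{k_i}},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{Y}_{\mathrm{pwr}}\bigr)
\;=\; \frac{1}{n(n-1)}\sum_{i=1}^{n}\Bigl(\frac{w_{k_i}}{p_{k_i}} - \hat{Y}_{\mathrm{pwr}}\Bigr)^{2}.
\]
```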

13.
Sampling from partially rank-ordered sets (total citations: 1; self-citations: 0; citations by others: 1)
In this paper we introduce a new sampling design. The proposed design is similar to a ranked set sampling (RSS) design, with the clear difference that rankers are allowed to declare that any two or more units are tied in rank whenever the units cannot be ranked with high confidence. These tied units are placed in judgment subsets. The fully measured units are then selected from these partially ordered judgment subsets. Based on this sampling scheme, we develop unbiased estimators for the population mean and variance. We show that the proposed sampling procedure has some advantages over standard ranked set sampling.

14.
Sampling designs considered for a national-scale environmental monitoring programme are compared. Specifically, strategies designed to monitor one aspect of this programme, agro-ecosystem health, are assessed. Two types of panel survey designs are evaluated within the framework of two-stage sampling. Comparisons of these designs are discussed with regard to precision, cost, and other issues that need to be considered in planning long-term surveys. To compare precision, the underlying variance of a simple estimator of mean difference is derived for each of the two designs. Variance and cost models accounting for the different rotational sampling schemes across designs are developed. Optimum stage allocation for each design is assessed with the variance-cost models. The best choice of design varied with the conditions underlying the variance model and the degree of other sources of survey error expected in the programme.

15.
M. J. Riddle 《Marine Biology》1989,103(2):225-230
To calculate the number of samples required to estimate population density to a specified precision, a prior estimate of the sample variance is needed. Using data from the freshwater benthic literature, Downing (1979, 1980a) calculated a regression equation to predict sample variance from sampler size and population density. He used the predicted sample variance to calculate the number of samples, of a range of sizes, required to estimate a range of population densities to a specified precision. He concluded that massive savings (1300 to 5000%) of total surface area sampled may be achieved by using sample units of small surface area. These conclusions are misleading. The data set used for the regression does not adequately cover the combination of a low-density population sampled by a device of small surface area. The benthic community of Belhaven Bay, East Lothian, Scotland was sampled in 1982 with a 0.1 m² grab and a 0.0018 m² corer, providing 112 sets of replicate data, which were used to test the hypothesis that, for a specified precision of the mean, a considerable saving of total area sampled may be obtained by sampling with a device of small surface area. The benthos of Loch Creran, Argyll, Scotland was sampled with contiguous corer samples on four occasions in 1980 and 1981, providing 234 independent sets of replicate data. Contiguous samples were grouped to form several simulated series of samples of increasing surface area. A sampler of small surface area provided a saving of total area sampled of about 20%. Whether such a small saving is justifiable will depend on the extra field expenses incurred by taking many small samples.
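The sample-size calculation underlying this debate is the standard one: for a target relative precision D (standard error expressed as a fraction of the mean), with prior estimates s² of the sample variance and m of the mean density, the required number of replicates follows from the relation below. When s² is predicted from density and sampler area by a Downing-type regression, n (and hence the total area sampled, n times the sampler area) can be compared across sampler sizes; the symbols here are generic rather than taken from the papers.

```latex
\[
\frac{s/\sqrt{n}}{m} \;=\; D
\quad\Longrightarrow\quad
n \;=\; \frac{s^{2}}{D^{2}\,m^{2}}.
\]
```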

16.
Large-scale patchiness in the distribution of the benthic fauna was investigated in Loch Etive (Scottish west coast) by two series of van Veen grab hauls. Each series was taken along a traverse across the width of the loch, with sampling points about 100 m apart. One sample series was taken on sandy mud and the other, at a greater depth, on soft mud. Two approaches to data processing were applied: (1) the variance-to-mean ratio and Morisita's I tests for significant aggregation were applied to the species abundances in the sample series; (2) 3 measures of sample homogeneity, each involving the calculation of an index of faunal similarity, were applied to the separate samples. The results, however, did not show significant differences between the two series of samples. These findings were compared with results obtained from (a) a previous study, in which differences in patchiness between areas were shown for a smaller scale of sampling design in Loch Etive and neighbouring areas, and to which the I method is also applied here; and (b) the results of applying the variance-to-mean ratio and I tests to data published by Holme (1953). The present results indicate greater aggregation at the present scale of sampling than at the previous, smaller scale of sampling, or at the comparable scale of Holme's sampling. The degree of concordance shown by the values of I for the circular-design sampling with the values of the 3 measures of patchiness applied previously, and with the means of the species abundances, was measured by Spearman's rank correlation coefficient. The results clearly demonstrated the I values, unlike the other measures, to be almost completely independent of the mean. It was concluded that, for comparing the pattern from benthic samples using standard-size bottom samplers, where the mean may vary widely between sets of samples, the I method is probably the most useful.
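For reference, the two aggregation statistics named above have the conventional definitions below (written for counts x₁, …, x_q from q samples with total T = Σ x_i); the paper's I is read here as Morisita's index, which is the usual interpretation.

```latex
\[
\text{variance-to-mean ratio: } \frac{s^{2}}{\bar{x}}
\qquad\text{(expected to be 1 for a Poisson, i.e. random, distribution)},
\]
\[
I_{\delta} \;=\; q\,\frac{\sum_{i=1}^{q} x_i\,(x_i - 1)}{T\,(T - 1)},
\qquad
I_{\delta} > 1 \ \text{indicating aggregation}.
\]
```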

17.
Ranked set sampling can provide an efficient basis for estimating parameters of environmental variables, particularly when sampling costs are intrinsically high. Various ranked set estimators are considered for the population mean and contrasted in terms of their efficiencies and usefulness, with special concern for sample design considerations. Specifically, we consider the effects of the form of the underlying random variable, optimisation of efficiency and how to allocate sampling effort for best effect (e.g. one large sample or several smaller ones of the same total size). The various prospects are explored for two important positively skewed random variables (lognormal and extreme value) and explicit results are given for these cases. Whilst it turns out that the best approach is to use the largest possible single sample and the optimal ranked set best linear estimator (ranked set BLUE), we find some interesting qualitatively different conclusions for the two skewed distributions.
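A small simulation makes the basic efficiency argument concrete; the sketch below draws balanced ranked set samples (assuming perfect ranking) from a lognormal population and compares the variance of the plain RSS mean with that of a simple random sample mean of the same total size. It illustrates only the unweighted RSS mean, not the ranked set BLUE discussed in the paper, and all settings are illustrative.

```python
# Minimal sketch: ranked set sampling (RSS) vs simple random sampling (SRS)
# for the mean of a lognormal population, assuming perfect ranking.
import numpy as np

rng = np.random.default_rng(42)
k, cycles, reps = 4, 5, 5000          # set size, cycles (n = k * cycles), replications
n = k * cycles

def rss_mean() -> float:
    """One balanced RSS estimate: in each cycle, draw k sets of k units and
    keep the r-th smallest unit from the r-th set (r = 1, ..., k)."""
    kept = []
    for _ in range(cycles):
        for r in range(k):
            judgment_set = rng.lognormal(mean=0.0, sigma=1.0, size=k)
            kept.append(np.sort(judgment_set)[r])   # perfect ranking assumed
    return float(np.mean(kept))

rss_est = np.array([rss_mean() for _ in range(reps)])
srs_est = np.array([rng.lognormal(0.0, 1.0, size=n).mean() for _ in range(reps)])

print("Var(SRS mean):", srs_est.var())
print("Var(RSS mean):", rss_est.var())
print("relative efficiency (SRS/RSS):", srs_est.var() / rss_est.var())
```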

18.
This paper compares procedures based on the extended quasi-likelihood, pseudo-likelihood and quasi-likelihood approaches for testing homogeneity of several proportions for over-dispersed binomial data. The type I error rates of the Wald tests using the model-based and robust variance estimates, the score test, and the extended quasi-likelihood ratio test (deviance reduction test) were examined by simulation. The extended quasi-likelihood method performs less well when mean responses are close to 1 or 0. The model-based Wald test based on the quasi-likelihood performs best in maintaining the nominal level. The score test performs less well when the intracluster correlations are large or heterogeneous. In summary: (i) both the quasi-likelihood and pseudo-likelihood methods appear to be acceptable, but care must be taken when overfitting a variance function with small sample sizes; (ii) the extended quasi-likelihood approach is the least favourable method because its actual type I error rate is much higher than the nominal level; and (iii) the robust variance estimator performs poorly, particularly when the sample size is small.

19.
In this article we consider asymptotic properties of the Horvitz-Thompson and Hansen-Hurwitz types of estimators under the adaptive cluster sampling variants obtained by selecting the initial sample by simple random sampling without replacement and by unequal probability sampling with replacement. We develop an asymptotic framework which basically assumes that the number of units in the initial sample, as well as the number of units and networks in the population, tend to infinity, but that the network sizes are bounded. Using this framework we prove that, under each of the two variants of adaptive sampling mentioned above, both the Horvitz-Thompson and Hansen-Hurwitz types of estimators are design-consistent and asymptotically normally distributed. In addition, we show that the ordinary estimators of their variances are also design-consistent.
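For the variant with an initial simple random sample without replacement, the two estimators usually take the following modified forms, sketched here in standard notation: n initial units from a population of N, A_i the network containing initial unit i, and x_k and y_k* the size and y-total of the k-th of the K distinct networks intersected by the sample. The paper's exact definitions may differ in detail.

```latex
\[
\hat{\mu}_{\mathrm{HH}}
 \;=\; \frac{1}{n}\sum_{i=1}^{n} \bar{y}_{A_i},
\qquad
\bar{y}_{A_i} \;=\; \frac{1}{|A_i|}\sum_{j \in A_i} y_j,
\]
\[
\hat{\mu}_{\mathrm{HT}}
 \;=\; \frac{1}{N}\sum_{k=1}^{K} \frac{y_k^{*}}{\alpha_k},
\qquad
\alpha_k \;=\; 1 - \binom{N - x_k}{n}\Big/\binom{N}{n}.
\]
% alpha_k is the probability that the initial SRSWOR sample of size n
% intersects network k (of size x_k); edge units are ignored in both estimators.
```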

20.
An analysis of counts of sample size N = 2 arising from a survey of the grass Bromus commutatus identified several factors which might seriously affect the estimation of parameters of Taylor's power law for such small sample sizes. The small-sample estimation of Taylor's power law was studied by simulation. For each of five small sample sizes, N = 2, 3, 5, 15 and 30, samples were simulated from populations for which the underlying known relationship between variance and mean was given by σ² = cμᵈ. One thousand samples generated from the negative binomial distribution were simulated for each of the six combinations of c = 1, 2 and 11 and d = 1, 2, at each of four mean densities, μ = 0.5, 1, 10 and 100, giving 4000 samples for each combination. Estimates of Taylor's power law parameters were obtained for each combination by regressing log₁₀ s² on log₁₀ m, where s² and m are the sample variance and mean, respectively. Bias in the parameter estimates, b and log₁₀ a, reduced as N increased and increased with c for both values of d, and these relationships were described well by quadratic response surfaces. The factors which affect small-sample estimation are: (i) exclusion of samples for which m = s² = 0; (ii) exclusion of samples for which s² = 0 but m > 0; (iii) correlation between log₁₀ s² and log₁₀ m; (iv) restriction on the maximum variance expressible in a sample; (v) restriction on the minimum variance expressible in a sample; (vi) underestimation of log₁₀ s² for skewed distributions; and (vii) the limited set of possible values of m and s². These factors and their effects on the parameter estimates are discussed in relation to the simulated samples. The effects of maximum variance restriction and underestimation of log₁₀ s² were found to be the most severe. We conclude that Taylor's power law should be used with caution if the majority of samples from which s² and m are calculated have size, N, less than 15. An example is given of the estimated effect of bias when Taylor's power law is used to derive an efficient sampling scheme.
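The simulation described above is straightforward to reproduce in outline; the Python sketch below draws negative binomial samples whose true variance follows σ² = cμᵈ, applies exclusions (i) and (ii), and regresses log₁₀ s² on log₁₀ m to expose the small-sample bias in b. The constants are illustrative (one of the paper's valid c, d combinations), not an attempt to reproduce its exact results.

```python
# Minimal sketch: small-sample bias in Taylor's power law (s^2 = a * m^b)
# estimated from negative binomial samples whose true variance is c * mu^d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
c, d = 11.0, 2.0                      # true variance-mean relation: var = c * mu^d
mus = [0.5, 1.0, 10.0, 100.0]         # mean densities, as in the study
N, n_samples = 2, 1000                # sample size per sample, samples per density

log_m, log_s2 = [], []
for mu in mus:
    var = c * mu ** d                 # must exceed mu for a negative binomial
    p = mu / var                      # numpy parameterisation: mean = r(1-p)/p
    r = mu * p / (1.0 - p)
    for _ in range(n_samples):
        x = rng.negative_binomial(r, p, size=N)
        m, s2 = x.mean(), x.var(ddof=1)
        if m > 0 and s2 > 0:          # exclusions (i) and (ii) from the abstract
            log_m.append(np.log10(m))
            log_s2.append(np.log10(s2))

fit = stats.linregress(log_m, log_s2)
print(f"estimated b = {fit.slope:.2f} (true d = {d}); "
      f"estimated log10 a = {fit.intercept:.2f} (true log10 c = {np.log10(c):.2f})")
```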
