Methodology working papers
Screening Strategies In The Presence Of Interactions (M11/10)
D. Draguljic, David C. Woods, A.M. Dean, S.M. Lewis, A.E. Vine
Product and process improvement can involve a large number of factors which must be varied simultaneously. Understanding how factors interact is a key step in identifying those factors that have a substantial impact on the response. This paper assesses and compares screening strategies for interactions using supersaturated designs, group screening, and a variety of data analysis methods including shrinkage regression and Bayesian methods. Novel methodology is developed to allow application of Bayesian methods in two-stage group screening. Insights on using the strategies are provided through a variety of simulation scenarios and open issues are discussed.
A Covariance-Based Test For Shared Frailty In Multivariate Lifetime Data (M11/09)
Alan Kimber, Shah Jalal Sarker
We decompose the score statistic for testing for shared finite variance frailty in multivariate lifetime data into marginal and covariance-based terms. The null properties of the covariance-based statistic are derived in the context of parametric lifetime models. Its non-null properties are estimated using simulation and compared with those of the score test and two likelihood ratio tests when the underlying lifetime distribution is Weibull. Some examples are used to illustrate the covariance-based test. A case is made for using the covariance-based statistic as a simple diagnostic procedure for shared frailty in a parametric exploratory analysis of multivariate lifetime data.
Optimal designs for two-parameter nonlinear models with application to survival models (M11/08)
Maria Konstantinou, Stefanie Biedermann, Alan Kimber
Censoring may occur in many industrial or biomedical time-to-event experiments. Efficient designs for such experiments are needed but finding such designs can be problematic since the statistical models involved will usually be nonlinear, making the optimal choice of design parameter-dependent. We provide analytical characterisations of locally D- and c-optimal designs for a large class of models. Our results are illustrated using the natural proportional hazards parameterisation of the exponential regression model, thus reducing the numerical effort for design search substantially. We also determine designs based on standardised optimality criteria when a range of parameter values is provided by the experimenter. Different censoring mechanisms are incorporated and the robustness of designs to parameter misspecification is assessed. We demonstrate that, unlike traditional designs, the designs found perform well across a broad range of scenarios.
Outlier Robust Small Area Estimation (M11/07)
Ray Chambers, Hukum Chandra, Nicola Salvati, Nikos Tzavidis
Recently proposed outlier robust small area estimators can be substantially biased when outliers are drawn from a distribution that has a different mean from that of the rest of the survey data. This naturally leads to the idea of an outlier robust bias correction for these estimators. In this paper we develop this idea and also propose two different analytical mean squared error estimators for the ensuing bias corrected outlier robust estimators. Simulations based on realistic outlier contaminated data show that the proposed bias correction often leads to more efficient estimators. Furthermore, the proposed mean squared error estimators appear to perform well with a variety of outlier robust small area estimators.
Propensity Score Matching With Missing Covariates Via Iterated, Sequential Multiple Imputation (M11/06)
Robin Mitra, Jerome P. Reiter
In many observational studies, analysts estimate causal effects using propensity score matching. Estimation of propensity scores is complicated when covariate values intended for collection are in fact missing. To handle the missing data, one approach is to use multiple imputation to create completed datasets, and compute propensity scores from these datasets. However, inaccurate imputation models can result in ineffective matching, thereby limiting reductions in bias. We propose a multiple imputation approach based on chained equations in which the researcher gradually reduces the set of control units used to estimate the imputation models. This approach can reduce the influence of control records far from the treated units’ region of the covariate space on the estimation of parameters in the imputation model, which can result in more plausible imputations and better balance in the true covariate distributions. This approach can be conveniently implemented with standard multiple imputation software for missing data. Using simulations, we find that the approach can improve estimation when imputation models are mis-specified; however, it can be ineffective when imputation models are correctly specified. This suggests using the approach as part of sensitivity analysis in causal inference. We apply the approach to an observational study of the effect of breast-feeding on the child’s educational outcomes later in life.
A comparison of two methods of estimating propensity scores after multiple imputation (M11/05)
Robin Mitra, Jerome P. Reiter
In many observational studies, analysts estimate treatment effects using propensity scores, e.g., by matching or subclassifying on the scores. When some values of the covariates are missing, analysts can use multiple imputation to fill in the missing data, estimate propensity scores based on the m completed datasets, and use the propensity scores to estimate treatment effects. We compare two approaches to implementing this process. In the first, the analyst estimates the treatment effect using propensity score matching within each completed dataset, and averages the m treatment effect estimates. In the second approach, the analyst averages the m propensity scores for each record across the completed datasets, and performs propensity score matching with these averaged scores to estimate the treatment effect. We compare properties of both methods via simulation studies using artificial and real data. The simulations suggest that the second method has greater potential to produce substantial bias reductions than the first.
Resistance To Outliers Of M-Quantile And Robust Random Effects Small Area Models (M11/04)
Caterina Giusti, Nikos Tzavidis, Monica Pratesi, Nicola Salvati
The presence of outliers is a common feature in real data applications. Previous literature has established that outliers can severely affect the parameter estimates of statistical models, which in turn can affect the small area estimates produced using these models. Two outlier robust small area estimation methodologies have been recently proposed in the small area literature. These are the M-quantile approach (Chambers and Tzavidis, 2006) and the robust random effects approach (Sinha and Rao, 2009). As Sinha and Rao (2009) point out, a comparison of these two methodologies is needed. The present paper sets out to fulfill this goal.
Analysing the Process Leading to Cooperation or Refusal Using Call Record Data: A Multilevel Multinomial Modelling Approach (M11/03)
Julia D'Arrigo, Gabriele B. Durrant, Fiona Steele
In recent years, survey agencies have started to collect detailed call record data, including information on the timing and outcome of each interviewer call to a household. In interview-based household surveys, effective interviewer calling behaviours are critical in achieving cooperation and reducing the likelihood of refusal. This paper aims to analyze interviewer call record data to inform the process leading to cooperation or refusal in face-to-face surveys. Of particular interest are the influences on the outcome of a call of interactions between the interviewer and householder and of time-varying characteristics of the call. A multilevel multinomial logistic regression approach is used in which the different possible outcomes at each call are modelled jointly.
Non-parametric Bootstrap Mean Squared Error Estimation for M-quantile Estimators of Small Area Averages, Quantiles and Poverty Indicators (M11/02)
Stefano Marchetti, Nikos Tzavidis, Monica Pratesi
Small area estimation is conventionally concerned with the estimation of small area averages and totals. More recently, emphasis has also been placed on the estimation of poverty indicators and of key quantiles of the small area distribution function using robust models, for example the M-quantile small area model (Chambers and Tzavidis, 2006). In parallel to point estimation, Mean Squared Error (MSE) estimation is an equally crucial and challenging task. However, while analytic MSE estimation for small area averages is possible, analytic MSE estimation for quantiles and poverty indicators is extremely difficult. Moreover, one of the main criticisms of the analytic MSE estimator for M-quantile estimates of small area averages proposed by Chambers and Tzavidis (2006) and Chambers et al. (2009) is that it can be unstable when the area-specific sample sizes are small.
Bayesian lightweight emulators for multivariate computer models (M11/01)
Antony M. Overstall, David C. Woods
Statistical emulators for the outputs of complex computer codes (simulators) are typically constructed using nonparametric regression methods, such as Gaussian Process (GP) regression. For many simulators, emulators based on parametric models may provide adequate descriptions whilst enabling straightforward and computationally inexpensive fitting, inference and prediction. We place such so-called “lightweight” emulators into the same Bayesian framework as the more usual nonparametric emulators, and provide methodology for their application to two novel examples with multivariate output: an emergency-relief simulator and a low-level atmospheric dispersion simulator. For the former, the inputs to the simulator are both continuous and categorical, and a comparison is made to GP emulators; for the latter, the output is zero-inflated and an appropriate emulator is developed from a Tobit model. In each case, sensitivity analyses are performed to identify the inputs to the simulator that have a substantive impact on the response, using both traditional methods and Bayesian model selection.