The University of Southampton
Southampton Statistical Sciences Research Institute

Paper on choice of bandwidth in nonparametric density estimation appeared in JRSSB

Published: 10 March 2020

A recent paper accepted by the Journal of the Royal Statistical Society, Series B, a leading journal in statistics, investigates the choice of adaptive and non-adaptive bandwidths for density estimation from data on a spatial lattice. It proposes a spatial cross-validation (SCV) choice of a so-called global bandwidth.

The paper was written by S3RI member Zudi Lu, Professor of Statistics, jointly with Dr Zhenyu Jiang (Associated Member of S3RI), Prof Nengxiang Ling (Hefei University of Technology, China), Prof Dag Tjostheim (University of Bergen, Norway) and Dr Qiang Zhang (Beijing University of Chemical Technology, China, and S3RI visitor hosted by Prof Lu).

An adaptive bandwidth depends on the local data and hence conforms to local features of the spatial data. The analysis is carried out first with a pilot density entering the expression for the adaptive bandwidth. The optimality of the procedure is established, and a non-adaptive bandwidth choice is shown to arise as a special case. Further, for the adaptive bandwidth with an estimated pilot density, the resulting density estimator is shown to have oracle properties: asymptotically, it behaves as if the true pilot were known.
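
To make the role of the pilot density concrete, here is a minimal sketch of an adaptive kernel density estimator. It is an illustration only, not the paper's estimator: it assumes independent one-dimensional data rather than a spatial lattice, a Gaussian kernel, and an Abramson-style rule in which each observation's bandwidth is the global bandwidth scaled by a power of the pilot density at that point. The function names and the `alpha` and `pilot_h` parameters are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def fixed_kde(x_eval, data, h):
    """Non-adaptive estimate: one bandwidth h shared by every observation."""
    u = (x_eval[:, None] - data[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h


def adaptive_kde(x_eval, data, h_global, alpha=0.5, pilot_h=None):
    """Adaptive estimate: observation i gets its own bandwidth
    h_i = h_global * (pilot(x_i) / g) ** (-alpha),
    where g is the geometric mean of the pilot values (assumed rule)."""
    if pilot_h is None:
        pilot_h = h_global
    # Estimated pilot density, evaluated at the observations themselves.
    pilot = fixed_kde(data, data, pilot_h)
    g = np.exp(np.mean(np.log(pilot)))
    # Wider kernels where the pilot is low, narrower where it is high.
    local_h = h_global * (pilot / g) ** (-alpha)
    u = (x_eval[:, None] - data[None, :]) / local_h[None, :]
    return (gaussian_kernel(u) / local_h[None, :]).mean(axis=1)


# Illustrative data: a two-component normal mixture.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(2.0, 1.5, 150)])
grid = np.linspace(-8.0, 10.0, 400)
density_fixed = fixed_kde(grid, data, h=0.4)
density_adaptive = adaptive_kde(grid, data, h_global=0.4)
```

In this sketch, setting `alpha=0` makes every local bandwidth equal to `h_global`, so the fixed-bandwidth estimator is recovered as a special case of the adaptive one.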

Although the CV idea has been popular for choosing a non-adaptive bandwidth in data-driven smoothing of independent and time series data, its theory and application have received little attention for spatial data. For the adaptive case, there is little theory even for independent data. The theory obtained on the optimality of the SCV-selected bandwidth also extends existing optimality results for time series and independent data.
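
For readers unfamiliar with the CV idea, the sketch below shows classical least-squares cross-validation for a single non-adaptive bandwidth with independent one-dimensional data and a Gaussian kernel. It is not the SCV procedure of the paper, which is designed for spatial lattice data. The function names and the candidate grid are hypothetical choices made for illustration.

```python
import numpy as np


def normal_pdf(d, s):
    """Density of N(0, s^2) evaluated at d."""
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv_score(data, h):
    """Least-squares CV criterion for a Gaussian-kernel estimate:
    integral of fhat^2 minus twice the mean leave-one-out density."""
    n = data.size
    d = data[:, None] - data[None, :]
    # Closed form for the integral of fhat^2: the convolution of two
    # Gaussian kernels with sd h is a Gaussian with sd sqrt(2) * h.
    integral_sq = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimate at each observation: drop the diagonal term.
    k = normal_pdf(d, h)
    loo = (k.sum(axis=1) - normal_pdf(0.0, h)) / (n - 1)
    return integral_sq - 2.0 * loo.mean()


def select_bandwidth(data, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    scores = np.array([lscv_score(data, h) for h in candidates])
    return candidates[int(np.argmin(scores))], scores


# Illustrative use on simulated standard normal data.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 300)
candidates = np.geomspace(0.05, 2.0, 40)
h_cv, scores = select_bandwidth(data, candidates)
```

The criterion estimates the integrated squared error of the density estimate up to a constant that does not depend on the bandwidth, so minimising it over the candidate grid gives a data-driven bandwidth choice.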

Numerical examples further show that the SCV adaptive bandwidth choice outperforms existing R routines, such as the 'rule of thumb' and the so-called 'second-generation' Sheather-Jones bandwidths, for moderate and large data sets, with non-Gaussian features identified more clearly.
