The University of Southampton

# MATH6157 Applied Statistical Modelling

## Module Overview

This module introduces important general aspects of statistical modelling and some fundamental aspects of data collection for computer and simulation experiments. A broad range of commonly used statistical models will be encountered and used to demonstrate both general principles and specific modelling techniques in Python and R. A variety of exemplar applications and data sets will be presented.

### Aims and Objectives

To introduce, via a hands-on approach, the basic concepts and principles of statistical modelling in a computational paradigm.

After taking this module, students should understand

• why statistical modelling is important,
• the terminology and statistical principles associated with modelling,
• sufficient theory to deal with simple examples, supported by practical hands-on experience of more complex examples,
• how to use Python and R to fit, explore and exploit a variety of statistical models.

### Syllabus

Introduction and revision

• Python and R, and their interface
• Data input, plotting and summaries
• Standard statistical distributions
• Principles of statistical inference
• Likelihood

Regression: linear and generalised linear modelling

• Model construction and estimation
• Model selection and information criteria
• Shrinkage regression (Lasso and ridge methods)

Random effects, mixed models, and data with complex correlation structures

• Grouping structures in data
• Interpretation of random effects and mixed models
• Discrete data and generalised linear mixed models
• Estimation of mixed models
• Autoregression models

Smoothing and nonparametric regression

• Kernel density estimation
• Splines and penalised splines
• Linear smoothing

Data collection for computational studies

• Fundamentals of design of experiments
• Computer and simulation experiments
• Latin hypercube sampling

### Learning and Teaching

#### Study time allocation

Contact hours: 60
Private study hours: 90
Total study time: 150 hours

#### Teaching and learning methods

Teaching methods

• 24 lecture hours
• 36 computer workshop hours

Learning methods

• Individual study facilitated via weekly worksheets to support lecture material and assessed coursework
• Supervised problem solving via computer lab sessions

#### Resources

Software: Python and R (freely available)

Textbooks: no required textbooks but the following texts are considered useful:

• Davison, A.C. (2008). Statistical Models. CUP (QA 276 DAV).
• Faraway, J. (2014). Linear Models with R (2nd Edn). Chapman and Hall/CRC (QA 279 FAR).
• Gelman, A. and Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. CUP (HA 31.3 GEL).
• Wood, S.N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC (QA 274.73 WOO).
• Wu, C.F.J. and Hamada, M. (2011). Experiments: planning, analysis and optimisation (2nd Edn). Wiley (QA 279 WU).

### Assessment

#### Assessment methods

| Assessment method | Number | % contribution to final mark | Final assessment (x) |
|---|---|---|---|
| Coursework (formative and summative) | 2 | 100% (50% each) | x |

Feedback method

• Verbal feedback on (unassessed) worksheets and exercises in lab sessions
• Written feedback on both assessed pieces of coursework

| Referral method | Number | % contribution to final mark |
|---|---|---|
| Repeat a suitably modified piece of coursework | 1 | 100% |

Method of repeat year: Repeat year internally or externally