The University of Southampton
Practical Applications of Statistics in the Social Sciences
Research Question 3: GCSE Scores

Multivariate analysis: Linear

 

What are the predicted GCSE scores for young people in Year 11?

In this section so far, we’ve been exploring which individual characteristics influence the total GCSE scores of young people in their last year of secondary school. We’re using data from the Youth Cohort Study of England and Wales 2004-2007. We’ve already looked at ways to determine whether statistically significant relationships exist between variables. Now we’ll investigate how we can use these relationships and linear regression to make predictions about mean GCSE scores, as recorded in the variable s1gcseptsnew.

Linear regression is a statistical technique that allows us to model the relationship between two (or more) variables and predict the values of a dependent variable. Because linear regression uses means to predict scores, the models it creates require a continuous dependent variable, which we have in the form of s1gcseptsnew. (Independent variables can be of any type, although categorical ones are usually entered into the model as a set of 0/1 indicator variables.) As we are curious to see what impact various independent variables have on GCSE scores, linear regression is a fairly straightforward and very helpful tool.
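As a sketch, the simplest version of such a model, with a single independent variable, can be written as below. The notation is introduced here for illustration and is not taken from the course materials: y_i is the GCSE score of respondent i, x_i is that respondent's value on the independent variable, β0 is the intercept, β1 is the slope, and ε_i is the part of the score the model does not explain. The second expression is the predicted (mean) score once the two coefficients have been estimated.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i
```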

As social scientists, we may want to begin by examining the extent of the relationship between our dependent (or outcome) variable, s1gcseptsnew, and respondent sex, which is recorded as s1gender in the YCS. We can use linear regression to help us do this.
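To make the idea concrete, here is a minimal sketch in Python of a regression of GCSE points on a binary sex indicator. Everything in it is an assumption for illustration: the eight scores are made-up toy values rather than YCS data, the 0/1 coding of sex is hypothetical and may not match how s1gender is coded, and the helper `fit_simple_ols` is written here from the standard least-squares formulas rather than taken from any statistics package.

```python
from statistics import mean

# Hypothetical toy data, NOT taken from the YCS: total GCSE points and a
# 0/1 indicator for sex (assumed coding: 0 = male, 1 = female).
gcse_points = [320.0, 410.0, 380.0, 450.0, 300.0, 470.0, 390.0, 430.0]
female = [0, 1, 0, 1, 0, 1, 0, 1]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_simple_ols(female, gcse_points)

# Group means, computed directly for comparison with the coefficients.
mean_male = mean([p for p, f in zip(gcse_points, female) if f == 0])
mean_female = mean([p for p, f in zip(gcse_points, female) if f == 1])

print("intercept:", intercept, "slope:", slope)
print("male mean:", mean_male, "female mean:", mean_female)
```

With a single 0/1 predictor, the intercept equals the mean score of the group coded 0 and the slope equals the difference between the two group means, which is the sense in which linear regression uses means to predict scores.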

 

We’ll want to look into a number of individual characteristics that may impact mean GCSE scores, and we can use variables concerning things like respondent sex, ethnicity, living situation, and the educational achievements of respondents’ parents. Some of these variables will be binary, meaning that they only have two categories. Those that are not binary have more than two categories. For example, a variable detailing a respondent’s ethnicity would have several categories, as there are more than two possible ethnic backgrounds. A variable concerning the sex of the respondent would be binary, as there are only two possible responses to this question (i.e. Male or Female).
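The difference between binary and multi-category variables matters in practice because a variable with more than two categories is typically entered into a regression as several 0/1 indicator variables, one per category except a chosen reference category. The sketch below shows one way to do that in Python. The category labels are hypothetical placeholders, not the YCS coding scheme, and `dummy_code` is a helper written for this example.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 indicator columns.

    One column is created for every category except `reference`, which
    becomes the baseline that the other categories are compared against.
    """
    levels = sorted(set(values) - {reference})
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels
    }


# Hypothetical category labels for five respondents (not real YCS codes).
ethnicity = ["group_a", "group_b", "group_c", "group_a", "group_c"]

dummies = dummy_code(ethnicity, reference="group_a")

for level, column in dummies.items():
    print(level, column)
```

A binary variable needs only one such column, which is why it can be entered into the model directly.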
