A chi-square test is a statistical test used to compare observed results with expected results. The purpose of this test is to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables you are studying. Therefore, a chi-square test is an excellent choice to help us better understand and interpret the relationship between our two categorical variables.
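As a rough sketch of what "comparing observed results with expected results" means for a crosstab, the Pearson chi-square statistic can be written as below. The symbols are our own notation rather than anything SPSS displays: $O_{ij}$ is the observed count in row $i$ and column $j$, $E_{ij}$ is the count we would expect in that cell if the two variables were unrelated, $R_i$ and $C_j$ are the row and column totals, and $N$ is the total number of valid cases.

$$
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}
$$

The further the observed counts fall from the expected counts, the larger the statistic becomes, and the smaller the resulting p-value.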

To perform a chi-square test exploring the statistical significance of the relationship between **s2q10** and **s1truan**, select **Analyze**, **Descriptive Statistics**, and then **Crosstabs**.

Find **s2q10** in the variable list on the left, and move it to the **Row(s)** box. Find **s1truan** in the variable list on the left, and move it to the **Column(s)** box.

Click **Statistics**, and select **Chi-square**.

Click **Continue** and then **OK** to run the analysis. Your output should look like the table on the right.

Take a look at the column on the far right of this output table. It is the Asymptotic Significance, or p-value, of the chi-square test we’ve just run in SPSS. This value indicates whether the relationship we’ve just tested is statistically significant. By convention, if p < 0.05, we say that there is a statistically significant relationship between the two variables. The p-value in our chi-square output is displayed as 0.000. SPSS rounds to three decimal places, so this means p < 0.001 rather than exactly zero. The relationship between Year 11 truancy and enrolment in full time education after secondary school is therefore statistically significant.
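To see where a p-value like this comes from, here is a minimal Python sketch that computes a Pearson chi-square statistic and its p-value for a 2x2 table by hand. The counts are invented purely for illustration and are not the survey data. The function name `chi_square_2x2` is our own, and the shortcut used for the p-value only holds for a 2x2 table (one degree of freedom).

```python
import math

# Hypothetical counts, invented for illustration only (not the survey data).
# Rows: two enrolment categories; columns: two truancy categories.
observed = [
    [60, 20],
    [30, 40],
]


def chi_square_2x2(table):
    """Return (statistic, p_value) for a Pearson chi-square test on a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

With these made-up counts the p-value is well below 0.001, so printing it to three decimal places shows 0.000 even though it is not exactly zero. The same rounding lies behind the 0.000 in the SPSS output.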

It’s worth mentioning now that this test, like all tests of significance, only tells us that there is a relationship and that the relationship is statistically significant (meaning it is unlikely to be due to chance alone). Running a chi-square test cannot tell you anything about a causal relationship between truancy and later educational enrolment.

Let’s run one more chi-square test together. Thinking about other individual characteristics that may influence a young person’s enrolment in full time education after secondary school, we may be interested in the impact parental educational attainment has on a student’s future plans. Therefore, let’s use **s1q62a**, which concerns whether or not a young person’s father obtained a degree, in another chi-square test with **s2q10**.

Before we use **s1q62a**, we should check its frequencies to make sure the data is ready for bivariate analysis.

Go to **Analyze**, **Descriptive Statistics**, and then **Frequencies**. Move **s1q62a** into the **Variable(s)** box on the right side of the dialogue box.

Click **OK** to produce the frequency table.

Notice in the frequency output table that along with the answers “Yes,” “No,” and “Not sure,” which we are interested in, there is the category “Not answered.” Because these survey respondents haven’t responded to this question, their answers are missing. We should code this information as missing data before we run our chi-square test, so that we are only performing the test on data relevant to our research question.

Luckily, doing this is very easy.

First, find **s1q62a** in the **Variable View** window of the SPSS **Data Editor**. (You can do this easily by clicking to highlight any cell in the **Name** column on the far left of the **Variable View** screen, and hitting **Ctrl + F**. This will open up a **Find and Replace** dialogue box. Just enter **s1q62a** into the text bar and click **Find Next**. This will find the **s1q62a** row in the dataset.)

When you’ve found **s1q62a**, move across its row until you find the **Values** column. Click this cell to open it. Now you should see a dialogue box that lists all the numerical values of the categories of this variable.

In this dialogue box, you can see that “Not answered,” the missing data category, is listed as -9.00. In addition, there are categories “Item not applicable” (with a value of 4.00) and “Not answered (9)” (with a value of 5.00) listed. We’ll include these in our recoding, as they also represent missing data.

To recode these categories as missing data, all you need to do is move over one column to the **Missing** column. The **Missing** column in the **s1q62a** row should say “None,” as no data is classed as missing just yet.

Click to open that cell. In the dialogue box that opens, select **Discrete missing values** and enter **-9.00**, **4.00**, and **5.00** into the three text boxes provided. These are the numerical codes for the three categories that represent missing data.

Click **OK**.

You should now see that our three missing codes are saved in the **Missing** cell of the **s1q62a** row. To make sure that the missing data is no longer included in tests we run using this variable, run a frequency check on **s1q62a**.

In this new frequency table, the “Not answered” category is listed as Missing, so our recoding has been successful.
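The effect of declaring missing values can be sketched in a few lines of Python. The list of responses below is invented for illustration. The missing codes (-9, 4 and 5) follow the values described above, while the codes 1, 2 and 3 for “Yes,” “No,” and “Not sure” are an assumption made for this example.

```python
from collections import Counter

# Hypothetical coded responses, invented for illustration (not the survey data).
# Assumed codes: 1 = "Yes", 2 = "No", 3 = "Not sure".
responses = [1, 2, 2, -9, 3, 1, 4, 2, 5, 1, -9, 2]

# Codes treated as missing, matching the three values entered in SPSS.
MISSING_CODES = {-9, 4, 5}

# Keep only the valid answers; everything else is counted as missing.
valid = [r for r in responses if r not in MISSING_CODES]
missing_count = len(responses) - len(valid)

valid_counts = Counter(valid)
for code in sorted(valid_counts):
    # Valid percent is calculated out of valid answers only.
    share = 100 * valid_counts[code] / len(valid)
    print(f"code {code}: n = {valid_counts[code]}, valid percent = {share:.1f}")
print(f"missing: n = {missing_count}")
```

As in the SPSS frequency table, the missing cases are still counted, but they are set apart from the valid answers and no longer contribute to the valid percentages or to any test run on the variable.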

And, because we have cleaned up **s1q62a**, we are ready to run our chi-square test.

Select **Analyze**, **Descriptive Statistics**, and then **Crosstabs**. Find **s2q10** in the variable list on the left, and move it to the **Row(s)** box. Find **s1q62a** in the variable list on the left, and move it to the **Column(s)** box.

Click **Statistics**, and select **Chi-square**. Click **Continue** and then **OK** to run the analysis. Your output should look like the table on the left.

Take a look at the Asymptotic Significance of this chi-square test. Using this information, what can we say about the relationship between paternal degree and full time enrolment in education after secondary school?

*Run another chi-square test to check the significance of the relationship between s2q10 and s1q62b, a variable that concerns whether or not a survey respondent’s mother has obtained a degree. Before you run the chi-square test, make sure to check the frequencies in s1q62b and make any corrections you think are necessary.*

Is there a statistically significant relationship between maternal degree and full time education after secondary school?

**Using a chi-square test, you’ve just determined that there is, in fact, a statistically significant relationship between our two categorical variables, s2q10 and s1truan. In addition, you’ve run another chi-square test, determining that there is a statistically significant relationship between s2q10 and s1q62a, a measure of whether or not a respondent’s father had obtained a degree. Remember that you are simply able to say now that paternal degree and Year 11 truancy both have relationships with respondent enrolment in full time education after secondary school. You cannot say, for example, that a paternal degree causes enrolment in full time education.**

**Note: as we are making changes to a dataset we’ll continue using for the rest of this section, please make sure to save your changes before you close down SPSS. This will save you having to repeat sections you’ve already completed.**