We’ve just run a simple logistic regression using
neighpol1
as a binary categorical dependent variable and
age
as a continuous independent variable. Suppose now we were interested to see if a respondent’s employment status had any bearing on their awareness of neighbourhood policing. We may want to fit a logistic regression model using
neighpol1
as our dependent variable and
remploy
, respondent employment status, as our independent variable to see if we can find a significant relationship between these two variables.
Just as we did at the beginning of our logistic regression investigation of
neighpol1
and
age
, we should run some exploratory analysis to determine if a relationship between these variables exists.
When our independent variable
age
was continuous, we used a t test to compare means. Now, our independent variable
remploy
is categorical, so we’ll start by running crosstabulations. Select
Analyze
,
Descriptive Statistics
, and
Crosstabs
. Move
neighpol1
into the
Column(s)
box and
remploy
into the
Row(s)
box. Click the
Statistics
button and select
Chi-Square
. Click
Continue
. Because we are curious about
remploy
, we’d also like to see some row percentages. Click on
Cells
, and then under the
Percentages
header, select
Row
. Click
Continue
. Then, click
OK
to run the crosstabulation.
Your output should look like the one on the right.
How many unemployed people were aware of neighbourhood policing?
How many economically inactive people were not aware of neighbourhood policing?
What percentage of employed people were aware of neighbourhood policing?
Is there a significant relationship between
neighpol1
and
remploy
? How can you tell?
Now we can fit our logistic regression model using
neighpol1
as the dependent variable and
remploy
as the independent variable.
Select
Analyze
,
Regression
, and then
Binary Logistic
.
Move
neighpol1
to the
Dependent
text box. Move
remploy
to the
Covariates
text box. Because
remploy
is a categorical variable, we have to tell SPSS to create dummy variables for each of the categories. (SPSS will do this for us in logistic regression – unlike in linear regression, when we had to create the dummies ourselves.) To tell SPSS that
remploy
is a categorical variable, click
Categorical
in the upper right corner of the
Logistic Regression
text box.
Move
remploy
from the
Covariates
text box on the left to the
Categorical Covariates
text box on the right. Click
Continue
.
The original
Logistic Regression
dialogue box should now have
remploy(Cat)
in the
Covariates
text box.
We also want SPSS calculate confidence intervals for
remploy
for us. In the
Logistic Regression
dialogue box you should have open, click
Options
. Under
Statistics and Plots
, select
CI for exp(B)
. This should already be set at
95%
.
Click
Continue
and then
OK
in the original
Logistic Regression
dialogue box.
Now we can examine the output.
You can see in the
Case Processing Summary
that again, we’re only analysing about one quarter of the survey respondents, because our dependent variable
neighpol1
was only asked in Module A.
In the
Dependent Variable Encoding
table, you can see that awareness of neighbourhood policing (“Yes”) is coded as 0 and being unaware of neighbourhood policing (“No”) is coded as 1. Again, just like in the simple logistic regression we performed on the previous page, we will be predicting the odds of being unaware of neighbourhood policing in this logistic regression.
The
Categorical Variables Codings
table shows us the frequencies of respondent employment. In addition, it also tells us that the three categories of
remploy
have been recoded in our logistic regression as dummy variables. In logistic regression, just as in linear regression, we are comparing groups to each other. In order to make a comparison, one group has to be omitted from the comparison to serve as the baseline. In our logistic regression, “Economically inactive” has been selected as the baseline (or constant) dummy variable to which we will compare the predictions for “Employed” and “Unemployed.” Therefore, “Economically inactive” won’t be included in our model. (You can see that in the table below it isn’t coded with a “1” in any case, because it is the baseline, comparison category and has not been added to the model.) You can change the category to be used as the baseline to either the first or last categories – this is done where you specify that the variable is categorical.
As we’re not going to use any of the information provided for us in Block 0, the output has been left out of this worksheet. If you’d like to work through some of the information provided for you in Block 0, you can use the interpretation provided for the neighpol1 and age logistic regression model we did on the previous page.
Remember that the Omibus Tests of Model Coefficients output table shows the results of a chi-square test to determine whether or not employment has a significant influence on neighbourhood policing awareness. The Chi-square has produced a p-value of .018, making our employment status model significant at the 5% level.
Take a look at the Variables in the Equation output table below. Let’s first look at the significance levels. Remploy(1) , or “Employed,” has a p-value of .026, making it significant at the p < .05 level. Remploy(2) , or “Unemployed,” on the other hand, has a p-value of .202, telling us that those who are in this category are no different in their awareness than the baseline category of economically inactive.
If we were to fit this model again, and wanted to use
remploy
, we may be tempted to remove
remploy(2)
from the model, as it is not significant. However, we can’t do this. Why?
Remember that in this model, “Economically Inactive” was selected as our baseline comparison dummy variable and is called
remploy
in our model outputs. Because
remploy(1)
(with a p-value of .026) is a significant predictor of the odds of neighbourhood policing, we can use the odds ratio information provided for us in the
Exp(B)
column to say that a respondent who is employed has odds of being unaware of neighbourhood policing that are 0.917 of the odds of someone who is economically inactive. This means that the employed are more likely than the economically inactive to know about neighbourhood policing. An odds ratio less than 1 means that the odds of an event occurring are lower in that category than the odds of the event occurring in the baseline comparison variable. An odds ratio more than 1 means that the odds of an event occurring are higher in that category than the odds of the event occurring in the baseline comparison variable.
A respondent who is unemployed has odds of being unaware of neighbourhood policing that are _____________ of the odds of a respondent who is economically inactive. This means that the unemployed are ________ likely than the economically inactive to know about neighbourhood policing.
In addition, SPSS has calculated confidence intervals for us. Remember that confidence intervals allow us to extend out analyses from the sample in our data to the population as a whole. We can say, with 95% confidence, that for the entire population of England, employed people have odds of being unaware of neighbourhood policing that are 0.850 to 0.990 the odds of people who are economically inactive.
First, you used a chi square test test to determine whether or not a statistically significant relationship existed between our categorical independent variable remploy and our categorical dependent variable neighpol1. Then, using simple logistic regression, you predicted the odds of a survey respondent being unaware of neighbourhood policing with regard to their employment status. Finally, using the odds ratios provided by SPSS in the Exp(B) column of the Variables in the Equation output table, you were able to interpret the odds of employed respondents being unaware of neighbourhood policing.
Note: as we are making changes to a dataset we’ll continue using for the rest of this section, please make sure to save your changes before you close down SPSS. This will save you having to repeat sections you’ve already completed.