Examine linear relationship with SPSS

Because athletic performances in many events continue to improve incrementally over time, we would expect the winning times for the men's 1,500 meter race, run at each Olympic Games since 1896, to fit this pattern. Regression techniques offer a way to study this relationship.

The dataset contains 23 observations and two variables:
 * Year, the year of the Olympic Games (from 1896 to 2000)
 * Time, winning time (in seconds)

Dataset

 * olympics.xls
 * an SPSS version of the dataset is available on your class website: olympics.sav

Open the dataset in the SPSS data editor.

The following instructions are based on the student version of PASW (SPSS) version 18.

Create a scatterplot of year vs. time
Create a scatterplot of year and winning time (see the scatterplot instructions) and determine whether the relationship can be considered linear.

If so, the least squares regression line is a useful tool for describing the relationship between 1,500 meter winning time and year. Continue with the instructions that follow to calculate the least squares regression line and add it to the scatterplot.

Plot the least squares regression line
Double-click the graph displayed in the output window to open the Chart Editor.

To add the least squares regression line:
 * Select Elements > Fit Line at Total.

The Properties dialog box opens, with the Fit Line tab highlighted.
 * Confirm that Linear is chosen.

The line is automatically added to the graph. Close the Chart Editor window. The regression line is displayed among the data points, along with the R² value.

In Version 18, SPSS does not offer an option to add the equation of the line to the graph. Instead, we must obtain the equation from a regression analysis. Given this, the equation is provided below.
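For readers who want to see what a regression analysis computes, here is a minimal Python sketch of the least squares slope, intercept, and R² for a single predictor. It is not SPSS output; the function name least_squares_fit and the toy values are assumptions made for illustration and are not the Olympic data.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Return (intercept, slope, r_squared) for the line y = intercept + slope * x.

    Hypothetical helper for illustration; not part of SPSS.
    """
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With one predictor, R^2 is the squared correlation between x and y
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Toy values chosen for illustration only (NOT the Olympic winning times)
years = [1900, 1904, 1908, 1912, 1920]
times = [246.0, 245.0, 243.0, 237.0, 242.0]

intercept, slope, r_squared = least_squares_fit(years, times)
print(f"Time = {intercept:.2f} + ({slope:.4f}) * Year,  R^2 = {r_squared:.3f}")
```

To apply the sketch to the real data, replace the two lists with the Year and Time columns from olympics.xls. A negative slope would correspond to winning times that decrease as the year increases.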

The scatterplot shows one obvious point that sits well outside the other data points. Using the graph, we can determine that this is the winning time for the 1896 race. Let's explore how the least squares regression line would be affected if this point were removed.

Remove the outlier (1896 winning time) from the plot and calculations
To remove a data point, we can simply delete it from the dataset.
 * Observe that row 1 contains the 1896 winning time.
 * Click on the row header 1 to select the entire row of data.
 * Choose Edit > Cut.

The 1896 data is removed from the dataset.
 * Create a new scatterplot, without the outlier.
 * Add a title and the least squares regression line.

Note that the R² value has changed; it is larger because, without the 1896 value, the line fits the remaining data more closely.
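The effect of dropping a single outlying point on R² can be sketched in Python as follows. The function name r_squared and the toy values are assumptions for illustration only; they are not the Olympic data, and the first toy point is deliberately placed far above the trend of the others.

```python
from statistics import mean


def r_squared(x, y):
    """Squared correlation between x and y (R^2 for a one-predictor line).

    Hypothetical helper for illustration; not part of SPSS.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Toy values for illustration only (NOT the Olympic winning times).
# The first point sits well above the straight-line trend of the rest.
years = [1896, 1900, 1904, 1908, 1912, 1920]
times = [280.0, 250.0, 248.0, 246.0, 244.0, 240.0]

r2_all = r_squared(years, times)              # fit using every point
r2_trimmed = r_squared(years[1:], times[1:])  # fit without the first point

print(f"R^2 with the outlier:    {r2_all:.3f}")
print(f"R^2 without the outlier: {r2_trimmed:.3f}")
```

In this toy example the remaining points lie exactly on a line, so R² rises when the first point is dropped. Removing a point does not raise R² in every dataset; it does so when the removed point lies far from the trend of the others, as the 1896 time does in the scatterplot.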