Lesson 4: Rocking the Residuals (Solidify Understanding)

Learning Focus

Understand and interpret residuals.

Should all bivariate data be modeled with a linear function?

Are there other ways to tell if a linear model is appropriate besides using a correlation coefficient?

Technology guidance for today’s lesson:

Explore the Residuals of a Data Set: Casio ClassPad, Casio fx-9750GIII

Open Up the Math: Launch, Explore, Discuss

The correlation coefficient is not the only tool that statisticians use to analyze whether a line is a good model for the data. They also consider the residuals: the differences between each observed value (the data) and the corresponding predicted value (the $y$-value on the regression line). This sounds a little complicated, but it's not really. The residuals are just a way of thinking about how far the actual data are from the regression line.

Start with some data:

$x$	$1$	$2$	$3$	$4$	$5$	$6$
$y$	$10$	$13$	$7$	$22$	$28$	$19$

Create a scatterplot and graph the regression line. In this case, the line is $y = 3x + 6$.
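If you would like to check the regression line without a calculator, the short Python sketch below recomputes it from the data table using the standard least-squares formulas. The variable names are chosen only for illustration, and this is one possible way to do the calculation rather than the method your calculator necessarily uses.

```python
# Data from the table in this lesson.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 13, 7, 22, 28, 19]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope: sum of (x - mean_x)(y - mean_y)
# divided by the sum of (x - mean_x)^2.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

slope = sxy / sxx
# The least-squares line always passes through (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(f"y = {slope:g}x + {intercept:g}")
```

The printed equation should agree with the line given above.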

Draw a line from each point to the regression line, like the segments drawn from each point below.

1.

The residuals are the lengths of the segments. How can you calculate the length of each segment to get the residuals?

2.

Generally, if the data point is above the regression line, the residual is positive. If the data point is below the line, the residual is negative. Knowing this, use your plan from problem 1 to create a table of residual values using each data point.
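Once you have built your table by hand, you could check it with a short Python sketch like the one below. It assumes the regression line $y = 3x + 6$ given earlier and computes each residual as observed minus predicted; the helper name `predict` is made up for this illustration.

```python
# Data from the table in this lesson.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 13, 7, 22, 28, 19]

def predict(x):
    """Predicted y-value on the regression line y = 3x + 6 from the lesson."""
    return 3 * x + 6

# Residual = observed value - predicted value.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

print("x  observed  predicted  residual")
for x, y, r in zip(xs, ys, residuals):
    print(f"{x}  {y:8}  {predict(x):9}  {r:8}")
```

Subtracting in this order gives a positive residual when the point is above the line and a negative residual when it is below, which matches the sign convention described in problem 2.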

3.

Statisticians like to look at graphs of the residuals to judge their regression lines. Now, you get your chance to do it. Graph the residuals.

Now that you have constructed a residual plot, think about what the residuals describe, and answer the following problems.

4.

If a residual is large and negative, what does it mean?

5.

What does it mean if a residual is equal to $0$?

6.

If someone told you that they estimated a line of best fit for a set of data points and that all of the residuals were positive, what would you say?

7.

If the correlation coefficient for a data set is equal to $1$, what will the residual plot look like?

Statisticians use residual plots to see if there are patterns in the data that are not predicted by their model. What patterns can you identify in the following residual plots that might indicate that the regression line is not a good model for the data? Based on the residual plot, are there any points that may be considered outliers?

8.

9.

10.

11.
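To see how a pattern in a residual plot can arise, here is a small Python sketch using hypothetical data (not data from this lesson) that follow the curve $y = x^2$. It fits a least-squares line to the curved data and prints the residuals; the function name `fit_line` is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data that follow a curve (y = x^2), not a line.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

In this made-up example the residuals are positive at both ends and negative in the middle. A U-shaped pattern like that suggests the data curve in a way that a line does not capture.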

Ready for More?

Use the residual plot in problem 10 to reconstruct the scatterplot on the graph below. The regression line is shown on the graph.

The data modeled by the given line have a positive correlation coefficient. Could the residuals in problem 10 also represent data that have a negative correlation coefficient?

Takeaways

Vocabulary

residuals, residual plot
Bold terms are new in this lesson.

Lesson Summary

In this lesson, we learned that a residual shows the difference between the $y$-value of a data point and the predicted $y$-value on the regression line. We calculated residual values and used residual plots to evaluate whether a linear model is appropriate for the data.

Retrieval

Use the box plot to answer problems 1 through 4.

1.

What is the five-number summary (min, Q1, median, Q3, max) for this box plot?

2.

How much of the data set is represented in the box?

3.

How much of the data is represented in one of the whiskers of the plot?

4.

Why is the left side of the box smaller than the right side of the box?
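As a reminder of how a five-number summary is built, the Python sketch below computes one for a small hypothetical data set (not the data behind the box plot in these problems). It assumes the common classroom convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data; calculators and software sometimes use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values below the middle position
    upper_half = data[(n + 1) // 2 :]  # values above the middle position
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 5, 7, 8, 12, 15, 20]
summary = five_number_summary(sample)
print(summary)
```

Comparing the distance from Q1 to the median with the distance from the median to Q3 shows how spread out each half of the box is.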