Lesson 4: Rocking the Residuals (Solidify Understanding)

Learning Focus

Understand and interpret residuals.

Should all bivariate data be modeled with a linear function?

Are there other ways to tell if a linear model is appropriate besides using a correlation coefficient?


Open Up the Math: Launch, Explore, Discuss

The correlation coefficient is not the only tool that statisticians use to analyze whether a line is a good model for the data. They also consider the residuals: the differences between the observed values (the data) and the predicted values (the y-values on the regression line). This sounds a little complicated, but it's not really. The residuals are just a way of measuring how far the actual data are from the regression line.
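Written as a formula, with y for an observed value and ŷ ("y-hat") for the value the regression line predicts at the same x, a residual is a single subtraction. The line ŷ = 2x + 1 and the point (4, 12) used here are hypothetical, chosen only to show the arithmetic.

```latex
\begin{aligned}
\text{residual} &= y_{\text{observed}} - y_{\text{predicted}} = y - \hat{y} \\
\hat{y} &= 2(4) + 1 = 9 && \text{(hypothetical line } \hat{y} = 2x + 1 \text{ at } x = 4\text{)} \\
\text{residual} &= 12 - 9 = 3 && \text{(hypothetical observed point } (4, 12)\text{)}
\end{aligned}
```

The residual is positive because the point sits 3 units above the line.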

Start with some data:

Create a scatterplot and graph the regression line. In this case, the line is y = 3x + 6.

[Figure: Scatterplot of the six data points from the table with the regression line y = 3x + 6; horizontal axis from 0 to 7, vertical axis from 0 to 30. Four of the points lie above the line.]

Draw a vertical line segment from each point to the regression line, like the segments shown in the graph below.

[Figure: The same scatterplot, with a vertical line segment drawn from each point to the regression line.]


The residuals are the lengths of the segments. How can you calculate the length of each segment to get the residuals?


Generally, if the data point is above the regression line, the residual is positive. If the data point is below the line, the residual is negative. Knowing this, use your plan from problem 1 to create a table of residual values using each data point.
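One way to organize the table is to compute, for each point, the predicted y-value and then observed minus predicted. The sketch below assumes the regression line y = 3x + 6 (passed as the parameters `slope` and `intercept`, so another line can be swapped in). The three data points are hypothetical, not the lesson's table, and the helper names `predicted` and `residual_table` are made up for illustration.

```python
# A minimal sketch of a residual table: residual = observed y - predicted y.
# Assumption: the regression line is y = 3x + 6; the data points are HYPOTHETICAL.

def predicted(x, slope=3, intercept=6):
    """Return the y-value the regression line predicts for x."""
    return slope * x + intercept

def residual_table(points, slope=3, intercept=6):
    """Return rows of (x, observed y, predicted y, residual)."""
    rows = []
    for x, y in points:
        y_hat = predicted(x, slope, intercept)
        rows.append((x, y, y_hat, y - y_hat))
    return rows

sample = [(1, 11), (2, 10), (3, 15)]  # hypothetical (x, y) data

print("x  observed  predicted  residual")
for x, y, y_hat, r in residual_table(sample):
    # '+' in the format spec always shows the sign, so above/below is easy to read
    print(f"{x}  {y:>8}  {y_hat:>9}  {r:>+8}")
```

The three hypothetical points give residuals of +2 (above the line), -2 (below the line), and 0 (on the line).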


Statisticians like to look at graphs of the residuals to judge their regression lines. Now, you get your chance to do it. Graph the residuals.

[Figure: A blank coordinate plane for the residual plot; horizontal axis from 0 to 6, vertical axis from -10 to 10.]
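A residual plot uses the same x-values as the scatterplot, but each point's height is its residual, so the regression line becomes the horizontal line at 0. The sketch below draws such a plot as text so that no graphing library is needed. The line y = 3x + 6, the data points, the rounding of residuals to whole numbers, and the clipping window `limit` are all illustrative assumptions.

```python
# A minimal sketch of a residual plot drawn as text (no graphing library needed).
# Assumptions: the line y = 3x + 6 and the data points are illustrative;
# residuals are rounded to whole numbers and clipped to -limit..limit.

def residual_points(points, slope, intercept):
    """Pair each x-value with its residual, y - (slope * x + intercept)."""
    return [(x, y - (slope * x + intercept)) for x, y in points]

def text_residual_plot(res_points, limit=5):
    """Return text rows for residual levels +limit down to -limit.
    Each column is one point in x order: '*' marks its residual and
    '-' marks the zero line, which stands for the regression line."""
    marks = [max(-limit, min(limit, round(r))) for _, r in sorted(res_points)]
    rows = []
    for level in range(limit, -limit - 1, -1):
        cells = []
        for m in marks:
            if m == level:
                cells.append("*")
            elif level == 0:
                cells.append("-")
            else:
                cells.append(" ")
        rows.append(f"{level:>3} | " + " ".join(cells))
    return "\n".join(rows)

sample = [(1, 11), (2, 10), (3, 17), (4, 15)]  # hypothetical (x, y) data
res = residual_points(sample, slope=3, intercept=6)
print(res)  # [(1, 2), (2, -2), (3, 2), (4, -3)]
print(text_residual_plot(res))
```

Points above the zero row were above the regression line, and points below it were below the line.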

Now that you have constructed a residual plot, think about what the residuals describe and answer the following problems.


If a residual is large and negative, what does it mean?


What does it mean if a residual is equal to 0?


If someone told you that they estimated a line of best fit for a set of data points and that all of the residuals were positive, what would you say?


If the correlation coefficient for a data set is equal to , what will the residual plot look like?

Statisticians use residual plots to see if there are patterns in the data that are not predicted by their model. What patterns can you identify in the following residual plots that might indicate that the regression line is not a good model for the data? Based on the residual plot, are there any points that may be considered outliers?


[Figure: A scatterplot of 26 plotted points; horizontal axis marked 2 to 10, vertical axis marked -10 to 10.]


[Figure: A scatterplot of 14 plotted points; horizontal axis marked 2 to 10, vertical axis marked -10 to 10.]


[Figure: A scatterplot of 10 plotted points; horizontal axis marked 2 to 12, vertical axis marked -20 to 10.]


[Figure: A scatterplot of 27 plotted points; horizontal axis marked 2 to 10, vertical axis marked -20 to 40.]
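When a line is fitted to data that actually follow a curve, the residuals tend to show a pattern instead of scattering randomly around 0. The sketch below assumes hypothetical data that follow y = x², fits the least-squares regression line with the standard slope and intercept formulas, and lists the residuals in x order. The helper names `least_squares_line` and `sign_pattern` are made up for illustration.

```python
# A minimal sketch, assuming hypothetical curved data (y = x**2).
# Fit a least-squares line, then list the residuals in x order.

def least_squares_line(points):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def sign_pattern(values):
    """Summarize residual signs as a string of '+', '-', and '0'."""
    return "".join("+" if v > 0 else "-" if v < 0 else "0" for v in values)

curved = [(x, x ** 2) for x in range(7)]  # hypothetical curved data
m, b = least_squares_line(curved)
res = [y - (m * x + b) for x, y in sorted(curved)]
print(m, b)               # 6.0 -5.0
print(res)                # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(sign_pattern(res))  # +0---0+
```

The residuals run positive, then negative, then positive again, a U shape in the residual plot, even though they add up to 0. A pattern like that suggests that a curve would model these data better than a line.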

Ready for More?

Use the residual plot in problem 10 to reconstruct the scatterplot on the graph below. The regression line is shown on the graph.

The regression line shown has a positive slope, so the data it models have a positive correlation coefficient. Could the residuals in problem 10 also represent data that has a negative correlation coefficient?

[Figure: Coordinate plane with horizontal axis marked 2 to 10 and vertical axis marked -20 to 20, showing the regression line. The line has a y-intercept of approximately (0, 3) and ends at approximately (10, 18).]
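Because residual = observed y - predicted y, each data point can be rebuilt as y = predicted y + residual. The sketch below estimates the line's slope from the two approximate points on the graph, about (0, 3) and (10, 18). The (x, residual) readings are hypothetical placeholders, not values read from the residual plot, and `reconstruct` is a made-up helper name.

```python
# A minimal sketch of undoing a residual plot: y = predicted y + residual.
# Assumptions: the line runs from about (0, 3) to about (10, 18), as on the graph;
# the (x, residual) readings below are HYPOTHETICAL.

def reconstruct(residual_points, slope, intercept):
    """Turn (x, residual) pairs back into (x, y) data points."""
    return [(x, slope * x + intercept + r) for x, r in residual_points]

slope = (18 - 3) / (10 - 0)  # 1.5, from the two approximate points on the line
intercept = 3
readings = [(2, 4), (4, -6), (6, 1)]  # hypothetical (x, residual) pairs
print(reconstruct(readings, slope, intercept))  # [(2, 10.0), (4, 3.0), (6, 13.0)]
```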


[Figure: A Venn diagram relating the correlation coefficient and the residual plot, with three regions labeled "What we learn about the data from the correlation coefficient," "Things that both the correlation coefficient and the residuals tell us," and "What we learn about the data from a residual plot."]


Lesson Summary

In this lesson, we learned that a residual is the difference between the y-value of a data point and the predicted y-value on the regression line. We calculated residual values and used residual plots to evaluate whether a linear model is appropriate for the data.


Use the box plot to answer problems 1 through 4.

[Figure: A box plot on a number line from 12 to 23. The left whisker runs from 12 to 14, the left part of the box from 14 to 15, the right part of the box from 15 to 20, and the right whisker from 20 to 23.]


What is the five-number summary (min, Q1, median, Q3, max) for this box plot?


How much of the data set is represented in the box?


How much of the data is represented in one of the whiskers of the plot?


Why is the left side of the box smaller than the right side of the box?