Lesson 4: Rocking the Residuals (Solidify Understanding)
Understand and interpret residuals.
Should all bivariate data be modeled with a linear function?
Are there other ways to tell if a linear model is appropriate besides using a correlation coefficient?
Open Up the Math: Launch, Explore, Discuss
The correlation coefficient is not the only tool that statisticians use to judge whether a line is a good model for the data. They also consider the residuals: the differences between the observed values (the data) and the predicted values (the values given by the regression line).
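In symbols, if y-hat_i is the value the regression line predicts at x_i, the residual for the data point (x_i, y_i) is the observed value minus the predicted value. To show the arithmetic, the line y-hat = 2x + 1 and the point (3, 8) used below are hypothetical, not the lesson's data.

```latex
e_i = y_i - \hat{y}_i
\qquad \text{for example,} \qquad
\hat{y} = 2(3) + 1 = 7, \quad e = 8 - 7 = 1
```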
Start with some data:
Create a scatterplot and graph the regression line. In this case, the line is
Draw a vertical segment from each point to the regression line, like the segments drawn from each point in the figure.
The size of each residual is the length of its segment. How can you calculate the length of each segment to get the residuals?
If a data point is above the regression line, its residual is positive. If a data point is below the line, its residual is negative. Knowing this, use your plan from problem 1 to create a table of residual values for each data point.
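A minimal Python sketch of such a residual table, assuming a hypothetical data set and the hypothetical regression line y-hat = 2x + 1 (the lesson's own data and line are not reproduced here). Each residual is observed minus predicted, and its sign records whether the point sits above or below the line. The names `predict` and `residual_table` are illustrative.

```python
# Hypothetical data, for illustration only (not the lesson's data).
xs = [1, 2, 3, 4, 5]
ys = [4, 4, 7, 8, 12]


def predict(x):
    """Value predicted by the hypothetical regression line y-hat = 2x + 1."""
    return 2 * x + 1


def residual_table(xs, ys, predict):
    """Return rows of (x, observed y, predicted y, residual, position)."""
    rows = []
    for x, y in zip(xs, ys):
        y_hat = predict(x)
        e = y - y_hat  # residual = observed - predicted
        if e > 0:
            position = "above"
        elif e < 0:
            position = "below"
        else:
            position = "on"
        rows.append((x, y, y_hat, e, position))
    return rows


for x, y, y_hat, e, position in residual_table(xs, ys, predict):
    print(f"x={x}  y={y}  y_hat={y_hat}  residual={e:+}  ({position} the line)")
```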
Statisticians like to look at graphs of the residuals to judge their regression lines. Now it is your turn: graph the residuals, plotting each residual against its x-value.
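Without graphing software, a rough residual plot can be drawn as text. This Python sketch assumes the hypothetical residuals 1, -1, 0, -1, 1 at x = 1 through 5: each column is one data point in order of increasing x, each row is a residual value, and the dashes mark the line residual = 0. The helper `text_residual_plot` is a hypothetical name; it rounds residuals to whole numbers and does not space the columns to scale.

```python
# Hypothetical residuals, listed in order of increasing x (x = 1, 2, 3, 4, 5).
residuals = [1, -1, 0, -1, 1]


def text_residual_plot(residuals):
    """Draw a rough text residual plot: one column per point, rows are residual values."""
    levels = [round(e) for e in residuals]  # round to whole-number rows
    top = max(max(levels), 0)  # always include the zero row
    bottom = min(min(levels), 0)
    lines = []
    for level in range(top, bottom - 1, -1):
        cells = []
        for lv in levels:
            if lv == level:
                cells.append("*")  # a data point's residual
            elif level == 0:
                cells.append("-")  # the horizontal line residual = 0
            else:
                cells.append(" ")
        lines.append(f"{level:>3} | " + "  ".join(cells))
    return "\n".join(lines)


print(text_residual_plot(residuals))
```

Points above the dashed row came from data points above the regression line; points below it came from data points below the line.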
Now that you have constructed a residual plot, think about what the residuals describe and answer the following problems.
If a residual is large and negative, what does it mean?
What does it mean if a residual is equal to
If someone told you that they estimated a line of best fit for a set of data points and that all of the residuals were positive, what would you say?
If the correlation coefficient for a data set is equal to
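Several of these questions rest on one property of the least-squares regression line: its residuals always add up to zero, so positive and negative residuals balance. A line whose residuals were all positive would lie below every data point, and moving it up slightly would shrink every squared residual, so it could not be the least-squares line of best fit. By the same reasoning, a large negative residual marks a point far below the line, where the model overpredicts. The Python sketch below checks the zero-sum property on hypothetical data; `least_squares_line` is an illustrative helper, not a library function.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.0, 8.5, 9.0, 12.5]

slope, intercept = least_squares_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"y-hat = {slope:.3f}x + {intercept:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
# Up to floating-point rounding, the residuals sum to zero.
print("sum of residuals:", round(sum(residuals), 10))
```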
Statisticians use residual plots to see if there are patterns in the data that are not predicted by their model. What patterns can you identify in the following residual plots that might indicate that the regression line is not a good model for the data? Based on the residual plot, are there any points that may be considered outliers?
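Two patterns a residual plot can reveal are sketched below in Python on hypothetical data. Fitting a line to curved data (here y = x^2) leaves residuals that are positive at both ends and negative in the middle, a U shape suggesting that a curve would model the data better. For outliers, the sketch flags residuals more than 2 residual standard errors from zero; that cutoff is a common rule of thumb and an assumption here, not the lesson's criterion. The helpers `residuals_for` and `flag_outliers` are hypothetical names.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals_for(xs, ys):
    """Residuals (observed - predicted) from the least-squares line."""
    slope, intercept = least_squares_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def flag_outliers(residuals, k=2.0):
    """Indices of residuals more than k residual standard errors from zero."""
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual standard error
    return [i for i, e in enumerate(residuals) if abs(e) > k * s]


# Hypothetical curved data (y = x^2): a line leaves a U-shaped residual pattern.
curved_x = list(range(7))
curved_y = [x * x for x in curved_x]
print("curved residuals:", [round(e, 2) for e in residuals_for(curved_x, curved_y)])
# → curved residuals: [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# Hypothetical linear data (y = 2x + 1) with the point at x = 5 pushed up 10 units.
xs = list(range(1, 9))
ys = [2 * x + 1 for x in xs]
ys[4] += 10
print("flagged indices:", flag_outliers(residuals_for(xs, ys)))
# → flagged indices: [4]
```

Index 4 is the point at x = 5. Note that a single outlier also pulls the line toward itself, which is why every other residual in that data set comes out slightly negative.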
Ready for More?
Use the residual plot in problem 10 to reconstruct the scatterplot on the graph below. The regression line is shown on the graph.
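Reconstructing the scatterplot reverses the residual calculation: each observed value equals the predicted value plus the residual. The Python sketch below assumes a hypothetical line y-hat = 2x + 1 and hypothetical (x, residual) pairs, since the problem 10 plot is not reproduced here; `reconstruct_points` is an illustrative name.

```python
def predict(x):
    """Hypothetical regression line y-hat = 2x + 1."""
    return 2 * x + 1


# Hypothetical (x, residual) pairs read off a residual plot.
residual_plot = [(1, 1), (2, -1), (3, 0), (4, -1), (5, 1)]


def reconstruct_points(residual_plot, predict):
    """Undo the residual calculation: observed y = predicted y + residual."""
    return [(x, predict(x) + e) for x, e in residual_plot]


print(reconstruct_points(residual_plot, predict))
# → [(1, 4), (2, 4), (3, 7), (4, 8), (5, 12)]
```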
The line given has a positive correlation coefficient. Could the residuals in problem 10 also represent data that has a negative correlation coefficient?
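One way to explore this question: least-squares residuals add to zero and are uncorrelated with x, so the same set of residuals can be added to a rising line or to a falling line. The Python sketch below does both with hypothetical residuals and compares the correlation coefficients; `correlation` is a hand-written helper (Python 3.10 and later also provide `statistics.correlation`).

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical residuals: they sum to 0 and are uncorrelated with x,
# as least-squares residuals always are.
xs = [1, 2, 3, 4, 5]
residuals = [1, -1, 0, -1, 1]

# Add the same residuals to a rising line and to a falling line.
rising = [2 * x + 1 + e for x, e in zip(xs, residuals)]
falling = [12 - 2 * x + e for x, e in zip(xs, residuals)]

print(f"r for rising data:  {correlation(xs, rising):.3f}")
print(f"r for falling data: {correlation(xs, falling):.3f}")
```

With these numbers the two values of r have the same size and opposite signs, which suggests that a residual plot by itself does not show the direction of the trend.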
- residuals, residual plot (new in this lesson)
In this lesson, we learned that a residual shows the difference between the observed value of a data point and the value predicted by the regression line.
Use the box plot to answer problems 1 through 4.
What is the five-number summary (min, Q1, median, Q3, max) for this box plot?
How much of the data set is represented in the box?
How much of the data is represented in one of the whiskers of the plot?
Why is the left side of the box smaller than the right side of the box?
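A five-number summary can be computed from raw data as sketched below in Python, on a hypothetical data set (the lesson's box plot is not reproduced here). The box runs from Q1 to Q3 and holds about the middle 50% of the data; each whisker and each part of the box holds about 25%, so a narrower part means those values are packed more closely, not that there are fewer of them. Quartile conventions differ between textbooks and software; the "median of each half" rule used here is one common choice, and `five_number_summary` is an illustrative name.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Quartiles are the medians of the lower and upper halves, leaving the
    overall median out of both halves when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]
    upper = values[(n + 1) // 2 :]
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )


# Hypothetical data set, for illustration only.
data = [2, 3, 4, 5, 6, 8, 11, 14, 20]
low, q1, med, q3, high = five_number_summary(data)
print("five-number summary:", (low, q1, med, q3, high))
# Each part of the box holds about 25% of the data, whatever its width.
print("left part of box:", med - q1, " right part of box:", q3 - med)
```

Here the right part of the box is wider than the left, so the quarter of the data just above the median is more spread out than the quarter just below it.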