A model developed for genuine data should be tested for both goodness of fit and a lack of pattern in the residuals. Mathematical models are seldom exact, and the imperfections are of serious concern. The residuals, the differences between actual and predicted values, measure the imperfections of a model. A good model should have residuals that are relatively small and randomly distributed. A correlation coefficient close to 1 indicates that the errors are small but gives no indication of how the residuals are distributed, as the following example shows.
The following problem leads to Kepler's third law, which states that the square of each planet's orbit period is equal to the cube of its semimajor axis, provided the earth's period and semimajor axis are used as the units of time and distance. The problem links data analysis and algebra and involves students in an interesting use of logarithms. It works well as an activity for groups of three or four students using cooperative learning. The solution presented below assumes this arrangement.
Problem. Consider the planet data given in Table 2. According to Kepler's first law, the orbit of each planet is an ellipse with the sun at one focus. Let T be the orbit period (i.e., the time required for one full revolution around the sun), and let a be the length of the orbit's semimajor axis.
A. Plot T versus a. Determine an algebraic model that is appropriate for the data, and justify your choice of model. Be sure to consider residuals in your justification.
B. Plot log T versus log a. Determine and justify a model for this new plot.
C. Compare the numerical, graphical, and algebraic aspects of your two models. How are the models related? Why?
D. Revise your model in part A using the earth's period and semimajor axis as the units of time and distance. State the resulting model in words.
E. Predict Pluto's orbit period given that its semimajor axis is 5.9 billion km.
Table 2. Orbit Periods and Semimajor Axes for Six Planets
| Planet | Period of Revolution (days) | Semimajor Axis (km) |
Part A. Because each x represents a semimajor axis, the points corresponding to the four inner planets are closely clustered, just as the planets themselves are closely clustered in the solar system. The scatterplot suggests that the relationship is nonlinear. Nevertheless, many groups of students will be tempted by the correlation coefficient of r = 0.9924 to conclude the relationship is linear. The least squares line, however, predicts Mercury to have a period of revolution of -363 days and yields a residual plot with a clearly nonrandom pattern.
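The contrast between a high correlation coefficient and a patterned residual plot can be checked numerically. The following Python sketch fits a least squares line to period versus semimajor axis and prints the correlation coefficient, the predicted period for Mercury, and the residuals. The planet values are rounded figures from standard references and are an assumption here: they stand in for Table 2 and may differ slightly from it. The helper names fit_line and correlation are illustrative only.

```python
import math

# Assumed stand-in for Table 2: (planet, period in days, semimajor axis in km).
# Rounded values from standard references; the article's table may differ slightly.
PLANETS = [
    ("Mercury", 88.0, 57.9e6),
    ("Venus", 224.7, 108.2e6),
    ("Earth", 365.2, 149.6e6),
    ("Mars", 687.0, 227.9e6),
    ("Jupiter", 4332.6, 778.3e6),
    ("Saturn", 10759.2, 1427.0e6),
]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


axes = [a for _, _, a in PLANETS]
periods = [t for _, t, _ in PLANETS]

slope, intercept = fit_line(axes, periods)
r = correlation(axes, periods)

# Residual = actual period minus the period predicted by the line.
predicted = [slope * a + intercept for a in axes]
residuals = [t - p for t, p in zip(periods, predicted)]

print(f"r = {r:.4f}")
print(f"Predicted period for Mercury: {predicted[0]:.0f} days")
for (name, _, _), res in zip(PLANETS, residuals):
    print(f"{name:8s} residual = {res:8.1f} days")
```

With these assumed values, the residuals are positive for the innermost planets, negative for Mars and Jupiter, and positive again for Saturn. This curved pattern is the kind of nonrandom behavior that should make students doubt the linear model despite the high value of r.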
Students will likely try several other options before deciding that a power function provides the best model, if they reach that conclusion at all. The example shows that mathematical and scientific considerations are essential parts of the modeling process. Students should attempt to use scientific information at the beginning of a modeling process to suggest potential models, or at the end of the process to validate models that they develop. In this case, a power function model with a power of 1.5 yields a correlation coefficient of nearly 1, suggesting the data points lie very close to the graph of the power function.
A power of 1.5 is consistent with Kepler's third law. Students can overlay the graph of the regression equation on the scatterplot to further confirm the choice of model.
Part B. Technology makes it easy for students to obtain the log-log data and its plot. Many students will be surprised at how different this plot is from the previous plot. The points are spaced more uniformly, and the relationship appears to be linear, so students will readily pick a linear regression model. Ideally, most groups will notice how much easier it is to find this model and that logarithms greatly simplify the relationship between the variables.
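The log-log regression can be sketched in Python as follows. The planet values are rounded figures from standard references and are an assumption here: they stand in for Table 2 and may differ slightly from it. The helper names are illustrative, and base-10 logarithms are an arbitrary choice, since natural logarithms give the same slope and correlation.

```python
import math

# Assumed stand-in for Table 2: (planet, period in days, semimajor axis in km).
PLANETS = [
    ("Mercury", 88.0, 57.9e6),
    ("Venus", 224.7, 108.2e6),
    ("Earth", 365.2, 149.6e6),
    ("Mars", 687.0, 227.9e6),
    ("Jupiter", 4332.6, 778.3e6),
    ("Saturn", 10759.2, 1427.0e6),
]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Transform both variables, then fit an ordinary least squares line.
log_axes = [math.log10(a) for _, _, a in PLANETS]
log_periods = [math.log10(t) for _, t, _ in PLANETS]

slope, intercept = fit_line(log_axes, log_periods)
r = correlation(log_axes, log_periods)

# Undoing the logarithm turns the intercept into the power model's coefficient.
coefficient = 10 ** intercept

print(f"log T = {slope:.4f} * log a + ({intercept:.4f}),  r = {r:.6f}")
print(f"Equivalent power model: T = {coefficient:.4e} * a^{slope:.4f}")
```

With these assumed values, the slope comes out very close to 1.5, the power suggested by Kepler's third law.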
Part C. Students should notice that the correlation coefficients match in the models in parts A and B, and that the slope of the line in B matches the power in A. To explain these relationships, students will need to sort through the conflicting notation. The variables a, b, x, and y have all been used in multiple ways. The situation with a is the worst: it has been used as a regression coefficient in each model and also represents the semimajor axis. The algebra reveals that the two models are equivalent. It also suggests a connection between the models that most students would not have anticipated: the logarithm of the coefficient in A is the constant term, or y-intercept, in B. Students should be encouraged to support this with numerical evidence.
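The algebra can be written out as follows. To sidestep the notational clash, this sketch uses c and k for the coefficient and power in part A. These names are chosen here for illustration and are not taken from the original notation. The letter a is reserved for the semimajor axis.

```latex
\begin{align*}
T &= c\,a^{k} \\
\log T &= \log\left(c\,a^{k}\right) \\
\log T &= \log c + k \log a
\end{align*}
```

Writing y = log T and x = log a, the last line is the linear model y = kx + log c. Its slope k is the power in part A, and its constant term is log c. Because T, a, and c are all positive, each step can be reversed, so either model can be recovered from the other.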
Part D. Students can change the periods to earth-based time units (i.e., years) by dividing the list of periods by 365.2, the approximate number of days per year. The distances must be changed to earth-based units as well, which can be done by dividing the list of distances by its earth entry. The resulting distances are in astronomical units. The power regression option will then yield a model that is essentially $T = a^{1.5}$, which is equivalent to Kepler's equation $T^2 = a^3$.
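The change of units can be sketched in Python by rescaling each planet's values and comparing the square of the period with the cube of the semimajor axis directly. The planet values are rounded figures from standard references and are an assumption here: they stand in for Table 2 and may differ slightly from it.

```python
# Assumed stand-in for Table 2: (planet, period in days, semimajor axis in km).
PLANETS = [
    ("Mercury", 88.0, 57.9e6),
    ("Venus", 224.7, 108.2e6),
    ("Earth", 365.2, 149.6e6),
    ("Mars", 687.0, 227.9e6),
    ("Jupiter", 4332.6, 778.3e6),
    ("Saturn", 10759.2, 1427.0e6),
]

EARTH_PERIOD_DAYS = 365.2   # divisor used in part D
EARTH_AXIS_KM = 149.6e6     # earth entry of the assumed distance list

rows = []
for name, period_days, axis_km in PLANETS:
    period_years = period_days / EARTH_PERIOD_DAYS   # earth-based time unit
    axis_au = axis_km / EARTH_AXIS_KM                # astronomical units
    rows.append((name, period_years ** 2, axis_au ** 3))

print(f"{'Planet':8s} {'T^2':>10s} {'a^3':>10s} {'T^2/a^3':>9s}")
for name, t_squared, a_cubed in rows:
    print(f"{name:8s} {t_squared:10.4f} {a_cubed:10.4f} {t_squared / a_cubed:9.4f}")
```

With these assumed values, every ratio is within about one percent of 1. In words: with years and astronomical units as the units, the square of a planet's period equals the cube of its semimajor axis.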
Part E. This may seem anticlimactic because students merely need to substitute Pluto's semimajor axis value into the model in part A. The predictive power of scientific theory, however, should not be discounted. As an illustration, gravitational theory led Lowell to postulate the existence and general location of Pluto 25 years before it was first sighted. Library research is a natural follow-up to this planet problem. For example, students could be asked to check whether the answer found for part E is consistent with information in the literature, to verify that the data for Uranus and Neptune fit the model, or to write a report on Kepler's laws.
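The substitution for part E can be sketched in Python. For simplicity, this sketch uses the earth-unit form of the model, in which the period in years is the semimajor axis in astronomical units raised to the power 1.5. It does not use the days-and-kilometers form from part A. The value used for the earth's semimajor axis is a rounded figure from standard references and is an assumption here.

```python
PLUTO_AXIS_KM = 5.9e9      # given in part E
EARTH_AXIS_KM = 149.6e6    # assumed rounded value for the earth's semimajor axis
DAYS_PER_YEAR = 365.2      # conversion used in part D

# Convert to astronomical units, then apply the model T = a^1.5 (T in years).
pluto_axis_au = PLUTO_AXIS_KM / EARTH_AXIS_KM
pluto_period_years = pluto_axis_au ** 1.5
pluto_period_days = pluto_period_years * DAYS_PER_YEAR

print(f"Pluto's semimajor axis: {pluto_axis_au:.2f} AU")
print(f"Predicted period: {pluto_period_years:.1f} years "
      f"({pluto_period_days:.0f} days)")
```

Under these assumptions, the prediction is close to the commonly published period of roughly 248 years, which students can confirm in the literature.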