12. Model Building

In developing a good model, one is often interested in how the fit changes when a term is dropped from or added to the model. The function drop1() reports the fit obtained by dropping each term of the model in turn:

> drop1(fuel.fit)
Single term deletions

Model:
100/Mileage ~ Weight + Disp.
       Df Sum of Sq      RSS       Cp
<none>               8.67331  9.58629
Weight  1  7.931031 16.60434 17.21299
 Disp.  1  0.044814  8.71812  9.32678
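The Cp column can be reproduced by hand, which helps in reading the table. The sketch below assumes Cp = RSS + 2 * scale * (number of coefficients), with scale taken as the residual variance of the full model. That formula is inferred from the printed numbers rather than stated in the text, and the helper name cp is made up for illustration.

```r
# Values copied from the drop1(fuel.fit) output above.
rss.full <- 8.67331            # RSS of 100/Mileage ~ Weight + Disp.
df.resid <- 57                 # 60 observations minus 3 coefficients
scale <- rss.full / df.resid   # residual variance estimate, about 0.152

# Assumed form of the statistic (hypothetical helper, not an S function):
# Cp = RSS + 2 * scale * (number of coefficients in the model)
cp <- function(rss, n.coef) rss + 2 * scale * n.coef

cp.table <- c(none   = cp(8.67331, 3),    # full model: intercept, Weight, Disp.
              Weight = cp(16.60434, 2),   # Weight dropped: intercept, Disp.
              Disp.  = cp(8.71812, 2))    # Disp. dropped: intercept, Weight
print(round(cp.table, 4))
```

Under this assumption the results agree with the printed Cp column to about four decimal places; the small discrepancies come from the rounding of the printed RSS values. A smaller Cp is better, so dropping Disp. lowers Cp while dropping Weight raises it sharply.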

Conversely, one may want to see the effect of adding, one at a time, each term from a set of candidate terms. The function add1() does this.

> # Fit a model containing only the intercept.
> fit0 <- lm(100/Mileage ~ 1, car.test.frame)

> # Investigate one-variable fits by adding each term to the model in turn.
> add1(fit0, . ~ Weight + Disp. + Type)

Single term additions

Model:
100/Mileage ~ 1
       Df Sum of Sq      RSS       Cp
<none>              33.85687 35.00456
Weight  1  25.13875  8.71812 11.01350
 Disp.  1  17.25253 16.60434 18.89972
  Type  5  24.23960  9.61727 16.50341
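Cp values from the two tables are not directly comparable. The Weight-only model has RSS 8.71812 in both (the Disp. row of the drop1() table and the Weight row here), yet its Cp is 9.32678 there and 11.01350 here. A plausible explanation, inferred from the numbers and not stated in the text, is that each table uses the residual variance of its own starting model as the scale. The sketch below checks this under the assumed formula Cp = RSS + 2 * scale * (number of coefficients); cp0 is a made-up helper name.

```r
# Values copied from the add1(fit0, ...) output above.
rss0   <- 33.85687        # RSS of the intercept-only model
scale0 <- rss0 / 59       # its residual variance: 60 observations, 1 coefficient

# Assumed form of the statistic (hypothetical helper, not an S function).
cp0 <- function(rss, n.coef) rss + 2 * scale0 * n.coef

cp.add <- c(none   = cp0(33.85687, 1),
            Weight = cp0(8.71812, 2),
            Disp.  = cp0(16.60434, 2),
            Type   = cp0(9.61727, 6))   # Type has 5 Df, so 6 coefficients in all
print(round(cp.add, 4))

# "Sum of Sq" is the drop in RSS relative to the starting model.
sum.of.sq <- rss0 - c(Weight = 8.71812, Disp. = 16.60434, Type = 9.61727)
print(sum.of.sq)
```

Within a single table the ranking is still meaningful: Weight gives the lowest Cp of the three candidate additions.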

In both cases, the order of the terms in the model formula is unimportant, since the effect of each term on the initial model is evaluated separately.

For stepwise selection of models, the function step() can be used:

> fuel.step <- step(fit0, . ~ Weight + Disp. + Type, trace=FALSE, direction="both")

> fuel.step
Call:
lm(formula = 100/Mileage ~ Weight, data = car.test.frame)

Coefficients:
 (Intercept)     Weight
   0.3914324 0.00131638

Degrees of freedom: 60 total; 58 residual
Residual standard error (on weighted scale): 0.3877015

By default, trace=TRUE, so information is printed while step() is running. Other arguments allow the user to specify the stopping criteria or the range of models examined in the stepwise search.
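As a rough illustration of those other arguments, the sketch below uses R's step(), a descendant of the S function, so argument names and defaults may differ from the S-PLUS version used in this tutorial. The built-in mtcars data stand in for car.test.frame, with wt, disp and cyl as stand-ins for Weight, Disp. and Type; none of this comes from the original example.

```r
# Stand-in data: mtcars ships with R; it is not the car.test.frame used above.
fit0 <- lm(100/mpg ~ 1, data = mtcars)

sel <- step(fit0,
            scope = list(lower = ~ 1,                          # never go below the intercept
                         upper = ~ wt + disp + factor(cyl)),   # largest model considered
            direction = "both",   # allow additions and deletions at each step
            steps = 10,           # stop after at most 10 steps
            trace = 0)            # suppress the progress output

print(formula(sel))
```

The scope argument bounds the range of models searched, and steps caps the length of the search. In R the penalty per parameter is set by the k argument (2 by default).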

The anova() function works in two ways. Given a single fitted model, anova() fits a sequence of models by successively adding each of the terms to the model (from first to last). Given more than one fitted model as arguments, anova() makes sequential pairwise comparisons in the order in which the fitted models are listed.

> anova(fuel.fit)

Analysis of Variance Table

Response: 100/Mileage

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
Weight     1  25.13875 25.13875 165.2090 0.0000000
Disp.      1   0.04481  0.04481   0.2945 0.5894582
Residuals 57   8.67331  0.15216

Notice the difference between anova() and add1(). Here the terms are added to the model sequentially, whereas with add1() each term is added separately to the original model.
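The difference can be checked from the residual sums of squares already printed in this section. The following sketch only does the arithmetic; the variable names are chosen for illustration.

```r
# RSS values copied from the drop1(), add1() and anova() output above.
rss.null   <- 33.85687   # 100/Mileage ~ 1
rss.weight <-  8.71812   # 100/Mileage ~ Weight
rss.disp   <- 16.60434   # 100/Mileage ~ Disp.
rss.both   <-  8.67331   # 100/Mileage ~ Weight + Disp.

# anova(): each term is added to the model built so far.
seq.ss <- c(Weight = rss.null - rss.weight,
            Disp.  = rss.weight - rss.both)

# add1(): each term is added on its own to the intercept-only model.
sep.ss <- c(Weight = rss.null - rss.weight,
            Disp.  = rss.null - rss.disp)

# F values in the anova() table: sequential sum of squares (1 Df each)
# divided by the residual mean square of the full model.
resid.ms <- rss.both / 57
f.values <- seq.ss / resid.ms

print(rbind(sequential = seq.ss, separate = sep.ss))
print(f.values)
```

Weight has the same sum of squares either way because it is the first term added. Disp. accounts for about 17.25 on its own but only about 0.045 once Weight is already in the model.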

> fuel.fit2 <- update(fuel.fit, . ~ . + Type)

> anova(fuel.fit, fuel.fit2)

Analysis of Variance Table

Response: 100/Mileage

                  Terms Resid. Df      RSS  Test Df Sum of Sq  F Value        Pr(F)
1        Weight + Disp.        57 8.673309
2 Weight + Disp. + Type        52 5.124531 +Type  5  3.548778 7.202082 3.465263e-05
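The test in the second row is the usual F test for comparing nested linear models. As a check on how the entries relate to one another, the sketch below recomputes them from the two RSS values and residual degrees of freedom in the table; the variable names are illustrative only.

```r
# Values copied from the anova(fuel.fit, fuel.fit2) table above.
rss.small <- 8.673309; df.small <- 57   # Weight + Disp.
rss.big   <- 5.124531; df.big   <- 52   # Weight + Disp. + Type

df.test <- df.small - df.big            # degrees of freedom used by Type
ss.test <- rss.small - rss.big          # reduction in RSS from adding Type

# F statistic: mean square for the added term over the
# residual mean square of the larger model.
f.value <- (ss.test / df.test) / (rss.big / df.big)

# Upper-tail probability of the F distribution.
p.value <- pf(f.value, df.test, df.big, lower.tail = FALSE)

print(c(Df = df.test, SumOfSq = ss.test, F = f.value, p = p.value))
```

The small p-value indicates that adding Type improves the fit significantly even after Weight and Disp. are in the model.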

Further Reading

John M. Chambers and Trevor J. Hastie, Statistical Models in S, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1992, pp. 124-129, 210-213, 233-238.
