# 12. Model Building

In trying to develop a good model, one may be interested in the change in
fit which results from dropping or adding a term to the model. The function
drop1() returns all the fits obtained from dropping one of the terms from the
model:

> **drop1(fuel.fit)**

    Single term deletions
    Model:
    100/Mileage ~ Weight + Disp.
           Df Sum of Sq      RSS       Cp
    <none>               8.67331  9.58629
    Weight  1  7.931031 16.60434 17.21299
    Disp.   1  0.044814  8.71812  9.32678
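
The Cp column can be checked by hand. The printed figures are consistent with the formula below, where $p$ (the number of fitted coefficients, counting the intercept) and $\hat\sigma^2$ (the residual mean square of the starting model fuel.fit) are notation introduced here for illustration and do not appear in the S-PLUS output:

$$
C_p = \mathrm{RSS} + 2\,p\,\hat\sigma^2, \qquad \hat\sigma^2 = \frac{8.67331}{57} \approx 0.15216
$$

$$
\begin{aligned}
\text{no term dropped } (p = 3):&\quad 8.67331 + 2 \cdot 3 \cdot 0.15216 \approx 9.58629 \\
\text{Weight dropped } (p = 2):&\quad 16.60434 + 2 \cdot 2 \cdot 0.15216 \approx 17.21299
\end{aligned}
$$

On this reading, dropping Disp. lowers Cp (9.32678 against 9.58629), while dropping Weight raises it sharply.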

Conversely, one may want to see the effect of adding one term
from each of a possible choice of terms.

First fit a model containing only the intercept:

> **fit0 <- lm(100/Mileage ~ 1, car.test.frame)**

Then investigate the one-variable fits by adding each term to this model:

> **add1(fit0, . ~ Weight + Disp. + Type)**

    Single term additions
    Model:
    100/Mileage ~ 1
           Df Sum of Sq      RSS       Cp
    <none>              33.85687 35.00456
    Weight  1  25.13875  8.71812 11.01350
    Disp.   1  17.25253 16.60434 18.89972
    Type    5  24.23960  9.61727 16.50341
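
As a rough check on the add1() figures, the Cp values are consistent with $C_p = \mathrm{RSS} + 2\,p\,\hat\sigma^2$, taking $\hat\sigma^2$ from the intercept-only model fit0. The symbols $p$ (number of fitted coefficients) and $\hat\sigma^2$ are notation assumed here, not part of the output:

$$
\hat\sigma^2 = \frac{33.85687}{59} \approx 0.57385
$$

$$
\begin{aligned}
\text{Weight added } (p = 2):&\quad 8.71812 + 2 \cdot 2 \cdot 0.57385 \approx 11.01350 \\
\text{Type added } (p = 6):&\quad 9.61727 + 2 \cdot 6 \cdot 0.57385 \approx 16.50341
\end{aligned}
$$

Type uses five degrees of freedom, so its penalty is larger; Weight gives the smallest Cp of the three candidates.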

In both cases, the order of the terms in the model formula is
unimportant since the effect of each term on the initial model is
evaluated separately. For stepwise selection of models, the function
step() can be used:

> **fuel.step <- step(fit0, . ~ Weight + Disp. + Type, trace=F, direction="both")**

> **fuel.step**

    Call:
    lm(formula = 100/Mileage ~ Weight, data = car.test.frame)

    Coefficients:
     (Intercept)     Weight
       0.3914324 0.00131638

    Degrees of freedom: 60 total; 58 residual
    Residual standard error (on weighted scale): 0.3877015

By default, trace=TRUE, and information is printed while step() is running.
Other arguments allow the user to specify the stopping criteria or the range
of models examined in the stepwise search.

The anova() function works in two ways. Given a single fitted model, anova()
fits a sequence of models by successively adding each of the terms to
the model (from first to last). Given more than one fitted model as arguments,
anova() makes sequential pairwise comparisons in the order the fitted models are
listed.

> **anova(fuel.fit)**

    Analysis of Variance Table
    Response: 100/Mileage
    Terms added sequentially (first to last)
              Df Sum of Sq  Mean Sq  F Value     Pr(F)
       Weight  1  25.13875 25.13875 165.2090 0.0000000
        Disp.  1   0.04481  0.04481   0.2945 0.5894582
    Residuals 57   8.67331  0.15216
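
The F values in this table can be recovered from the Mean Sq column: each term's mean square is divided by the residual mean square, 0.15216. This is a worked check on the printed figures, not additional output:

$$
F_{\text{Weight}} = \frac{25.13875}{0.15216} \approx 165.21, \qquad
F_{\text{Disp.}} = \frac{0.04481}{0.15216} \approx 0.2945
$$

Once Weight is in the model, Disp. adds very little, which matches the large Pr(F) of 0.589.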

Notice the difference between anova() and add1(). Here the terms are
added to the model sequentially, whereas with add1() each term
is added separately to the original model.

anova() can also compare two fitted models, such as fuel.fit and the same
model with Type added:

> **fuel.fit2 <- update(fuel.fit, . ~ . + Type)**

> **anova(fuel.fit, fuel.fit2)**

    Analysis of Variance Table
    Response: 100/Mileage
                      Terms Resid. Df      RSS  Test Df Sum of Sq  F Value        Pr(F)
    1        Weight + Disp.        57 8.673309
    2 Weight + Disp. + Type        52 5.124531 +Type  5  3.548778 7.202082 3.465263e-05
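
The F value for this comparison follows from the two residual sums of squares. The subscripts 1 and 2 below are labels assumed here for the smaller and larger model:

$$
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2}
  = \frac{(8.673309 - 5.124531)/(57 - 52)}{5.124531/52}
  = \frac{3.548778/5}{0.098549} \approx 7.2021
$$

The numerator is the drop in RSS per degree of freedom spent on Type; the denominator is the residual mean square of the larger model.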

## Further Reading

John M. Chambers, Trevor J. Hastie, *Statistical Models in S*,
Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California,
1992, pp. 124-129, 210-213, 233-238.
