10. Model Formulae
Formulae are Splus expressions that state the structural form of a model in terms of
the variables involved. For example, the formula
cholesterol ~ systol + age
tells us that the response variable, cholesterol, is to be modeled by an additive model in
two predictors, systol and age. This model is the same as
cholesterol ~ 1 + systol + age
where the 1 denotes the intercept. The intercept is included by default, so it need
not be written explicitly. To exclude the intercept, the model is written as
cholesterol ~ -1 + systol + age
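The effect of the 1 and -1 terms can be checked without fitting anything. The sketch below is written in the R dialect of S and assumes that terms() stores an "intercept" attribute in the same way in Splus; the helper has.intercept() is a hypothetical name introduced only for this illustration.

```r
# Three ways of writing the cholesterol model from the text.
f.default  <- cholesterol ~ systol + age        # intercept implied
f.explicit <- cholesterol ~ 1 + systol + age    # intercept written out
f.none     <- cholesterol ~ -1 + systol + age   # intercept removed

# Hypothetical helper: TRUE if the formula's terms object carries an intercept.
has.intercept <- function(f) attr(terms(f), "intercept") == 1

has.intercept(f.default)   # TRUE
has.intercept(f.explicit)  # TRUE
has.intercept(f.none)      # FALSE
```

Because terms() works on the formula alone, no data set is needed for this check.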
The terms in a formula can be any Splus expression which, when evaluated, can be
interpreted as a variable. For example:
log(cholesterol) ~ systol + age
Expressions appearing in a model formula are interpreted as ordinary Splus expressions
except for the following operators:
- +
  - separates the terms to be included in the model
- :
  - denotes an interaction
- *
  - expansion operator for interactions
  - e.g. systol*age is equivalent to systol + age + systol:age
- -
  - deletes terms from a model
  - e.g. systol*diastol*age - systol:diastol:age deletes the third-order interaction term
- %in%
  - denotes nesting
  - e.g. smoke %in% sex, where smoke is the number of cigarettes smoked and is nested within sex
  - The model morbidity ~ sex + smoke %in% sex would be of the form:
    morbidity = intercept + (beta1)*sex + (alpha1)*sex1*smoke + (alpha2)*sex2*smoke
    where beta1 corresponds to the contrast for sex, and sex1 and sex2 indicate the two levels of sex, so that smoke has a separate slope (alpha1, alpha2) within each sex.
- /
  - expansion operator for nesting
  - e.g. sex/smoke (sex, and then smoke within sex) is equivalent to sex + smoke %in% sex
- ^
  - crosses all the terms up to the specified order
  - e.g. (sex + smoke + diabetes)^2 is equivalent to
    sex + smoke + diabetes + sex:smoke + sex:diabetes + smoke:diabetes
- poly()
  - generates a basis for polynomial regression
  - e.g. poly(x,2), poly(x,y,3), where the last argument is the degree of the polynomial
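The expansions listed above can be inspected directly by asking for the term labels of a formula. The following is a minimal sketch in the R dialect of S. It assumes that the "term.labels" attribute of a terms object is available in the same way in Splus, and term.labels() is a hypothetical helper name. Nesting is left out because the printed labels for nested terms may differ between implementations.

```r
# Hypothetical helper: list the terms a formula expands to.
term.labels <- function(f) attr(terms(f), "term.labels")

# * expands to main effects plus the interaction.
star <- term.labels(~ systol * age)
star    # "systol" "age" "systol:age"

# - removes the third-order interaction, leaving six terms.
minus <- term.labels(~ systol * diastol * age - systol:diastol:age)
minus

# ^2 crosses all terms up to second order.
power <- term.labels(~ (sex + smoke + diabetes)^2)
power   # "sex" "smoke" "diabetes" "sex:smoke" "sex:diabetes" "smoke:diabetes"

# poly(x, 2) returns one basis column per polynomial degree.
x <- 1:10
ncol(poly(x, 2))   # 2
```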
Model formulae can be saved in the same way as any other Splus object. A saved formula
can then be reused or modified with the update() function. The first argument to
update() is either a saved model formula or a model fitted from a formula (that is, an
object with a component named call). The second argument is a modelling formula, such
as y ~ a + b. In this second formula, a single . on the left of the ~ is replaced by the
left side of the original formula, and a single . on the right is replaced by its right side.
> chol1 <- chol ~ systol*bmi*age
> update(chol1, .~. -systol:bmi:age)
chol ~ systol + bmi + age + systol:bmi + systol:age + bmi:age
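The same call also works on a fitted model, in which case the model is refitted with the new formula. The sketch below is in the R dialect of S and uses simulated data: the data frame heart, its coefficients and the object names fit1 and fit2 are all hypothetical and chosen only for illustration.

```r
# Simulated (hypothetical) data, used only to have something to fit.
set.seed(1)
n <- 50
heart <- data.frame(systol = rnorm(n, 130, 15),
                    bmi    = rnorm(n, 26, 4),
                    age    = rnorm(n, 50, 10))
heart$chol <- 150 + 0.3 * heart$systol + 1.5 * heart$bmi +
              0.5 * heart$age + rnorm(n, 0, 20)

# Full model with all interactions up to third order.
fit1 <- lm(chol ~ systol * bmi * age, data = heart)

# Refit without the third-order interaction; . reuses each side of the old formula.
fit2 <- update(fit1, . ~ . - systol:bmi:age)

formula(fit2)
length(coef(fit1))   # 8: intercept, 3 main effects, 3 two-way terms, 1 three-way term
length(coef(fit2))   # 7: the three-way term has been dropped
```

Updating the fitted object rather than the formula saves retyping the data argument, since update() reuses the original call.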
Further Reading
John M. Chambers, Trevor J. Hastie, Statistical Models in S,
Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California,
1992, pp. 18-44.