where formula is the structural formula that specifies the model and data is the data frame containing the variables (data is omitted if the variables are not stored in a data frame). The following example uses the data frame car.test.frame described in the section on data frames.
> fuel.fit_lm(100/Mileage ~ Weight + Disp., car.test.frame)
> fuel.fit
Call:
lm(formula=100/Mileage~Weight + Disp.,data=car.test.frame)

Coefficients:
 (Intercept)      Weight        Disp.
   0.4789733 0.001241421 0.0008543589

Degrees of freedom: 60 total; 57 residual
Residual standard error: 0.3900812

A list of the components of an lm object and of the arguments which can be passed through the lm() function can be found at the end of this section.
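For readers without access to car.test.frame, the two ways of calling lm() (with and without a data frame) can be tried in R, a closely related dialect of S. The sketch below is only an analogue of the session above and rests on assumptions: it uses R's built-in mtcars data, with mpg, wt and disp standing in for Mileage, Weight and Disp., it writes assignment as <- rather than _, and the names fit.df, fit.vec, gpm.v, wt.v and disp.v are made up for the illustration.

```r
# Illustrative R analogue; not the S-PLUS session shown above.
# Assumption: mtcars columns mpg, wt and disp stand in for
# Mileage, Weight and Disp. in car.test.frame.

# Variables taken from a data frame: pass it as the data argument
fit.df <- lm(100/mpg ~ wt + disp, data = mtcars)
print(fit.df)

# Variables held as ordinary vectors: the data argument is omitted
gpm.v  <- 100 / mtcars$mpg
wt.v   <- mtcars$wt
disp.v <- mtcars$disp
fit.vec <- lm(gpm.v ~ wt.v + disp.v)

# Both calls fit the same model, so the estimates should agree
same <- all.equal(unname(coef(fit.df)), unname(coef(fit.vec)))
print(same)

# Residual degrees of freedom: observations minus coefficients
print(fit.df$df.residual)
```

The coefficient names differ between the two fits (wt versus wt.v), which is why the comparison strips the names first.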
The lm object fuel.fit can be used as an argument to other functions to obtain summaries of the data or to extract specific information about the model. The function summary() gives a more detailed description of the model than the one obtained by printing out the lm object. The components of an lm.summary object are listed at the end of this section.
> fuel.summary_summary(fuel.fit)
> fuel.summary
Call: lm(formula=100/Mileage~Weight + Disp.,data = car.test.frame)
Residuals:
     Min      1Q  Median     3Q    Max
 -0.8109 -0.2559 0.01971 0.2673 0.9812

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)  0.4790     0.3418  1.4014   0.1665
     Weight  0.0012     0.0002  7.2195   0.0000
      Disp.  0.0009     0.0016  0.5427   0.5895

Residual standard error: 0.3901 on 57 degrees of freedom
Multiple R-Squared: 0.7438
F-statistic: 82.75 on 2 and 57 degrees of freedom, the p-value is 0

Correlation of Coefficients:
       (Intercept)  Weight
Weight     -0.8968
 Disp.      0.4720 -0.8033

Specific information about the model can be printed using the functions coef(), resid() and fitted(). The same information can be obtained using fuel.fit$coef, fuel.fit$resid, and fuel.fit$fitted.
> coef(fuel.fit)
 (Intercept)      Weight        Disp.
   0.4789733 0.001241421 0.0008543589
> formula(fuel.fit2)
100/Mileage ~ Weight + Disp. + Type
* the formula() function extracts the model formula from the lm object fuel.fit2, which is created with update() in the next example

The update() function is used to modify the model formula used in fuel.fit:
> fuel.fit2_lm(update(fuel.fit, . ~ . + Type))
> fuel.fit2
Call:
lm(formula = update(fuel.fit, . ~ . + Type))

Coefficients:
 (Intercept)      Weight       Disp.      Type1      Type2
    2.960617 4.16396e-05 0.007594073 -0.1452804 0.09808411
      Type3       Type4     Type5
  -0.1240942 -0.04358047 0.1915062

Degrees of freedom: 60 total; 52 residual
Residual standard error: 0.3139246

By default, factors are coded using the Helmert parametrization. Here, the coefficient Type1 corresponds to the contrast between the first two levels of the factor, the coefficient Type2 corresponds to the contrast between the third level and the mean of the first two, etc. To view the contrast matrix, type fuel.fit2$contrasts. The type of contrast used can be changed by specifying a different method as an argument to the options() function, or by choosing a contrast in the model formula using the C() function. The four choices are: helmert, poly, sum, and treatment.
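The codings named above can be inspected directly. The following is a minimal sketch in R rather than S-PLUS, under stated assumptions: the factor is taken to have six levels (the fit above has five Type coefficients), mtcars with cyl converted to a factor stands in for car.test.frame and Type, and the names H, cars.f, fit.sum and fit.helmert are invented for the illustration. Note that R, unlike the default described here, codes unordered factors with treatment contrasts unless told otherwise.

```r
# Illustrative R sketch; R's default coding for unordered factors is
# treatment contrasts, so Helmert coding is requested explicitly here.

# Helmert contrast matrix for a six-level factor: 6 rows, 5 columns
H <- contr.helmert(6)
print(H)
# Column j sets level j+1 against levels 1..j; each column sums to zero
print(colSums(H))

# The other three codings named in the text
print(contr.sum(6))
print(contr.treatment(6))
print(round(contr.poly(6), 3))

# Assumption: cyl (4, 6 or 8 cylinders) stands in for the factor Type
cars.f <- transform(mtcars, cyl = factor(cyl))

# Coding chosen for one factor inside the formula with C()
fit.sum <- lm(100/mpg ~ wt + C(cyl, sum), data = cars.f)
print(coef(fit.sum))

# Coding chosen for the whole session with options(), then restored
old <- options(contrasts = c("contr.helmert", "contr.poly"))
fit.helmert <- lm(100/mpg ~ wt + cyl, data = cars.f)
options(old)
print(coef(fit.helmert))
```

Changing the coding changes the individual factor coefficients and their interpretation, but not the fitted values of the model.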
> fuel.fit2_lm(update(fuel.fit, . ~ . + C(Type, sum)))
* the first argument to the C() function specifies the factor and the second argument specifies the contrasts.

> X11();par(mfrow=c(2,1));plot(fuel.fit)
* the plot() function has a default method for lm objects
* two plots are produced: one of the response against the fitted values, with the y=x line superimposed, the other of the absolute value of the residuals against the fitted values

Suppose we have another data frame called new.cars for which we want to predict fuel consumption using the model from fuel.fit. Predictions are obtained with the function predict().
> predict(fuel.fit, new.cars)
* all the variables used in the model fuel.fit must also be in the data frame new.cars
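Since new.cars is not shown, here is a small sketch of the same step in R. It is an illustration under assumptions, not the session above: mtcars replaces car.test.frame, and the two rows of new.cars are hypothetical values invented for the example.

```r
# Illustrative R analogue of predicting from a fitted lm object.
# Assumption: mtcars columns mpg, wt, disp replace Mileage, Weight, Disp.
fit <- lm(100/mpg ~ wt + disp, data = mtcars)

# Hypothetical new cars; the column names must match the predictors
new.cars <- data.frame(wt = c(2.5, 3.5), disp = c(120, 300))

# Predictions are on the scale of the response, 100/mpg
p <- predict(fit, new.cars)
print(p)

# The same numbers from the coefficients applied by hand
manual <- as.vector(cbind(1, new.cars$wt, new.cars$disp) %*% coef(fit))
print(all.equal(unname(p), manual))

# Back-transform to miles per gallon
print(100 / p)
```

Because the response in the formula is a transformation of mileage, the predictions have to be transformed back by hand if mileage itself is wanted.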
Quantity                 Expression            Meaning
Standardized Residuals   e/(s*(1-h)^.5)        Residuals with equal variance
Studentized Residuals    e/(si*(1-h)^.5)       Use si as standard error
DFBETAS                  bi/(si %o% xxi^.5)    The change in the coefficients, scaled by the standard error for the coefficients
DFFIT                    h*e/(1-h)             The change in the fitted value when that observation is dropped
DFFITS                   h^.5*e/(si*(1-h))     Change in fitted values, standardized to variance 1

where the objects are extracted from an lm object (lmf) as described below:
> lms_summary(lmf)
> lmi_lm.influence(lmf)
> e_residuals(lmf)
> s_lms$sigma
> si_lmi$sigma
> xxi_diag(lms$cov.unscaled)
> h_lmi$hat
> bi_coef(lmf) - coef(lmi)

Some of these quantities can be obtained from the function ls.diag(). The function ls.diag() takes as its argument a list like the output of lsfit(). The function lsfit() is similar to the lm() function except that the formula is not specified using a formula object, but rather the model and response matrices are passed on as arguments to the function. These matrices can be created from the variables or extracted from an lm object.
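The expressions in the table of diagnostic quantities can be checked numerically. The sketch below does this in R, not S-PLUS, and makes several assumptions: lmf is fitted to the built-in mtcars data, the names ending in .tb are invented for the illustration, and bi is read from lmi$coefficients because R's lm.influence() stores the change in the coefficients directly instead of the coefficients themselves. The comparison functions rstandard(), rstudent(), dfbetas() and dffits() are R's own and are not mentioned in the text.

```r
# Illustrative R check of the tabulated diagnostic expressions.
# Assumption: mtcars stands in for the data behind lmf.
lmf <- lm(100/mpg ~ wt + disp, data = mtcars)

lms <- summary(lmf)
lmi <- lm.influence(lmf)
e   <- residuals(lmf)
s   <- lms$sigma
si  <- lmi$sigma                 # sigma with each observation left out
xxi <- diag(lms$cov.unscaled)
h   <- lmi$hat
bi  <- lmi$coefficients          # in R: already the change in coefficients

# The five expressions from the table
std.res    <- e / (s  * (1 - h)^.5)
stud.res   <- e / (si * (1 - h)^.5)
dfbetas.tb <- bi / (si %o% xxi^.5)
dffit.tb   <- h * e / (1 - h)
dffits.tb  <- h^.5 * e / (si * (1 - h))

# Compare with R's built-in influence measures
ok <- c(
  standardized = isTRUE(all.equal(unname(std.res),    unname(rstandard(lmf)))),
  studentized  = isTRUE(all.equal(unname(stud.res),   unname(rstudent(lmf)))),
  dfbetas      = isTRUE(all.equal(unname(dfbetas.tb), unname(dfbetas(lmf)))),
  dffits       = isTRUE(all.equal(unname(dffits.tb),  unname(dffits(lmf))))
)
print(ok)

# DFFIT for the first car, checked by refitting without it
fit.drop <- lm(100/mpg ~ wt + disp, data = mtcars[-1, ])
direct   <- fitted(lmf)[1] - predict(fit.drop, mtcars[1, ])
print(all.equal(unname(dffit.tb[1]), unname(direct)))
```

The last check refits the model with one observation removed, which is what the leave-one-out formulas avoid doing for every row.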
> fuel.model_lm(fuel.fit,x=T,y=T)
* an lm object is constructed which saves the model matrix and the response

> fuel.ls_lsfit(fuel.model$x, fuel.model$y, intercept=F)
* the model matrix and the response are then passed on to the lsfit() function
* the argument intercept=F is specified because the intercept is already included in the model matrix
* weights can also be specified using the argument wt=

> ls.print(fuel.ls)
* ls.print() is similar to summary()
Residual Standard Error = 0.3901, Multiple R-Square = 0.7438
N = 60, F-statistic = 82.7518 on 2 and 57 df, p-value = 0

            coef std.err t.stat p.value
Intercept 0.4790  0.3418 1.4014  0.1665
   Weight 0.0012  0.0002 7.2195  0.0000
    Disp. 0.0009  0.0016 0.5427  0.5895

> fuel.diag_ls.diag(fuel.ls)
* ls.diag() computes regression diagnostics
* the function returns a list with components:

  std.dev        residual standard deviation
  hat            vector containing the diagonal of the hat matrix
  std.res        standardized residuals
  stud.res       studentized residuals
  cooks          Cook's distance for each observation
  dfits          change in fitted value when each observation is deleted
  correlation    correlation matrix for the parameter estimates
  std.err        standard errors of the parameter estimates
  cov.unscaled   unscaled covariance matrix for the parameter estimates

An lm object is a list with the following components (e.g. fuel.fit$coefficients):
  coefficients
  residuals
  fitted.values
  effects            orthogonal, single-degree-of-freedom effects
  R                  triangular factor of the decomposition
  rank               the computed rank
  assign             list of assignments of coefficients (and effects) to the terms of the model
  terms              an object summarizing the formula
  call               call that produced the object
  contrasts          a list with matrices or vectors of contrasts used to code factors
  df.residual        number of degrees of freedom for residuals
  R.assign           same as assign, for an over-specified model (in terms of full-rank part of model)
  assign.residuals   vector identifying effects assigned to residuals
  qr                 qr decomposition object (optional)
  model              model frame (optional)
  x                  model matrix (optional)
  y                  response (optional)

The following is a list of other possible arguments to the function lm(), with the default action when not specified, if applicable:
weights=
* vector of observation weights

subset=
* expression specifying rows to be used
* may be logical, numeric (row numbers), or character (row names)

na.action=na.fail
* a function to filter missing values
* the default creates an error message; a possible alternative is na.omit, which deletes observations with missing values

method="qr"
* least squares fitting method to be used
* method="model.frame" returns the model frame

model=F
* if T, the model frame is returned in component model

x=F
* if T, the model matrix is returned in component x

y=F
* if T, the response is returned in component y

contrasts=NULL
* a list giving contrasts for all or some of the factors appearing in the model formula
* each element of the list should be a contrast matrix, or a function to compute such a matrix

The function summary() computes a list with the following components:
  correlation    correlation coefficient matrix for the coefficients
  cov.unscaled   unscaled covariance matrix
  df             degrees of freedom for the model and for the residuals
  coefficients   a matrix with three columns: coefficients, standard errors, t-statistics
  r.squared      the multiple R-squared statistic
  fstatistic     numeric vector of length three giving the F test - the first element is the statistic, the last two are the df
  residuals      the model residuals (weighted if applicable)
  sigma          residual standard error estimate
  terms          the terms object used in fitting this model
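As a rough illustration of how such components are used, the sketch below pulls a few of them out of a fitted model and its summary in R. It rests on assumptions: mtcars replaces car.test.frame, and R's component lists differ in detail from the S-PLUS ones given here (for instance, R's coefficients matrix carries a fourth column of p-values, and the correlation component is only computed on request).

```r
# Illustrative R sketch of reading components from lm and summary objects.
# Assumption: mtcars columns mpg, wt, disp replace Mileage, Weight, Disp.
fit <- lm(100/mpg ~ wt + disp, data = mtcars)
sm  <- summary(fit)

# Component names of the two objects (R's lists, not S-PLUS's)
print(names(fit))
print(names(sm))

# Individual components are extracted with $
print(sm$coefficients)
print(sm$sigma)
print(sm$r.squared)
print(sm$fstatistic)        # statistic, numerator df, denominator df

# p-value of the F test, computed from the three fstatistic entries
f <- sm$fstatistic
p.value <- pf(f[1], f[2], f[3], lower.tail = FALSE)
print(unname(p.value))

# sigma and r.squared recomputed from the residuals as a check
y   <- 100 / mtcars$mpg
rss <- sum(residuals(fit)^2)
tss <- sum((y - mean(y))^2)
print(all.equal(sm$sigma, sqrt(rss / fit$df.residual)))
print(all.equal(sm$r.squared, 1 - rss / tss))

# In R the correlation matrix must be requested explicitly
print(summary(fit, correlation = TRUE)$correlation)
```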