# 4. Simple Statistics

## Summary Statistics

This section covers the functions max(), min(), range(), mean(), median(), var(), cor(), quantile(), and summary().
```
> x <- c(1,2,3,3,3,4,7,8,9,NA)    * when there are missing values in the
                                    data, the functions max(), min(),
                                    range(), mean(), and median() return
                                    NA, and the functions var(), cor(),
                                    and quantile() return an error message

> max(x, na.rm=T)
[1] 9                             * specifying na.rm=T in max() forces
                                    S-PLUS to remove any missing values
                                    from the vector x before returning
                                    the maximum value in x

> min(x, na.rm=T)
[1] 1

> range(x, na.rm=T)
[1] 1 9

> mean(x, na.rm=T)
[1] 4.444444

> mean(x, trim=0.2, na.rm=T)
[1] 4.285714                      * trim is the fraction of the ordered
                                    data (any value from 0 to 0.5
                                    inclusive) removed from each end
                                    before the mean is taken
                                  * if trim=0.5, the result is the median

> median(x, na.rm=T)
[1] 3

> quantile(x, probs=c(0,0.1,0.9), na.rm=T)
 0% 10% 90%                       * quantile() returns the quantiles of
  1 1.8 8.2                         x at the probabilities given in the
                                    argument probs
```
If there are no missing values in the vector x, it is not necessary to specify na.rm=T; simply use min(x), max(x), and so on.
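The trimmed mean and the quantiles in the transcript above can be reproduced by hand, which shows what the trim and probs arguments actually do. This is a sketch that assumes the usual S rules: floor(n*trim) observations are dropped from each end, and quantile() interpolates linearly at position h = (n-1)p + 1 (the default in R, where it is documented as the S definition). It uses `<-` and TRUE so it also runs in R. The helper quant.lin() is hypothetical, defined only for this illustration.

```r
x <- c(1, 2, 3, 3, 3, 4, 7, 8, 9, NA)
xs <- sort(x[!is.na(x)])            # 1 2 3 3 3 4 7 8 9
n <- length(xs)                     # 9 non-missing values

# Trimmed mean: drop floor(n * trim) values from each end of the ordered data
k <- floor(n * 0.2)                 # 1 value trimmed from each end
tm <- mean(xs[(k + 1):(n - k)])     # mean of 2 3 3 3 4 7 8
tm                                  # 4.285714, as mean(x, trim=0.2, na.rm=T)

# Hypothetical helper: quantile by linear interpolation at h = (n - 1) * p + 1
quant.lin <- function(xs, p) {
  h <- (length(xs) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  xs[lo] + (h - lo) * (xs[hi] - xs[lo])
}
qs <- sapply(c(0, 0.1, 0.9), function(p) quant.lin(xs, p))
qs                                  # 1.0 1.8 8.2, as quantile() above
```

For probs=0.1 the position is h = 8(0.1) + 1 = 1.8, so the result lies 80% of the way from the 1st to the 2nd ordered value: 1 + 0.8(2 - 1) = 1.8.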

These functions may also be applied to matrices; rather than working on each row or column separately, they return the max, min, etc. of the whole matrix.
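A small sketch of this behaviour, using a made-up 2 x 3 matrix. Getting one result per row or per column needs apply(), which this section does not cover; its second argument (MARGIN) is 1 for rows and 2 for columns. The code uses R-compatible syntax.

```r
# A made-up 2 x 3 matrix for illustration
m <- matrix(c(1, 5, 2,
              8, 3, 4), nrow = 2, byrow = TRUE)

max(m)                        # 8: one value for the whole matrix
range(m)                      # 1 8
mean(m)                       # 23/6 = 3.833333

# To work column by column or row by row, use apply()
col.max <- apply(m, 2, max)   # 8 5 4 (MARGIN 2 = columns)
row.max <- apply(m, 1, max)   # 5 8   (MARGIN 1 = rows)
```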

```
> var(x[!is.na(x)])
[1] 8.027778                      * missing values are removed from the
                                    vector x using the subscript !is.na(x)
                                  * with two arguments, var(x,y) returns
                                    the covariance between x and y
                                  * the arguments may be vectors or
                                    matrices

> y <- c(1,2,3,4,5,6,7,8,9,10)
> cor(x[!is.na(x)], y[!is.na(x)])
[1] 0.9504597                     * because cor() requires x and y to be
                                    of the same length, the value of y
                                    corresponding to the missing value
                                    in x must also be removed; this is
                                    done using y[!is.na(x)]

> summary(x)
 Min. 1st Qu. Median  Mean 3rd Qu. Max. NA's
    1       3      3 4.444       7    9    1

> z <- c(5,4,3,2,1,9,8,7,6,5)
> pmax(x,y,z)
[1]  5  4  3  4  5  9  8  8  9 NA
> pmin(x,y,z)
[1]  1  2  3  2  1  4  7  7  6 NA
                                  * pmax() returns the maximum value at
                                    each position across several vectors
                                  * likewise, pmin() returns the minimum
                                  * na.rm=T may also be specified to
                                    remove missing values
```
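The two-argument form var(x,y) and cor() are closely related: the correlation is the covariance divided by the product of the two standard deviations. The sketch below, in R-compatible syntax, checks that relationship on the same data, removing the missing value with the same !is.na(x) subscript so that x and y stay the same length.

```r
x <- c(1, 2, 3, 3, 3, 4, 7, 8, 9, NA)
y <- 1:10
ok <- !is.na(x)            # TRUE where x is not missing
xo <- x[ok]
yo <- y[ok]                # keep x and y the same length

cxy <- var(xo, yo)         # covariance: 59/8 = 7.375
# correlation = covariance / (sd of x * sd of y)
r <- cxy / sqrt(var(xo) * var(yo))
r                          # 0.9504597, the value cor() gave above
```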

## Statistical Distributions

```
<dist>    Parameters          Defaults                    Distribution

beta      shape1, shape2      -, -                        Beta
binom     size, prob          -, -                        Binomial
cauchy    location, scale     0, 1                        Cauchy
chisq     df                  -                           Chisquare
exp       rate (1/mean)       1                           Exponential
f         df1, df2            -, -                        F
gamma     shape               -                           GAMMA
geom      prob                -                           Geometric
hyper     m, n, k             -, -, -                     Hypergeometric
lnorm     mean, sd (of log)   0, 1                        Lognormal
logis     location, scale     0, 1                        Logistic
norm      mean, sd            0, 1                        Normal
nrange    size, nevals        -, 200 (-, - for rnrange)   Normal Range
pois      lambda              -                           Poisson
t         df                  -                           Student's t
unif      min, max            0, 1                        Uniform
weibull   shape               -                           Weibull
wilcox    m, n                -, -                        Wilcoxon
```
A dash in the Defaults column means that the parameter has no default value and must be supplied. For help on the d<dist>(), p<dist>(), q<dist>(), and r<dist>() functions for each of these distributions, call help() with the name of the distribution as it appears in the Distribution column (e.g., help(GAMMA)), with the following exceptions: for logis, type help(dlogis); for nrange, type help(dnrange); for the F and Student's t distributions, type help.start(gui='motif'), click on Probability Distributions and Random Numbers under the Categories column, then click on F or T in the left-hand column.
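As an illustration of the four prefixes, here they are for the binomial (dist = binom) with size = 5 and prob = 0.5, values chosen only for this example. The sketch checks that summing the d<dist>() probabilities gives the p<dist>() value, and that q<dist>() returns the smallest value whose cumulative probability reaches the requested level. It is written in R-compatible syntax and assumes S-PLUS behaves the same way.

```r
# d, p, q and r versions of the binomial with size = 5, prob = 0.5
dens <- dbinom(0:2, size = 5, prob = 0.5)   # P(X = 0), P(X = 1), P(X = 2)
dens                                        # 0.03125 0.15625 0.31250
cum <- pbinom(2, size = 5, prob = 0.5)      # P(X <= 2) = 16/32 = 0.5
all.equal(sum(dens), cum)                   # TRUE: p<dist> accumulates d<dist>

# smallest x with P(X <= x) >= 0.4; P(X <= 1) = 0.1875, P(X <= 2) = 0.5
q <- qbinom(0.4, size = 5, prob = 0.5)      # 2
draws <- rbinom(10, size = 5, prob = 0.5)   # 10 random counts from 0 to 5
```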

```
> dnorm(0)
[1] 0.3989423                     * returns the density of the standard
                                    normal distribution at 0

> X11()                           * opens a graphics window
> plot(seq(-3,3,0.1), dnorm(seq(-3,3,0.1)), type="l")
                                  * the d<dist>() functions can be used
                                    to plot the density function of each
                                    of the distributions above

> pnorm(1.96)
[1] 0.9750021                     * returns the cumulative probability
                                    at 1.96 for the normal distribution

> qnorm(0.9750021)
[1] 1.96                          * returns the 97.5th percentile of
                                    the normal distribution

> rnorm(5)
[1] -0.7160094  0.3953744  1.2587492  0.3022640 -0.4109508
                                  * generates 5 random standard normal
                                    variables

> rexp(5,1/3)
[1] 0.1204068 0.1937435 9.3637550 0.8051347 1.0450249
                                  * this could also have been written as
                                    rexp(5, rate=1/3)
```
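The rnorm() and rexp() values above will differ on every run. Calling set.seed() (not covered in this section) before an r<dist>() call makes the draws reproducible. The sketch below also checks that rate = 1/3 produces draws whose average is near 3, consistent with rate = 1/mean in the table. It assumes R-compatible syntax; the seed value 42 is arbitrary.

```r
# Same seed, same "random" numbers
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
identical(a, b)          # TRUE

# rate = 1/mean for the exponential, so rate = 1/3 gives mean 3
set.seed(42)
e <- rexp(10000, rate = 1/3)
mean(e)                  # close to 3
```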