# 2. Computations and Data Manipulations

## Data Transformations

1. The following functions compute elementary numerical results on a vector x. They could just as easily have been used on a scalar, matrix, or any other numerical object.

ceiling
floor
trunc
round
signif
print
```
> x_c(-1.90691,0.76018,-0.26556,-1.89828,0.08571,NA)
                     * NA means "not available", or missing
                     * normally, the result of operating on an NA
                       is another NA

> ceiling(x)
 -1  1  0 -1  1 NA   * in this case, where x is a vector, the
                       function is applied to each element in x
                     * the ceiling() function rounds up to the
                       next integer

> floor(x)
 -2  0 -1 -2  0 NA   * floor() rounds down to the next integer

> trunc(x)
 -1  0  0 -1  0 NA   * trunc() returns only the integer part of
                       the elements of x

> round(x)
 -2  1  0 -2  0 NA   * round() rounds values to the nearest
                       integer value
                     * values of n.5 are rounded to the nearest
                       even integer

> round(x,1)
 -1.9 0.8 -0.3 -1.9 0.1 NA
                     * optionally, round() can take a second
                       argument which specifies the number of
                       decimal places to round to

> signif(x,2)
 -1.900 0.760 -0.270 -1.900 0.086 NA
                     * signif() rounds data to the specified
                       number of significant digits
                     * all the numbers are printed in the same
                       format, with trailing zeros added where
                       necessary

> print(x,digits=1)
 -1.91 0.76 -0.27 -1.90 0.09 NA
                     * the print() function with the optional
                       argument digits prints the numeric object
                       x so that every element shows at least the
                       specified number of significant digits
                     * all the elements of x are then printed in
                       the same format
```
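The relationships among these rounding functions can be checked directly. The following is a minimal sketch written for present-day R, which is an assumption about the reader's environment: it uses <- for assignment where the examples above use the older _ form. The names rounded, toward_zero, and half are purely illustrative.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
x <- c(-1.90691, 0.76018, -0.26556, -1.89828, 0.08571, NA)

# One column per function, one row per element of x.
rounded <- cbind(x, ceiling = ceiling(x), floor = floor(x),
                 trunc = trunc(x), round = round(x))
print(rounded)

# trunc() rounds toward zero, so it agrees with floor() for positive
# values and with ceiling() for negative values.
toward_zero <- ifelse(x > 0, floor(x), ceiling(x))
print(all(trunc(x) == toward_zero, na.rm = TRUE))

# Values of exactly n.5 are rounded to the nearest even integer.
half <- round(c(0.5, 1.5, 2.5, 3.5))
print(half)

# The missing value stays missing after every operation.
print(is.na(rounded[6, ]))
```

Collecting the results with cbind() makes it easy to compare the functions row by row, in particular for the negative values, where floor() and trunc() differ.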
2. The following functions are used on vectors. In some cases, they may also be used on matrices, but you may not always get the result you expect.

sum
prod
cumsum
cumprod
diff
```
> x_(1:5)

> sum(x)
 15                 * sum() calculates the sum of all the values
                      in x

> prod(x)
 120                * prod() calculates the product of all the
                      elements in x

> cumsum(x)
 1 3 6 10 15        * cumsum() returns an object with each
                      element the sum of all the elements in
                      x up to that point
                    * if x is a matrix, cumsum() will find the
                      cumulative sums columnwise

> cumprod(x)
 1  2  6  24  120   * cumprod() returns an object with each
                      element the product of all the elements
                      in x up to that point
```
```
> x_c(1,4,8,2,1)

> diff(x)
 3 4 -6 -1     * returns an object where the ith element is equal
                 to x[i+1]-x[i]
               * when x is a matrix, the function diff()
                 calculates the differences separately
                 for each column

> diff(x, lag=2)
 7 -2 -7       * optionally, the argument lag can be specified
                 such that the ith element is equal to
                 x[i+lag]-x[i]
                 e.g., in this case 8-1, 2-4, 1-8
               * the diff() function returns a vector
                 with length(x)-lag elements
```
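The cumulative and differencing functions are closely related, which the short sketch below illustrates. It is written for present-day R (an assumption: <- replaces the _ assignment used above), and the names d, rebuilt, n_lag2, m, and col_cumsum are illustrative only. One difference to be aware of: in R, as far as we know, cumsum() applied to a matrix returns a plain vector rather than columnwise results, so the sketch uses apply() to get cumulative sums down each column.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
x <- c(1, 4, 8, 2, 1)

# diff() and cumsum() undo each other once the first value is supplied.
d <- diff(x)
rebuilt <- cumsum(c(x[1], d))
print(rebuilt)

# diff() with a lag returns length(x) - lag elements.
n_lag2 <- length(diff(x, lag = 2))
print(n_lag2)

# The last cumulative value equals the overall total.
y <- 1:5
print(cumsum(y)[length(y)] == sum(y))
print(cumprod(y)[length(y)] == prod(y))

# In R, cumsum() on a matrix returns a plain vector, so apply() is
# used here to compute cumulative sums down each column.
m <- matrix(1:6, nrow = 3)
col_cumsum <- apply(m, 2, cumsum)
print(col_cumsum)
```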

## Arithmetic Operators

```
   +     Addition
   -     Subtraction
   *     Multiplication   (performs elementwise multiplication on a matrix)
   /     Division
   ^     Exponentiation   x^2 == x*x
                          x^(1/3) == the cube root of x
   %/%   Integer divide   e1%/%e2 == floor(e1/e2)
   %%    Modulo function  e1%%e2  == e1 - (e1%/%e2)*e2
```
The usual arithmetic operators work as one would expect: if x is a numeric object, then x*2 multiplies each element of x by 2.

```
> x_c(-24,-99,82,15)
> y_c(2,3)

> x/y
 -12 -33 41  5    * when one argument is longer than the other,
                    the shorter argument is used cyclically,
                    if necessary

> x%/%10
 -3 -10  8  1     * this is equivalent to floor(x/10)
                  * returns 0 when dividing by 0
                  * when x is a positive number,
                    %/% returns the integer part of /

> x%%10
 6 1 2 5          * this is equivalent to x-10*(x%/%10)
                    (i.e., the remainder of %/%)
                  * returns x when dividing by 0
```
- when x is a positive integer, the integer divide and modulo functions can be used to break the number up into its digits, as happened for the last two elements of x (82 gives 8 and 2; 15 gives 1 and 5)

```
> x_765
> x1_x %/% 100
> x1
 7

> x_x %% 100
> x
 65

> x2_x %/% 10
> x2
 6

> x3_x %% 10
> x3
 5

* more concisely (without overwriting x):

> x1_x %/% 100
> x2_(x %% 100) %/% 10
> x3_(x %% 100) %% 10
```
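The same peel-off-a-digit idea extends to numbers of any length. The sketch below is one possible way to do it, written for present-day R (an assumption: <- replaces the _ assignment used above). The helper digits_of() is hypothetical and not part of S or R; it is only meant for non-negative whole numbers.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.

# Hypothetical helper: digits of a non-negative whole number,
# most significant digit first.
digits_of <- function(n) {
  digits <- c()
  repeat {
    digits <- c(n %% 10, digits)  # peel off the last digit
    n <- n %/% 10                 # drop that digit from n
    if (n == 0) break
  }
  digits
}

print(digits_of(765))

# The identity x == (x %/% y) * y + x %% y also holds for negative x.
x <- c(-24, -99, 82, 15)
print(all(x == (x %/% 10) * 10 + x %% 10))
```

The loop repeats the two steps shown above: %% 10 extracts the last digit and %/% 10 removes it, until nothing is left.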

## Numerical Transformations

```
  Name                       Operation

  sqrt                       square root
  abs                        absolute value
  sin   cos   tan            trigonometric functions (radians)
  asin  acos  atan           inverse trigonometric functions
  sinh  cosh  tanh           hyperbolic functions
  asinh acosh atanh          inverse hyperbolic functions
  exp   log                  exponential and natural logarithm
  log10                      common logarithm
  gamma lgamma               gamma function and its natural log
```
- gamma(x) = (x-1)! when x is a positive integer
- use the argument base= to change the base of the logarithm computed by log(), i.e., log(x,base=10) is the same as log10(x)
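
The two notes above can be verified numerically. This is a small sketch for present-day R (an assumption: <- replaces the _ assignment used in the text); all.equal() is used instead of == because the results are floating-point values. The variable names are illustrative.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
n <- 1:6

# gamma(n) equals (n-1)! for positive integers.
gamma_ok <- isTRUE(all.equal(gamma(n), factorial(n - 1)))

# log() with base=10 matches log10().
x <- c(0.5, 1, 10, 250)
log_ok <- isTRUE(all.equal(log(x, base = 10), log10(x)))

# lgamma() is the natural log of gamma().
lgamma_ok <- isTRUE(all.equal(lgamma(10), log(gamma(10))))

print(c(gamma_ok, log_ok, lgamma_ok))
```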

## Matrix Operations

The following functions apply specifically to matrices (on first reading, ignore all but t(), solve(), and %*%):

```
Name                Usage              Operation

t                   t(A)               transpose
%*%                 A%*%B              matrix multiply
crossprod           crossprod(A,B)     cross product
outer               outer(A,B)         outer product
svd                 svd(A)             singular value decomposition
qr                  qr(A)              QR decomposition
solve               solve(A,B)         solve equations or invert matrices
eigen               eigen(A)           eigenvalues and eigenvectors
chol                chol(A)            Choleski decomposition
```
- crossprod(A,B) is equivalent to t(A) %*% B

- crossprod(A) is equivalent to crossprod(A,A)

- the functions outer(), svd(), qr(), eigen(), and chol() can all take optional arguments; these are described in the help documentation

- solve(A,B) finds the solution to the system of equations A %*% X = B

- solve(A) finds the inverse of A
```
> square_matrix(c(1,2,3,4,5,6,7,8,9),nrow=3)
> square
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> decomp_eigen(square)

> decomp$values
  1.611684e+01 -1.116844e+00 -1.652234e-16

> decomp$vectors
           [,1]       [,2]       [,3]
[1,] -0.5598757  0.8251730 -0.3767961
[2,] -0.6879268  0.2238583  0.7535922
[3,] -0.8159780 -0.3774565 -0.3767961
```
* eigen() creates an object of mode list with two components: \$values and \$vectors

* alternatively, eigenvalues could have been obtained using eigen(square)\$values, and eigenvectors using eigen(square)\$vectors
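
The notes on solve() and crossprod() can be illustrated with a small system. The matrix square used above has an eigenvalue that is zero up to rounding, so it is singular and cannot be inverted; the sketch below therefore uses a hypothetical 2 x 2 system that does not come from the text. It is written for present-day R (an assumption: <- replaces the _ assignment used above).

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.

# Hypothetical non-singular system A %*% X = B (not from the text):
#   2*X1 + 1*X2 = 3
#   1*X1 + 3*X2 = 5
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- c(3, 5)

X <- solve(A, B)          # solution of A %*% X = B
print(X)
print(A %*% X)            # reproduces B, up to rounding

A_inv <- solve(A)         # inverse of A
print(A %*% A_inv)        # identity matrix, up to rounding

# crossprod(A, B) is t(A) %*% B, and crossprod(A) is crossprod(A, A).
print(all.equal(crossprod(A, B), t(A) %*% B))
print(all.equal(crossprod(A), crossprod(A, A)))
```

Multiplying A by the result is a quick way to confirm that solve() returned what was expected.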

Recall that the matrix size used in part 1 is:

```
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170
```
The function apply() can be used to find the mean for each column in size:

```
> colmean_apply(size,2,mean)
> colmean
 Weight Waist heights
  119.6  25.2   156.4
                             * the first argument gives the name of
                               the matrix to which the function will
                               be applied
                             * the second argument gives the dimension
                               over which the function is to be applied
                               - in the case of a matrix, 1 indicates
                                 rows, 2 indicates columns
                             * the third argument gives the name of the
                               function to be applied; functions other
                               than mean can be specified here

> sweep(size,2,colmean)
     Weight Waist heights    * sweep() "sweeps out" the column means
[1,]   10.4   0.8   -16.4      from the matrix size
[2,]   -9.6  -1.2    -1.4    * the first two arguments in sweep() are
[3,]   -1.6  -0.2   -14.4      the same as in apply()
[4,]   -7.6  -0.2    18.6    * the third argument is a vector
[5,]    8.4   0.8    13.6      containing the values to be
                               "swept out" of the matrix

> sweep(size,1,c(1,2,3,4,5),"+")
     Weight Waist heights    * by default, sweep() subtracts the
[1,]    131    27     141      values in the third argument from
[2,]    112    26     157      the rows or columns of the matrix
[3,]    121    28     145    * this can be changed by specifying the
[4,]    116    29     179      function in the fourth argument
[5,]    133    31     175    * in this example, 1 is added to the
                               first row, 2 to the second row, etc.
```
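As a check on the apply() and sweep() examples, the sketch below rebuilds the matrix size and confirms that sweeping out the column means leaves columns that average to zero. It is written for present-day R (an assumption: <- replaces the _ assignment used above); the names centered, colmax, and scaled are illustrative and do not appear in the text.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
size <- matrix(c(130, 110, 118, 112, 128,
                 26, 24, 25, 25, 26,
                 140, 155, 142, 175, 170),
               nrow = 5,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))

colmean <- apply(size, 2, mean)
centered <- sweep(size, 2, colmean)

# After sweeping out the column means, every column averages to zero
# (up to floating-point rounding).
print(round(apply(centered, 2, mean), 10))

# Any function can be passed as the third argument of apply();
# here, the largest value in each column.
colmax <- apply(size, 2, max)
print(colmax)

# With "/" as the fourth argument, sweep() divides instead of
# subtracting: each column becomes a fraction of its maximum.
scaled <- sweep(size, 2, colmax, "/")
print(round(scaled, 3))
```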

## Data Manipulations

rep
seq
rev
```
> rep(c(4,2),times=2)
 4 2 4 2              * the rep() function replicates input either
                        a certain number of times or to a certain
                        length

> rep(c(4,2),times=c(2,1))
 4 4 2                * when times is a single value, the first
                        argument is repeated that many times
                      * when times is a vector, each element in
                        the first argument is repeated the number
                        of times given by the matching element
                        of times

> rep(c(4,2),length=3)
 4 2 4                * when the length argument is specified, the
                        first argument is replicated to produce a
                        vector of the length specified

> seq(1,7,by=2)
 1 3 5 7              * seq() creates a sequence from a to b
                        in steps specified in by
                        (the default is by=1)

> seq(1,-1,by=-0.5)
  1.0  0.5  0.0 -0.5 -1.0

> seq(1,7,length=3)
 1 4 7                * as with rep(), the length of the outcome
                        can be specified in seq(), in which case the
                        value for by is inferred

> rev(seq(1,5))
 5 4 3 2 1            * rev() reverses the order of a vector or list
                      * rev() will also work on matrices, but the
                        result will be a vector
```
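The following sketch spells out how the length argument behaves in seq() and rep(), and what rev() does to a matrix. It assumes present-day R, where the argument is written out as length.out and <- replaces the _ assignment used above; the variable names are illustrative.

```r
# Assumes present-day R, where the length argument is spelled
# length.out and `<-` replaces the `_` assignment used in the text.

# With a target length, seq() infers the step as
# (to - from) / (length - 1).
s <- seq(1, 7, length.out = 3)
step <- (7 - 1) / (3 - 1)
print(all.equal(s, seq(1, 7, by = step)))

# rep() with a target length recycles the input and cuts it off.
r <- rep(c(4, 2), length.out = 5)
print(r)

# rev() on a matrix reverses the elements in column order and
# returns a plain vector.
m <- matrix(1:4, nrow = 2)
rv <- rev(m)
print(rv)
```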
unique
sort
rank
order
rle
```
> x_c(rep(1,3),seq(1,5,by=2),rev(seq(1,5,length=3)),rep(2,3))
> x
 1 1 1 1 3 5 5 3 1 2 2 2

> unique(x)
 1 3 5 2              * unique() returns the values of the
                        input without any replications

> sort(x)
 1 1 1 1 1 2 2 2 3 3 5 5
                      * sort() sorts data in ascending order
                      * to sort in descending order, use
                        > rev(sort(x))

> rank(x)
 3.0 3.0 3.0 3.0 9.5 11.5 11.5 9.5 3.0 7.0 7.0 7.0
                      * rank() returns the ranks of the input
                      * in case of ties, the average of the
                        ranks is returned

> order(x)
 1 2 3 4 9 10 11 12 5 8 6 7
                      * order() returns the indices of the
                        data in ascending order
                      * the first element in order(x)
                        tells you where the lowest value in
                        x is, the second element tells you
                        where the second lowest value in x
                        is, etc.
                      * sort(x) is equivalent to
                        > x[order(x)]
```
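The relationships between sort(), order(), and rank() described above can be confirmed with a few lines. This is a sketch for present-day R (an assumption: <- replaces the _ assignment used in the text); the names o and pos_of_3 are illustrative.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
x <- c(1, 1, 1, 1, 3, 5, 5, 3, 1, 2, 2, 2)

# Indexing x by order(x) gives the same result as sort(x).
o <- order(x)
print(identical(x[o], sort(x)))

# Tied values share the average of the positions they occupy in the
# sorted vector: the two 3's sit at positions 9 and 10.
pos_of_3 <- which(sort(x) == 3)
print(mean(pos_of_3))
print(rank(x)[x == 3])

# unique() keeps each value once, in order of first appearance.
print(unique(x))
```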
Recall that the matrix size used in part 1 is:

```
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170

> i_order(size[,1])
> i                          * returns the indices of Weight in
 2 4 3 5 1                     ascending order

> size[i,]
     Weight Waist heights    * returns the matrix size sorted by Weight
[1,]    110    24     155    * i contains the order of the rows which
[2,]    112    25     175      would sort the first column in increasing
[3,]    118    25     142      order
[4,]    128    26     170    * by putting i in the row subscript, all
[5,]    130    26     140      the rows are printed out such that the
                               first column is in increasing order

> rle(x)
$lengths:                    * computes the length and the value of runs
 4 1 2 1 1 3                   of the same value in a vector
$values:                     * here, x is made up of four 1's, one 3,
 1 3 5 3 1 2                   two 5's, one 3, one 1, and three 2's
```
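The output of rle() contains everything needed to rebuild the original vector, which ties it back to rep() with a vector of times. The sketch below assumes present-day R (<- replaces the _ assignment used above); the names runs, rebuilt, longest, and longest_value are illustrative.

```r
# Assumes present-day R; `<-` replaces the `_` assignment used in the text.
x <- c(1, 1, 1, 1, 3, 5, 5, 3, 1, 2, 2, 2)
runs <- rle(x)

# rep() with a vector of times undoes rle().
rebuilt <- rep(runs$values, times = runs$lengths)
print(identical(rebuilt, x))

# The longest run and the value it repeats.
longest <- max(runs$lengths)
longest_value <- runs$values[which.max(runs$lengths)]
print(c(longest, longest_value))
```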

Richard A. Becker, John M. Chambers, Allan R. Wilks, The New S Language: A Programming Environment for Data Analysis and Graphics, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1988, pp. 15, 35, 41-48, 129-132.

### Exercises

a) Find the geometric mean of the vector x:

```
     x = (2,6,9,17,39)
```
Note: the geometric mean of n values is the n-th root of the product of the n values

Write the expression so that it will find the geometric mean of any vector x.

c) The following are marks for a student on 12 weekly quizzes marked out of 25.

```
    quiz = 24 22 17 10 12 13 16 19 15 18 22 21
```
Calculate the change in the student's marks from one quiz to the next.

d) Find the solution to the system of equations:

```
    3(X1) + 2(X2) + 6(X3) = 44
    5(X1) - 3(X2) + 4(X3) = 18
    6(X1) + 3(X2) - 2(X3) = 14
```
e) Create a vector containing the sum of each column of the matrix size.
