2. Computations and Data Manipulations

Data Transformations

1. The following functions compute elementary numerical results on a vector x. They could just as easily have been used on a scalar, matrix, or any other numerical object.

ceiling
floor
trunc
round
signif
print
> x_c(-1.90691,0.76018,-0.26556,-1.89828,0.08571,NA)
        * NA means "not available", or missing
        * normally, the result of operating on an NA is another NA

> ceiling(x)
[1] -1  1  0 -1  1 NA   * in this case, where x is a vector, the
                          function is applied to each element in x
                        * the ceiling() function rounds up to the
                          next integer

> floor(x)
[1] -2  0 -1 -2  0 NA   * floor() rounds down to the next integer

> trunc(x)
[1] -1  0  0 -1  0 NA   * trunc() returns only the integer part of
                          the elements of x

> round(x)
[1] -2  1  0 -2  0 NA   * round() rounds values to the nearest
                          integer value
                        * values of n.5 are rounded to the nearest
                          even integer

> round(x,1)
[1] -1.9 0.8 -0.3 -1.9 0.1 NA   * optionally, round() can take a
                                  second argument which specifies
                                  the number of decimal places to
                                  round to

> signif(x,2)
[1] -1.900 0.760 -0.270 -1.900 0.086 NA * signif() rounds data to
                                          the specified number of
                                          significant digits
                                        * all the numbers are
                                          printed to the same
                                          format
                                        * 0's are added where
                                          necessary

> print(x,digits=1)
[1] -1.91 0.76 -0.27 -1.90 0.09 NA  * the print() function with the
                                      optional argument digits prints
                                      the numeric object x so that every
                                      element shows at least the
                                      specified number of significant
                                      digits (here 0.09 needs two
                                      decimal places)
                                    * all the elements of x are then
                                      printed to the same format
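
The difference between the rounding functions is easiest to see on negative numbers and on exact halves. The following is a small sketch, assuming an R session (it uses `<-` for assignment); the vector y is made up for illustration and does not come from the text above.

```r
# Illustrative values only: negative numbers and exact halves.
y <- c(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)

floor(y)     # rounds toward minus infinity: -3 -2 -1 0 1 2
trunc(y)     # drops the fractional part:    -2 -1  0 0 1 2
ceiling(y)   # rounds toward plus infinity:  -2 -1  0 1 2 3
round(y)     # halves go to the even integer: -2 -2 0 0 2 2

# Missing values pass straight through these functions.
z <- round(c(1.234, NA), 2)
z            # 1.23 NA
```

Note that floor() and trunc() agree for positive values and differ only for negative ones.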
2. The following functions are used on vectors. In some cases, they may also be used on matrices, but you may not always get the result you expect.

sum
prod
cumsum
cumprod
diff
> x_(1:5)

> sum(x)
[1] 15                 * sum() calculates the sum of all the values
                         in x

> prod(x)
[1] 120                * prod() calculates the product of all the
                         elements in x

> cumsum(x)
[1] 1 3 6 10 15        * cumsum() returns an object with each
                         element the sum of all the elements in
                         x up to that point
                       * if x is a matrix, cumsum() treats it as one
                         long vector, accumulating down the columns
                         in turn

> cumprod(x)
[1] 1  2  6  24  120   * cumprod() returns an object with each
                         element the product of all the elements
                         in x up to that point
> x_c(1,4,8,2,1)

> diff(x)
[1] 3 4 -6 -1     * returns an object where the ith element is equal
                    to x[i+1]-x[i]
                  * when x is a matrix, the function diff()
                    calculates the differences separately
                    for each column

> diff(x, lag=2)
[1] 7 -2 -7       * optionally, the argument lag can be specified
                    such that the ith element is equal to
                    x[i+lag]-x[i]
                    e.g., in this case 8-1, 2-4, 1-8
                  * the diff() function returns a vector
                    with length(x)-lag elements
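
One way to see how cumsum() and diff() relate is that each undoes the other, provided the starting value is kept. This is only a sketch, assuming an R session, and it reuses the vector from the diff() example.

```r
x <- c(1, 4, 8, 2, 1)

d <- diff(x)                     # successive changes: 3 4 -6 -1
rebuilt <- cumsum(c(x[1], d))    # starting value plus running total of the changes

all(rebuilt == x)                # TRUE

# A lagged difference is shorter than x by exactly the lag.
length(diff(x, lag = 2)) == length(x) - 2    # TRUE
```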

Arithmetic Operators

   +     Addition
   -     Subtraction
   *     Multiplication    (elementwise when applied to matrices)
   /     Division
   ^     Exponentiation    x^2 == x*x
                           x^(1/3) == the cube root of x (for x >= 0)
   %/%   Integer divide    e1%/%e2 == floor(e1/e2)
   %%    Modulo            e1%%e2  == e1 - (e1%/%e2)*e2
The usual arithmetic operators work as one would expect: if x is a numeric object, then x*2 multiplies each element of x by 2.

> x_c(-24,-99,82,15)
> y_c(2,3)

> x/y
[1] -12 -33 41  5    * when one argument is longer than the other,
                       the shorter argument is used cyclically,
                       if necessary

> x%/%10 
[1] -3 -10  8  1     * this is equivalent to floor(x/10)
                     * returns 0 when dividing by 0
                     * when x is a positive number, 
                       %/% returns the integer part of /

> x%%10
[1] 6 1 2 5          * this is equivalent to x-10*(x%/%10)
                       (ie.:  the remainder of %/%)  
                     * returns x when dividing by 0
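
The two operators fit together: the quotient times the divisor, plus the remainder, gives back the original value, including for negative numbers. A minimal check, assuming an R session and a nonzero divisor:

```r
x <- c(-24, -99, 82, 15)
y <- 10

q <- x %/% y    # integer quotient: -3 -10 8 1
r <- x %% y     # remainder:         6   1 2 5

all(q * y + r == x)     # TRUE: quotient and remainder rebuild x
all(r >= 0 & r < y)     # TRUE: for a positive divisor, 0 <= remainder < y
```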
- when x is a positive number, the integer divide and modulo functions can be used to break the number up into the digits that make it up, as was the case for the last two elements of x

> x_765
> x1_x %/% 100
> x1
[1] 7

> x_x %% 100
> x
[1] 65

> x2_x %/% 10
> x2
[1] 6

> x3_x %% 10
> x3
[1] 5

* more concisely:

> x_765
> x1_x %/% 100
> x2_(x %% 100) %/% 10
> x3_(x %% 100) %% 10
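
The same idea extends to a number with any count of digits. The function below is a sketch only: digits.of is a hypothetical helper name (not a built-in), it assumes an R session, and it assumes n is a single positive whole number.

```r
# digits.of is a hypothetical helper, not part of the language.
# Assumes n is a single positive whole number.
digits.of <- function(n) {
  k <- floor(log10(n)) + 1        # how many digits n has
  powers <- 10^((k - 1):0)        # e.g. 100 10 1 for a three-digit number
  (n %/% powers) %% 10            # peel off one digit per power of ten
}

digits.of(765)     # 7 6 5
digits.of(1204)    # 1 2 0 4
```

Because %/% and %% work elementwise, dividing by the whole vector of powers extracts every digit in one step.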

Numerical Transformations

  Name                      Operation

  sqrt                       square root
  abs                        absolute value
  sin   cos   tan            trigonometric functions (radians)
  asin  acos  atan           inverse trigonometric functions
  sinh  cosh  tanh           hyperbolic functions
  asinh acosh atanh          inverse hyperbolic functions
  exp   log                  exponential and natural logarithm
  log10                      common logarithm
  gamma lgamma               gamma function and its natural log
- gamma(x) = (x-1)! when x is a positive integer
- use the argument base= to change the base of the log() function,
  i.e., log(x,base=10) is the same as log10(x)
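
These two notes can be checked numerically. The sketch below assumes an R session and uses all.equal() rather than == because the results are floating-point values.

```r
n <- 1:6
gamma(n)                                    # 1 1 2 6 24 120, i.e. (n-1)!
all.equal(gamma(n), cumprod(c(1, 1:5)))     # TRUE

x <- c(1, 10, 1000)
all.equal(log(x, base = 10), log10(x))      # TRUE

# lgamma() is the natural log of gamma().
all.equal(lgamma(10), log(gamma(10)))       # TRUE
```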

Matrix Operations

The following functions apply specifically to matrices (on first reading, ignore all but t(), solve(), and %*%):


  Name               Usage              Operation

   t                   t(A)               transpose
   %*%                 A%*%B              matrix multiply
   crossprod           crossprod(A,B)     cross product
   outer               outer(A,B)         outer product
   svd                 svd(A)             singular value decomposition
   qr                  qr(A)              QR decomposition
   solve               solve(A,B)         solve equations or invert matrices
   eigen               eigen(A)           eigenvalues and eigenvectors
   chol                chol(A)            Cholesky decomposition
- crossprod(A,B) is equivalent to t(A) %*% B

- crossprod(A) is equivalent to crossprod(A,A)

- the functions outer(), svd(), qr(), eigen(), and chol() can all take optional arguments; these are described in the help documentation

- solve(A,B) finds the solution to the system of equations A %*% X = B

- solve(A) finds the inverse of A
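
The notes on solve() and crossprod() can be tried on a small system. This is a sketch assuming an R session; the matrix A and right-hand side B are made-up values chosen so that the answer is easy to verify by hand.

```r
# A and B are illustrative values, not taken from the text.
A <- matrix(c(2, 1, 1, 3), nrow = 2)    # columns are (2,1) and (1,3)
B <- c(3, 5)

X <- solve(A, B)       # solves A %*% X = B; here X is 0.8 1.4
A %*% X                # reproduces B as a one-column matrix

Ainv <- solve(A)       # inverse of A
Ainv %*% A             # the identity matrix, up to rounding error

# crossprod(A, B) gives the same result as t(A) %*% B.
all.equal(crossprod(A, B), t(A) %*% B)    # TRUE
```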
     > square_matrix(c(1,2,3,4,5,6,7,8,9),nrow=3)
     > square
           [,1] [,2] [,3]
     [1,]    1    4    7
     [2,]    2    5    8
     [3,]    3    6    9

     > decomp_eigen(square)

     > decomp$values
     [1]  1.611684e+01 -1.116844e+00 -1.652234e-16

     > decomp$vectors
                [,1]       [,2]       [,3]
     [1,] -0.5598757  0.8251730 -0.3767961
     [2,] -0.6879268  0.2238583  0.7535922
     [3,] -0.8159780 -0.3774565 -0.3767961
* eigen() creates an object of mode list with two components: $values and $vectors

* alternatively, eigenvalues could have been obtained using eigen(square)$values, and eigenvectors using eigen(square)$vectors
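
A quick way to check an eigen decomposition is to confirm that the matrix times each eigenvector equals the eigenvalue times that eigenvector. The sketch below assumes an R session. The scaling of the eigenvectors may differ from the values printed above, but the identity holds for any scaling.

```r
square <- matrix(1:9, nrow = 3)
decomp <- eigen(square)

V <- decomp$vectors       # one eigenvector per column
lambda <- decomp$values   # matching eigenvalues

# For every column v of V: square %*% v should equal lambda * v.
residual <- square %*% V - V %*% diag(lambda)
max(Mod(residual))        # close to zero (rounding error only)
```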

Recall that the matrix size used in part 1 is:

     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170
The function apply() can be used to find the mean for each column in size:

> colmean_apply(size,2,mean)
> colmean               * the first argument gives the name of
 Weight Waist heights     the matrix to which the function will
  119.6  25.2   156.4     be applied
                        * the second argument gives the dimensions
                          over which the function is to be applied
                          - in the case of a matrix, 1 indicates
                          rows, 2 indicates columns
                        * the third argument gives the name of the
                          function to be applied; functions other
                          than mean can be specified here

> sweep(size,2,colmean)
     Weight Waist heights    * sweep "sweeps out" the column means
[1,]   10.4   0.8   -16.4      from the matrix size
[2,]   -9.6  -1.2    -1.4    * the first two arguments in sweep are
[3,]   -1.6  -0.2   -14.4      the same as in apply()
[4,]   -7.6  -0.2    18.6    * the third argument is a vector
[5,]    8.4   0.8    13.6      containing the values to be
                               "swept out" of the matrix

> sweep(size,1,c(1,2,3,4,5),"+")
     Weight Waist heights    * by default, sweep subtracts the
[1,]    131    27     141      values in the third argument from
[2,]    112    26     157      the rows or columns of the matrix
[3,]    121    28     145    * this can be changed by specifying the
[4,]    116    29     179      function in the fourth argument
[5,]    133    31     175    * in this example, 1 is added to the
                               first row, 2 to the second row, etc.
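
apply() and sweep() are often used as a pair: apply() computes one summary per row or column, and sweep() removes or rescales by it. The following sketch assumes an R session and rebuilds the matrix size from the values printed above.

```r
size <- matrix(c(130, 110, 118, 112, 128,
                  26,  24,  25,  25,  26,
                 140, 155, 142, 175, 170),
               nrow = 5,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))

colmean <- apply(size, 2, mean)         # 2 = one mean per column
centred <- sweep(size, 2, colmean)      # subtract each column's mean

apply(centred, 2, mean)                 # all zero, up to rounding error

rowmax <- apply(size, 1, max)           # 1 = one maximum per row
scaled <- sweep(size, 1, rowmax, "/")   # divide each row by its largest entry
```

After the second sweep, the largest entry in every row of scaled is 1.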

Data Manipulations

rep
seq
rev
> rep(c(4,2),times=2)
[1] 4 2 4 2              * the rep() function replicates input either
                           a certain number of times or to a certain
                           length

> rep(c(4,2),times=c(2,1))
[1] 4 4 2                * when times is a single value,
                           then the first argument is repeated that
                           many times
                         * when times is a vector, then each element
                           in the first argument is matched with a number
                           of times in the second argument

> rep(c(4,2),length=3)
[1] 4 2 4                * when the length argument is specified, the
                           first argument is replicated to produce a
                           vector of the length specified

> seq(1,7,by=2)
[1] 1 3 5 7              * seq() creates a sequence from a to b
                           in steps specified in by
                           (the default is by=1)

> seq(1,-1,by=-0.5)
[1]  1.0  0.5  0.0 -0.5 -1.0

> seq(1,7,length=3)
[1] 1 4 7                * as with rep(), the length of the outcome
                           can be specified in seq(), in which case the
                           value for by is inferred

> rev(seq(1,5))
[1] 5 4 3 2 1            * rev() reverses the order of a vector or list
                         * rev() will also work on matrices, but the
                           result will be a vector
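
When seq() is given a length, the step it infers is (to - from)/(length - 1). The sketch below, assuming an R session, checks this and also shows the note about rev() on a matrix; the names from, to, n, and m are chosen for illustration.

```r
from <- 1
to <- 7
n <- 3

by <- (to - from) / (n - 1)     # step implied by the requested length: 3
all.equal(seq(from, to, length = n), seq(from, to, by = by))    # TRUE

# rev() on a matrix reverses the elements taken column by column
# and returns a plain vector.
m <- matrix(1:4, nrow = 2)
rev(m)                          # 4 3 2 1
```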
unique
sort
rank
order
rle
> x_c(rep(1,3),seq(1,5,by=2),rev(seq(1,5,length=3)),rep(2,3))
> x
[1] 1 1 1 1 3 5 5 3 1 2 2 2

> unique(x)
[1] 1 3 5 2                      * unique() returns the values of the
                                   input without any replications

> sort(x)
[1] 1 1 1 1 1 2 2 2 3 3 5 5
                                 * sort() sorts data in ascending order
                                 * to sort by descending order, use
                                   > rev(sort(x))

> rank(x)
[1] 3.0 3.0 3.0 3.0 9.5 11.5 11.5 9.5 3.0 7.0 7.0 7.0
                                 * rank() returns the ranks of the input
                                 * in case of ties, the average of the
                                   ranks is returned

> order(x)
[1] 1 2 3 4 9 10 11 12 5 8 6 7
                                 * order() returns the indices of the
                                   data in ascending order
                                 * the first element in order(x)
                                   tells you where the lowest value in
                                   x is, the second element tells you
                                   where the second lowest value in x
                                   is, etc.
                                 * sort(x) is equivalent to
                                   >  x[order(x)]
Recall that the matrix size used in part 1 is:

     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170

> i_order(size[,1])
> i                          * returns the indices of Weight in
[1] 2 4 3 5 1                  ascending order

> size[i,]
     Weight Waist heights    * returns the matrix size sorted by Weight
[1,]    110    24     155    * i contains the order of the rows which
[2,]    112    25     175      would sort the first column in increasing
[3,]    118    25     142      order
[4,]    128    26     170    * by putting i in the row subscript, all
[5,]    130    26     140      the rows are printed out such that the
                               first column is in increasing order
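
The same subscripting trick covers descending sorts and sorting on more than one column. This is a sketch assuming an R session; it rebuilds size from the values printed above.

```r
size <- matrix(c(130, 110, 118, 112, 128,
                  26,  24,  25,  25,  26,
                 140, 155, 142, 175, 170),
               nrow = 5,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))

# Rows by decreasing heights: reverse the ascending order.
j <- rev(order(size[, "heights"]))
j                       # 4 5 2 3 1
size[j, ]

# Sort by Waist, breaking ties in Waist by Weight.
k <- order(size[, "Waist"], size[, "Weight"])
k                       # 2 4 3 5 1
size[k, ]
```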

> rle(x)
$lengths:                    * computes the length and the value of runs
[1] 4 1 2 1 1 3                of the same value in a vector
$values:                     * here, x is made up of four 1's, one 3,
[1] 1 3 5 3 1 2                two 5's, one 3, one 1, and three 2's
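
The output of rle() can be turned back into the original vector with rep(), using the vector form of the times argument described earlier. A short sketch, assuming an R session:

```r
x <- c(1, 1, 1, 1, 3, 5, 5, 3, 1, 2, 2, 2)
runs <- rle(x)

# Repeat each run's value by that run's length.
rebuilt <- rep(runs$values, times = runs$lengths)
all(rebuilt == x)                           # TRUE

# Length of the longest run, and the value it consists of.
longest <- max(runs$lengths)                # 4
runs$values[runs$lengths == longest]        # 1
```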

Further Reading

Richard A. Becker, John M. Chambers, Allan R. Wilks, The New S Language: A Programming Environment for Data Analysis and Graphics, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1988, pp. 15, 35, 41-48, 129-132.

Exercises

a) Find the geometric mean of the vector x:

     x = (2,6,9,17,39)
Note: the geometric mean of n values is the n-th root of the product of the n values

Write the expression so that it will find the geometric mean of any vector x.

c) The following are marks for a student on 12 weekly quizzes marked out of 25.

    quiz = 24 22 17 10 12 13 16 19 15 18 22 21
Calculate the change in the student's marks from one quiz to the next.

d) Find the solution to the system of equations:

    3(X1) + 2(X2) + 6(X3) = 44
    5(X1) - 3(X2) + 4(X3) = 18
    6(X1) + 3(X2) - 2(X3) = 14
e) Create a vector containing the sum of each column of the matrix size.
