Data can be typed directly into Splus or read in from a file. Data objects can also be the result of an expression (a combination of data objects and constants with operators and functions). Splus objects are stored in the .Data directory and are saved from one session to the next. In order to save a data object, a name must be assigned to it. This is done using the underscore character "_" or the two-character symbol "<-" (a less-than sign followed by a hyphen), with the name of the object on the left and the values on the right. Alternatively, the symbol "->" can be used with the values on the left and the name of the object on the right. The name must start with a letter and may contain letters, digits, and periods. Splus is case sensitive: x and X refer to two different objects. The following are examples of data assignments:
> height_175
* read as "height gets 175"
* assigns the value 175 to the scalar
name height
> person_"Jo"
* character values are inserted in quotes
* if the quotes are omitted, Splus will
look for a data object Jo to assign
to person
> heights_c(160,140,155)
* the function c() "collects" the values
160, 140, and 155 and stores them into
the vector heights
> people_c("Ned","Jill","Pat")
* creates a vector of names
> names(heights)_people
* the names() function assigns names to
the elements of a vector
* the word people is not inserted in
quotes, it refers to the vector people,
and not the word itself
> heights
Ned Jill Pat * typing the name of an object by itself
160 140 155 causes its value to be printed on the
terminal
> heights["Ned"]
Ned * when an object has a names attribute,
160 its elements can be referred to by name
> names(heights)_NULL
* deletes the names attribute of the
vector heights
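The assignment rules described at the start of this section can be recapped in one short sketch. It uses the "<-" and "->" forms only (the underscore form is shown in the examples above), and the object names height.copy and ned are made up for illustration.

```r
# Name on the left, value on the right.
height <- 175

# Value on the left, name on the right (height.copy is an illustrative name).
175 -> height.copy

# Names are case sensitive: x and X are two separate objects.
x <- 1
X <- 2

# A names attribute lets elements be referred to by name.
heights <- c(160, 140, 155)
names(heights) <- c("Ned", "Jill", "Pat")
ned <- heights["Ned"]

# Assigning NULL removes the names attribute again.
names(heights) <- NULL
```

Both assignment directions store the same value, and removing the names attribute leaves the values of heights unchanged.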
> object[subscript]
* syntax for subscripting, where object
is the name of the data object and
subscript defines which elements to
extract
* the expression heights["Ned"] above
is an example of extracting data using a
subscript
- Notice that square brackets are used instead of parentheses. Round brackets are used
by functions (e.g. the c() and names() functions). Functions use the arguments provided in
parentheses to perform a task. Subscripts require square brackets. The information provided
in square brackets tells Splus what subset of a data object is being referred to.
> heights[2]
[1] 140 * extracts the second element from heights
* [1] refers to the position of the first
element on the given line - this is very
useful when vectors are several lines
long
> heights[c(2,1,2)]
[1] 140 160 140 * extracts the second, first, and second
elements from heights
* the c() function is used in the
subscript when more than one element
is listed
> heights[heights < 160]
[1] 140 155 * returns all the values of heights
which are less than 160
* this is NOT equivalent to
> heights[ < 160]
which is not a valid expression: the
comparison must name the object being
tested (heights < 160)
> heights[-2]
[1] 160 155 * returns all except the second value in
heights
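The subscript forms shown so far can be put side by side in a small sketch. The intermediate object is.short is an illustrative name; it is introduced only to show that a condition such as heights < 160 is evaluated first and produces one logical value per element, which is then used as the subscript.

```r
heights <- c(160, 140, 155)

# Positive subscripts select by position (repeats are allowed).
by.position <- heights[c(2, 1, 2)]

# A negative subscript excludes the given position.
by.exclusion <- heights[-2]

# A comparison yields a logical vector with one T or F per element.
is.short <- heights < 160

# A logical subscript keeps the elements where the value is T.
by.condition <- heights[is.short]
```

Writing heights[heights < 160] is the same two steps in a single expression.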
> heights[1]_162
> heights * assigns the value 162 to the first
[1] 162 140 155 element of heights
* the old object heights has now been
replaced by the new object heights
> heights[4]_135
> heights * appends the value 135 to the vector
[1] 162 140 155 135 heights
> heights_append(heights,height)
> heights * the function append() creates a new
[1] 162 140 155 135 175 vector with the first values the same
as heights and the last value as
height (recall that height
was previously assigned the value 175)
* the function append() binds two objects
into a vector
* the arguments may be vectors, scalars,
or both
* this is equivalent to
> heights_c(heights,height)
> heights.1_append(heights,180,after=2)
> heights.1 * the argument after specifies the index
[1] 162 140 180 155 135 175 of heights after which the new values
are to be inserted
> heights_replace(heights,2,142)
> heights * replaces the second value in heights with
[1] 162 142 155 135 175 the value 142 and stores the new vector
in heights
* the first argument specifies the name of
the data object, the second specifies
the indices of the elements to be
replaced, and the third argument
specifies the values the elements are to
be replaced with
* this expression is equivalent to
> heights[2]_142
> heights.2_replace(heights,c(2,4),c(140,142))
> heights.2 * replaces the second and fourth values of
[1] 162 140 155 142 175 heights by the values 140 and 142
and stores the result into heights.2
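The equivalences mentioned above (append() versus c(), and replace() versus assignment to a subscript) can be checked with a short sketch. The object names via.append, via.c, inserted, via.replace and via.subscript are illustrative only.

```r
heights <- c(162, 140, 155, 135)
height <- 175

# Appending at the end gives the same result as c().
via.append <- append(heights, height)
via.c <- c(heights, height)

# The after argument inserts the new value after the given index.
inserted <- append(heights, 180, after = 2)

# replace() returns a modified copy; heights itself is not changed.
via.replace <- replace(heights, c(2, 4), c(140, 142))

# The same change made by assigning to a subscript of a copy.
via.subscript <- heights
via.subscript[c(2, 4)] <- c(140, 142)
```

Note that append() and replace() only change an object when their result is assigned back to its name.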
> numbers_1:5
> numbers * the operator ":" creates a sequence from
[1] 1 2 3 4 5 1 to 5
* the syntax for the sequence operator is
from:to
> heights_heights.2[2:5]
> heights * assigns the last four elements of the
[1] 140 155 142 175 vector heights.2 to the vector heights
> length(heights)
[1] 4 * returns the length (number of elements)
of the object heights
The data objects you have just created are stored in your .Data directory.
To see a list of the data objects (and later on, functions) you have created, type
> ls()
To remove an object or function from your .Data directory, use the rm() function. For example, to remove the scalar height, type
> rm(height)
You can also remove more than one data object at a time. To remove the scalar person and the vector numbers, type
> rm(person,numbers)
Assigning a name already used by an Splus function may cause warning messages to appear on the screen:
> c_c(1,2,3)
> d_c(1,2,3)
Warning messages: Looking for object "c" of mode "function", ignored one of mode "numeric"
Here, the name c (the "concatenate" function) was assigned to a data object. The problem is solved by copying the object to another name and removing the numeric object from the directory:
> b_c
> rm(c)
This will cause the warning message to be printed one last time.
> size.1_matrix(c(130,26,110,24,118,25,112,25),ncol=2)
> size.1 * the function matrix() reads data into a matrix
[,1] [,2] * the number of columns is specified using the
[1,] 130 118 argument ncol= #
[2,] 26 25 * alternatively, the number of rows can be
[3,] 110 112 specified using the argument nrow= # or both
[4,] 24 25 nrow and ncol can be specified
* when neither nrow nor ncol is specified, the
data is read in as a one-column matrix
> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T)
> size.2 * specifying byrow=T forces Splus to read the
[,1] [,2] data in row by row
[1,] 130 26 * when the argument is not specified, or specified
[2,] 110 24 as byrow=F, Splus assumes the data is
[3,] 118 25 written in column by column
[4,] 112 25
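The effect of byrow can be seen by building the matrix both ways from the same data. As a cross-check, the sketch also uses the transpose function t(), which is not covered in this section: filling a two-row matrix column by column and transposing it should give the same layout as byrow=T. The object names values, by.column, by.row and transposed are illustrative.

```r
values <- c(130, 26, 110, 24, 118, 25, 112, 25)

# Default: the data fill the first column, then the second.
by.column <- matrix(values, ncol = 2)

# byrow=T: the data fill the first row, then the second, and so on.
by.row <- matrix(values, ncol = 2, byrow = T)

# Same layout obtained by transposing a column-filled 2 x 4 matrix.
transposed <- t(matrix(values, nrow = 2))
```

Here by.row holds one person per row (weight, then waist), which is the layout intended for size.2.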
> size.names_list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist"))
> size.names
[[1]]:
[1] "Abe" "Bob" "Carol" "Deb"
[[2]]:
[1] "Weight" "Waist"
Notice the double square brackets: whereas single square brackets are used
to extract data from a vector, double square brackets are used to extract
components from a list:
> size.names[[2]]
[1] "Weight" "Waist"
The individual components in the list retain their properties as vectors and as such, individual elements can be extracted from each component in the same way as in any other vector:
> size.names[[2]][2]
[1] "Waist"
Names can also be assigned to the components of a list:
> names(size.names)_c("Rows","Columns")
> size.names
$Rows:
[1] "Abe" "Bob" "Carol" "Deb"
$Columns:
[1] "Weight" "Waist"
The components of the list can then be extracted using their names
attribute:
> size.names$Rows
[1] "Abe" "Bob" "Carol" "Deb"
> size.names$Rows[2]
[1] "Bob"
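The different ways of reaching into a list can be compared in a brief sketch. One point not shown above is included as a contrast: single square brackets applied to a list return a shorter list rather than the component itself. The object names second, by.name and sub.list are illustrative.

```r
size.names <- list(c("Abe", "Bob", "Carol", "Deb"), c("Weight", "Waist"))
names(size.names) <- c("Rows", "Columns")

# Double brackets extract the component itself (a character vector).
second <- size.names[[2]]

# The $ form extracts the same component by name.
by.name <- size.names$Columns

# Single brackets keep the list structure: the result is a list of length 1.
sub.list <- size.names[2]

# Elements of an extracted component are subscripted like any vector.
waist.label <- size.names$Columns[2]
```

Use double brackets or $ when the vector inside the list is needed, for example to subscript it further.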
> dimnames(size.2)_size.names
> size.2
Weight Waist * the dimnames() function assigns names to the
Abe 130 26 dimensions of a data object (in this case,
Bob 110 24 the rows and columns of size.2)
Carol 118 25
Deb 112 25
> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T,
+ dimnames=list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist")))
* it is possible to assign dimnames directly from
within the matrix function
* expressions can be spread over several lines,
simply hit return at the end of the line and
Splus prompts for a continuation line by means
of the "+" character (this may also happen if
you forget to close an open bracket or string)
> dimnames(size.2)_list(NULL,c("Weight","Waist"))
* the NULL object is used when no dimnames are
to be assigned to a dimension
> abc_size.2
> dimnames(abc)_list(c("Abe","Bob","Carol","Deb"),dimnames(size.2)[[2]])
* this command assigns dimnames to the rows of abc
and assigns the column dimnames of size.2 to the
columns of abc
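The dimnames steps above can be collected into a self-contained sketch. It also shows that once a matrix has dimnames, those names can be used as subscripts; bob.waist is an illustrative name.

```r
# Column names only: NULL leaves the rows unnamed.
size.2 <- matrix(c(130, 26, 110, 24, 118, 25, 112, 25), ncol = 2, byrow = T,
                 dimnames = list(NULL, c("Weight", "Waist")))

# Copy the matrix, add row names, and reuse the column names of size.2.
abc <- size.2
dimnames(abc) <- list(c("Abe", "Bob", "Carol", "Deb"), dimnames(size.2)[[2]])

# Row and column names can now be used to pick out a single value.
bob.waist <- abc["Bob", "Waist"]
```

Because dimnames is a list with one component per dimension, dimnames(size.2)[[2]] extracts just the column names.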
> size_cbind(size.2,heights)
> size * cbind() (column bind) "binds" together
Weight Waist heights vectors and matrices columnwise into a
[1,] 130 26 140 new matrix
[2,] 110 24 155 * cbind() "binds" the vector heights
[3,] 118 25 142 columnwise to the matrix size.2 and
[4,] 112 25 175 stores the resulting matrix in size
* the name heights is automatically
assigned to the third column of the
matrix size
> size_rbind(size,c(128,26,170))
> size * rbind() (row bind) "binds" together
Weight Waist heights vectors and/or matrices rowwise into
[1,] 130 26 140 a new matrix
[2,] 110 24 155
[3,] 118 25 142
[4,] 112 25 175
[5,] 128 26 170
> x_c(1,2,3)
> y_diag(x)
> y * the function diag() creates a matrix with
[,1] [,2] [,3] the vector x on the main diagonal
[1,] 1 0 0 * the main diagonal of a matrix consists of those
[2,] 0 2 0 elements whose row number and column
[3,] 0 0 3 number are the same
* the number of rows or columns can be
specified using the arguments nrow or ncol
> diag(y)
[1] 1 2 3 * alternatively, when the argument is a
matrix, diag() returns the diagonal of
the matrix
> col(y)
[,1] [,2] [,3] * the function col() returns a matrix of
[1,] 1 2 3 column numbers
[2,] 1 2 3 * similarly, the function row() returns a
[3,] 1 2 3 matrix of row numbers
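One use of row() and col() is to build a logical matrix that marks the main diagonal, following the definition given above (row number equal to column number). This is only a sketch; the names on.diagonal and from.mask are illustrative.

```r
x <- c(1, 2, 3)
y <- diag(x)

# T where the row number equals the column number, F elsewhere.
on.diagonal <- row(y) == col(y)

# Subscripting with the logical matrix extracts the diagonal elements.
from.mask <- y[on.diagonal]

# Everything off the diagonal of y is zero.
off.diagonal.total <- sum(y[!on.diagonal])
```

The result in from.mask matches what diag(y) returns for this matrix.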
> size[2,3]
heights * to extract one value from a matrix, it is
155 necessary to use two elements in the
subscript: the first element applies to
the rows, the second element applies to
the columns
* the full subscript expression applies to
the elements of the matrix that satisfy
both the row and the column condition
* in this case, the element in the second row,
third column of the matrix size is printed
> size[2,]
Weight Waist heights * if one dimension is not specified in the
110 24 155 subscript, all elements in that dimension
are extracted
* in this case, the columns are not specified
so all the columns are included
> size[,3]
[1] 140 155 142 175 170 * prints the third column of the matrix size
* in both examples, the comma must be kept in
as a marker to indicate which dimension is
specified
* in both of these examples, Splus drops the
extra dimension so that the result is a
vector
> size[2, ,drop=F]
Weight Waist heights * to retain the matrix properties for the
[1,] 110 24 155 result (which might be necessary in some
computations), add drop=F to the
subscripts
* notice that two commas were used in the
subscript, one to separate the row from the
column (not specified) dimensions, the other
to separate the indices from the argument
drop
> is.matrix(size[2,])
[1] F * is.matrix() is a function which tests
whether an object is a matrix, returning T or F
> is.matrix(size[2, ,drop=F])
[1] T * as seen above, when a single row or column
is extracted from a matrix, the matrix
properties are dropped unless otherwise
specified in the argument drop
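The difference that drop=F makes can be seen by looking at the dimensions of the two results. The sketch below uses a smaller, made-up matrix so that it stands on its own; row.vec and row.mat are illustrative names.

```r
size <- matrix(c(130, 110, 118, 26, 24, 25), ncol = 2,
               dimnames = list(NULL, c("Weight", "Waist")))

# Without drop=F the single row comes back as a plain vector.
row.vec <- size[2, ]

# With drop=F the result is still a matrix, with one row.
row.mat <- size[2, , drop = F]

# A vector has no dim attribute; the one-row matrix has dimensions 1 and 2.
vec.dim <- dim(row.vec)
mat.dim <- dim(row.mat)
```

Keeping the matrix form matters when a later step, such as rbind() or a subscript with two indices, expects a matrix.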
> size[,c(1,3)]
Weight heights * the c() function is used in matrix
[1,] 130 140 subscripts in the same way as it is used
[2,] 110 155 in vector subscripts
[3,] 118 142 * here, the first and third columns of the
[4,] 112 175 matrix size are printed out
[5,] 128 170
> size[,c("Weight","Waist")]
Weight Waist * character subscripts are used in the same
[1,] 130 26 way as numeric subscripts: the first
[2,] 110 24 element in the subscript specifies the
[3,] 118 25 rows, and the second element in the
[4,] 112 25 subscript specifies the columns
[5,] 128 26
> size[-2,-3]
Weight Waist * negative subscripts have the same meaning
[1,] 130 26 for the rows and columns of matrices that
[2,] 118 25 they have for elements of a vector
[3,] 112 25
[4,] 128 26
Suppose you wished to print the weights of those people taller than 160cm: the
expression size[,1] will print all the weights in the matrix size. It is necessary
to limit the rows to be printed to those rows where the value for heights (column 3)
is greater than 160cm, i.e. those rows which satisfy the condition 'size[,3] > 160'.
Combining these two expressions gives
> size[size[,3] > 160,1]
[1] 112 128 * this command pulls out the weights (column 1)
of those people (rows) with height (size[,3])
greater than 160
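The same selection can be written in two steps, which makes the role of the condition easier to see. This is a sketch on a rebuilt copy of the size matrix; the names tall, weights.of.tall and tall.rows are illustrative.

```r
size <- matrix(c(130, 26, 140,
                 110, 24, 155,
                 118, 25, 142,
                 112, 25, 175,
                 128, 26, 170),
               ncol = 3, byrow = T,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))

# Step 1: one logical value per row, T where the height exceeds 160.
tall <- size[, 3] > 160

# Step 2: use the logical vector as the row subscript, column 1 for weight.
weights.of.tall <- size[tall, 1]

# The same row condition can be combined with named columns.
tall.rows <- size[tall, c("Weight", "Waist"), drop = F]
```

The logical vector must have one element per row, which is why the condition is built from a whole column of the same matrix.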
> dim(size)
[1] 5 3 * the dim() function returns the dimensions of
an object
* in the case of matrices, the first element
is the number of rows in the matrix and the
second element is the number of columns
> nrow(size)
[1] 5
> ncol(size)
[1] 3 * the functions nrow() and ncol() are based
on the function dim() and return the number
of rows or the number of columns in the
matrix
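The relationship between dim(), nrow(), ncol() and length() can be checked with a short sketch on a made-up matrix; the names m, dims and v are illustrative.

```r
# A 5 x 3 matrix filled with the sequence 1 to 15.
m <- matrix(1:15, nrow = 5)

# dim() returns the number of rows followed by the number of columns.
dims <- dim(m)

# nrow() and ncol() return the two elements of dim() separately.
n.rows <- nrow(m)
n.cols <- ncol(m)

# length() counts all elements, i.e. rows times columns.
n.elements <- length(m)

# A plain vector has no dimensions, so dim() returns NULL for it.
v <- 1:4
v.dim <- dim(v)
```

For a vector, length() is the function to use, since it has no dim attribute.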
a) Create the following matrix called marks, and put in the appropriate label names.
Test1 Test2 Test3 Final
[1,] 20 23 18 48
[2,] 16 15 18 40
[3,] 25 20 22 40
[4,] 14 19 18 42
b) Add the following row to the bottom of the matrix:
10 15 14 30
c) Change the fifth mark for test #2 from a 15 to a 17.
d) Print all the marks for test #3.
e) Print the final marks for those people with marks greater than 16 on test #1.
f) Print the marks matrix without the column for test #3.
g) Print the number of rows in the matrix.
Computations and Data Manipulations