So far, we have talked about how to make vectors from scratch, but in many cases the vectors you work with will come from huge data sets. If you want to change something about these long vectors, such as removing every element that is less than 5, it is just not feasible to make these changes by hand. Even if you took an entire week to search through a several-thousand-element vector to remove small entries, you would not only be sad, you would also make many mistakes. A computer, on the other hand, can complete the task in a fraction of a second, can do so without error, and has no feelings and therefore is not saddened by tedium. In this chapter, we will talk about how to manipulate vectors in various ways.
As mentioned in a previous chapter, elements of a vector can be accessed by indexing, using []. The index of an element is simply its position: the first element is at index 1, the second is at index 2, etc. So, to get the third element of a vector called my.vector, you can use my.vector[3]. Not only is this used to access elements, but also to put new values at specified indices:
my.vector <- c(9, 10, 2, 6, 1)
my.vector[3] # get the value at index 3
my.vector[3] <- 5 # change element at index 3 to be 5
To get several elements at once, you can put a vector of indices in the brackets. This can also be used to reassign several indices at once.
my.vector[2:4] # get 2nd, 3rd and 4th element
my.vector[c(1, 3)] # get 1st and 3rd element
my.vector[c(4, 1)] # get 4th and 1st element, in that order
my.vector[3:5] <- 2 # replace elements 3 thru 5 with 2s
my.vector[2:5] <- 10:13 # replace elements 2 thru 5 with the numbers 10 thru 13
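As a small sketch of how these assignments build on one another, the snippet below reruns them on a fresh vector and prints the result after each step. The name scores is only an illustrative assumption.

```r
scores <- c(9, 10, 2, 6, 1)   # hypothetical example vector

scores[3] <- 5                # replace a single element
print(scores)                 # 9 10 5 6 1

scores[3:5] <- 2              # the single value 2 is reused for indices 3 to 5
print(scores)                 # 9 10 2 2 2

scores[2:5] <- 10:13          # four indices receive four new values, in order
print(scores)                 # 9 10 11 12 13

scores[c(4, 1)]               # 12 9
```

Note that each assignment changes the vector in place, so later lines see the effect of earlier ones.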
In some cases, you will want to get the last element or last few elements of a vector. Unfortunately, the way to do this in R is clunky:
my.vector[length(my.vector)] # get last element
my.vector[(length(my.vector)-2):length(my.vector)] # get last 3 elements
tail(my.vector, n=1) # alternative way to get last element
tail(my.vector, n=3) # alternative way to get last 3 elements
As you can see, to get elements from the end of a vector, you either have to use the specific indices of those elements (which works only if you already know the length of the vector) or use the length() function within the brackets to find the length of the vector. The tail() function does the same job more succinctly and readably. However, tail() cannot be used to assign new values to the vector:
my.vector[length(my.vector)] <- 3 # change last element to be 3
tail(my.vector, n=1) <- 3 # DOES NOT WORK
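Because tail() cannot appear on the left side of an assignment, one possible workaround is to compute the trailing indices once and reuse them. This is a minimal sketch; the names readings, n, and last.three are illustrative assumptions.

```r
readings <- c(4, 8, 15, 16, 23, 42)   # hypothetical example vector
n <- length(readings)

last.three <- (n - 2):n               # indices 4 5 6
readings[last.three]                  # same values as tail(readings, n = 3)

readings[last.three] <- 0             # assignment works with plain indices
readings                              # 4 8 15 0 0 0
```

The parentheses around n - 2 matter: without them, n - 2:n is read as n - (2:n), which gives a different set of indices.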
The brackets can also accept negative numbers, which have the effect of returning all the elements of the vector except those at the indices provided:
my.vector[-1] # returns everything but element at index 1
my.vector[c(-3, -5)] # returns everything but elements at indices 3 and 5
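Negative indexing returns a new vector and leaves the original unchanged, so to actually drop elements the result has to be assigned back. A brief sketch, with the vector name temps as an illustrative assumption:

```r
temps <- c(21, 19, 25, 30, 18)   # hypothetical example vector

temps[-1]                        # 19 25 30 18, but temps itself is unchanged
length(temps)                    # still 5

temps <- temps[-c(3, 5)]         # -c(3, 5) is the same as c(-3, -5)
temps                            # 21 19 30
```

Positive and negative indices cannot be mixed in one set of brackets; an expression such as temps[c(-1, 2)] raises an error.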
In addition to numbers, a logical vector of the same length as the indexed vector can be placed in the brackets to indicate which corresponding elements should be returned:
my.vector <- c(1, 5, 2, 2, 4)
my.vector[c(TRUE, TRUE, FALSE, FALSE, TRUE)] # returns the vector 1 5 4
my.vector[my.vector != 2] # has the same effect as above line
As seen in the above code snippet, using comparison operators (e.g., !=) with vectors will apply the operation element-wise, which means that my.vector != 2 evaluates to c(TRUE, TRUE, FALSE, FALSE, TRUE). This approach to indexing is incredibly common in R.
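The task from the start of the chapter, removing every element that is less than 5, can be sketched with the same idea. The names measurements and keep are illustrative assumptions.

```r
measurements <- c(7, 3, 12, 5, 1, 9)   # hypothetical example vector

keep <- measurements >= 5              # TRUE FALSE TRUE TRUE FALSE TRUE
measurements[keep]                     # 7 12 5 9

# Conditions can be combined element-wise with & (and) and | (or)
measurements[measurements >= 5 & measurements < 10]   # 7 5 9

# Logical indexing also works on the left side of an assignment
measurements[measurements < 5] <- 0
measurements                           # 7 0 12 5 0 9
```

Storing the logical vector in its own variable first, as with keep, can make longer conditions easier to read and check.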
There are two functions that are frequently used to make large vectors, and both take advantage of patterns to make vector creation easy:
seq(): This function is used for sequence generation. Its first two arguments specify the start and end of the sequence. From there, there are two ways to specify how the sequence should be generated:
seq(10, 20, 2)
seq(10, 20, length.out=5)
In the first line, the 2 specifies that the sequence should count by 2 for every step, so it produces the vector 10 12 14 16 18 20. In the second line, length.out=5 specifies that the vector should have 5 equally spaced elements from 10 to 20, which produces the vector 10.0 12.5 15.0 17.5 20.0. Although this functionality can be achieved by performing arithmetic on integer sequences made with : (as seen in a previous chapter), it is better to use the seq() function in most cases to make your code more readable and editable.
rep(): This function simply repeats a given vector a specified number of times, as seen here:
rep(1,5) # creates the vector 1 1 1 1 1
rep(c(5, 6), 3) # creates the vector 5 6 5 6 5 6
rep(c(5, 6), each=3) # creates the vector 5 5 5 6 6 6
As can be seen in the last two lines, you can either repeat the given vector end-to-end, or alternatively, you can use the each argument to specify that each element should be repeated the specified number of times one after the other.
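To make the comparison between seq() and arithmetic on : sequences concrete, and to show how rep() calls can be nested, here is a short sketch. The variable names are illustrative assumptions; by is the named form of the step-size argument of seq().

```r
# seq() with an explicit step size, using the named argument by
evens <- seq(10, 20, by = 2)             # 10 12 14 16 18 20

# The same vector built by arithmetic on an integer sequence
evens.manual <- 10 + 2 * (0:5)           # 10 12 14 16 18 20

# seq() with a target length instead of a step size
quarters <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() calls can be nested to build longer regular patterns
pattern <- rep(c(1, 2), each = 2)        # 1 1 2 2
repeated <- rep(pattern, 3)              # 1 1 2 2 1 1 2 2 1 1 2 2
```

The arithmetic version gives the same result, but the reader has to work out the start, step, and end from the formula, which is the readability cost mentioned above.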
If you have two or more vectors and want to combine them into one, you can use the c() function (short for combine) as shown in a previous chapter and in the code below:
vector1 <- c(5, 3, 2, 9)
vector2 <- -1:-3
vector3 <- seq(3, 12, 3)
c(vector1, vector2, vector3, 88)
# makes the vector 5 3 2 9 -1 -2 -3 3 6 9 12 88
The c() function without any arguments returns NULL, and adding NULL as an argument in c() has no effect on the resulting vector:
c(1, 2, NULL, 5) # returns 1 2 5
There are many instances in coding where adding a single entry to the end or beginning of a vector is desirable. This can be achieved as follows:
my.vector <- c(3, 5, 1)
my.vector <- c(my.vector, 8) # adds an 8 to end of vector
my.vector <- c(7, my.vector) # adds a 7 to start of vector
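Base R also provides the append() function, which can do the same job and can additionally insert a value in the middle of a vector through its after argument. The sketch below is only an alternative to the c() approach, and the vector name queue is an illustrative assumption.

```r
queue <- c(3, 5, 1)                    # hypothetical example vector

queue <- append(queue, 8)              # same as c(queue, 8): 3 5 1 8
queue <- append(queue, 7, after = 0)   # same as c(7, queue): 7 3 5 1 8
queue <- append(queue, 4, after = 2)   # insert after the 2nd element
queue                                  # 7 3 4 5 1 8
```

As with c(), the result must be assigned back to the variable for the change to persist.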
It is frequently of interest to sort vectors, either in ascending or descending order. The last two lines of the code below show how this is achieved with the sort() function:
my.vector <- c(10, 2, 5, 3)
sort(my.vector) # returns 2 3 5 10
sort(my.vector, decreasing=TRUE) # returns 10 5 3 2
One related function that is used for more complicated data structures is the order() function. It isn't very helpful for vectors alone, but I will introduce it now. It returns a sequence of indices, in contrast to sort(), which returns the vector elements themselves. order() gives you the indices of vector entries, from the smallest to the largest entry (or the reverse if decreasing=TRUE):
my.vector <- c(10, 2, 5, 3)
order(my.vector) # returns 2 4 3 1
order(my.vector, decreasing=TRUE) # returns 1 3 4 2
In the second line above, you can read 2 4 3 1 as saying "the second entry is smallest, then the fourth entry is next, then the third is next, and finally the first entry is largest". This function is helpful for sorting multiple vectors by the order of one of those vectors, as will be seen in the next chapter.
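As a small preview of that use, the sketch below reorders a second vector according to the first. The names heights, ids, and idx, and the ID values, are illustrative assumptions.

```r
heights <- c(10, 2, 5, 3)          # same values as my.vector above
ids <- c(101, 102, 103, 104)       # hypothetical IDs, one per height

idx <- order(heights)              # 2 4 3 1
heights[idx]                       # 2 3 5 10, the same result as sort(heights)
ids[idx]                           # 102 104 103 101, reordered to match
```

Indexing both vectors with the same idx keeps each ID paired with its original height.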
In some cases, you may want to "shuffle" a vector to be in a random order, perhaps in a simulation as discussed in the previous chapter. The sample() function, when given a single vector and nothing else, performs this operation:
my.vector <- c(10, 2, 5, 3)
set.seed(100) # required for reproducibility
sample(my.vector) # returns 2 5 3 10
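The role of set.seed() can be checked directly: resetting the seed before each call replays the same shuffle. In this sketch the names deck, first, and second are illustrative assumptions, and no particular shuffled order is assumed, since that can depend on the R version.

```r
deck <- c(10, 2, 5, 3)     # hypothetical example vector

set.seed(100)
first <- sample(deck)      # one random ordering of deck

set.seed(100)              # resetting the seed replays the same random draw
second <- sample(deck)

identical(first, second)             # TRUE
identical(sort(first), sort(deck))   # TRUE: shuffling only reorders the elements
```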
Execute these steps in a new R file. See if your final answer matches what is reported below.
1. [...] 100.
2. [...] sequence.vector.
3. [...] sequence.vector ten times end-to-end. Overwrite sequence.vector to be this new vector.
4. [...] sequence.vector to sequence.vector. This is mathematical addition, not combining vectors with c(). Overwrite sequence.vector to be this new vector.
5. [...] sequence.vector (see note below).
6. [...] sequence.vector.
7. [...] sequence.vector. Overwrite sequence.vector to be this new vector of six elements.
8. [...] sequence.vector using the max() function.
If you followed all the steps described above, step 8 should result in the number 199.6023. If you did not get this answer, at least one of the steps was implemented incorrectly. Note that all steps besides the first and last require sequence.vector to be overwritten using sequence.vector <- (this is only implied in steps 5 and 6, since indexing and shuffling only take effect if you save the results back into the vector variable).