05| Vector Manipulation

Miles Robertson, 12.21.23

Introduction

So far, we have talked about how to make vectors from scratch, but in many cases the vectors you work with will come from huge data sets. If you want to change something about these long vectors, such as removing every element that is less than 5, it is just not feasible to make these changes by hand. Even if you take an entire week to search through a several-thousand-element-long vector to remove small entries, you will not only be sad, you will make many mistakes. A computer, on the other hand, can complete the task in a fraction of a second, can do so without error, and has no feelings and therefore is not saddened by tedium. In this chapter, we will talk about how to manipulate vectors in various ways.

Vector Indexing

As mentioned in a previous chapter, elements of a vector can be accessed by indexing, using []. The index of an element is simply its position: the first element is at index 1, the second is at index 2, etc. So, to get the third element of a vector called my.vector, you can use my.vector[3]. Not only is this used to access elements, but also to put new values at specified indices:

my.vector <- c(9, 10, 2, 6, 1)
my.vector[3]      # get the value at index 3
my.vector[3] <- 5 # change element at index 3 to be 5

To get several elements at once, you can put a vector of indices in the brackets. This can also be used to reassign several indices at once.

my.vector[2:4]          # get 2nd, 3rd and 4th element
my.vector[c(1, 3)]      # get 1st and 3rd element
my.vector[c(4, 1)]      # get 4th and 1st element, in that order
my.vector[3:5] <- 2     # replace elements 3 thru 5 with 2s
my.vector[2:5] <- 10:13 # replace elements 2 thru 5 with the numbers 10 thru 13

In some cases, you will want to get the last element or last few elements of a vector. Unfortunately, the way to do this in R is clunky:

my.vector[length(my.vector)] # get last element
my.vector[(length(my.vector)-2):length(my.vector)] # get last 3 elements
tail(my.vector, n=1) # alternative way to get last element
tail(my.vector, n=3) # alternative way to get last 3 elements

To get elements from the end of a vector with brackets, you either have to use the specific indices of those elements (which works only if you already know the length of the vector) or use the length() function inside the brackets to find the length. The tail() function does the same job more succinctly and readably. However, tail() cannot be used to assign new values to the vector:

my.vector[length(my.vector)] <- 3 # change last element to be 3
tail(my.vector, n=1) <- 3 # DOES NOT WORK
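
The bracket form also works for assigning to the last few elements. The following is a minimal sketch, where n is a helper variable introduced here only for illustration. It also shows why the parentheses around the subtraction matter: R evaluates : before binary -.

```r
my.vector <- c(9, 10, 2, 6, 1)
n <- length(my.vector)      # n is 5 here

# Replace the last 3 elements (indices 3, 4 and 5) with 0s
my.vector[(n - 2):n] <- 0
my.vector                   # 9 10 0 0 0

# Without parentheses, 2:n is built first and then subtracted from n
n - 2:n                     # 3 2 1 0, not the intended 3 4 5
```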

The brackets can also accept negative numbers, which have the effect of returning all the elements of the vector except those at the indices provided:

my.vector[-1] # returns everything but element at index 1
my.vector[c(-3, -5)] # returns everything but elements at indices 3 and 5
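
Negative indices combine naturally with length() and with :. The sketch below uses variable names chosen only for illustration. Note that indexing returns a new vector and leaves the original alone, so removing elements for good requires saving the result back into the variable.

```r
my.vector <- c(9, 10, 2, 6, 1)

all.but.last <- my.vector[-length(my.vector)] # 9 10 2 6
all.but.first.two <- my.vector[-(1:2)]        # 2 6 1

# my.vector itself still has all 5 elements at this point.
# To actually remove the first element, overwrite the variable:
my.vector <- my.vector[-1]                    # my.vector is now 10 2 6 1
```

The parentheses in -(1:2) are needed because -1:2 is the sequence -1 0 1 2, and R raises an error when negative and positive indices are mixed in the same brackets.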

In addition to numbers, a logical vector of the same length as the indexed vector can be placed in the brackets to indicate which corresponding elements should be returned:

my.vector <- c(1, 5, 2, 2, 4)
my.vector[c(TRUE, TRUE, FALSE, FALSE, TRUE)] # returns the vector 1 5 4
my.vector[my.vector != 2] # has the same effect as above line

As seen in the above code snippet, using comparison operators (e.g., !=) with vectors will apply the operation element-wise, which means that my.vector != 2 evaluates as c(TRUE, TRUE, FALSE, FALSE, TRUE). This approach to indexing is incredibly common in R.
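
As a small illustration of the task from the introduction (removing every element that is less than 5), the sketch below keeps only the elements that are at least 5. The names measurements and keep are made up for this example.

```r
measurements <- c(7, 2, 9, 4, 5, 12, 1)

keep <- measurements >= 5   # TRUE FALSE TRUE FALSE TRUE TRUE FALSE
sum(keep)                   # 4, since each TRUE counts as 1 when summed

# Overwrite the vector with only the elements where keep is TRUE
measurements <- measurements[keep]
measurements                # 7 9 5 12
```

Storing the logical vector in its own variable is optional; measurements[measurements >= 5] does the same thing in one step.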

Making Vectors by Pattern

There are two functions that are frequently used to make large vectors, and both take advantage of patterns to make vector creation easy:

seq(): This function is used for sequence generation. Its first two arguments are to specify the start and end of the sequence. From there, there are two options for how to specify how the sequence should be generated:
```
seq(10, 20, 2)
seq(10, 20, length.out=5)
```
In the first line, the 2 specifies that the sequence should count by 2 for every step, so it produces the vector 10 12 14 16 18 20. In the second line, length.out=5 specifies that the vector should have 5 equally spaced elements from 10 to 20, which produces the vector 10.0 12.5 15.0 17.5 20.0. Although this functionality can be achieved by performing arithmetic on integer sequences made with : (as seen in a previous chapter), it is better to use the seq function in most cases to make your code more readable and editable.
rep(): This function simply repeats a given vector a specified number of times, as seen here:
```
rep(1,5)             # creates the vector 1 1 1 1 1
rep(c(5, 6), 3)      # creates the vector 5 6 5 6 5 6
rep(c(5, 6), each=3) # creates the vector 5 5 5 6 6 6
```
As can be seen in the last two lines, you can either repeat the given vector end-to-end, or alternatively, you can use the each argument to specify that each element should be repeated the specified number of times one after the other.
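
To make the comparison with : concrete, here is a short sketch that builds the same sequence both ways and then combines the two repetition styles of rep(). The variable names are chosen only for illustration; times is the name of the argument that rep() fills when a second unnamed value is given.

```r
# The same sequence, built with seq() and with arithmetic on 0:5
by.seq   <- seq(10, 20, 2)
by.colon <- 10 + 2 * (0:5)
all(by.seq == by.colon)    # TRUE

# Naming the times argument makes the intent explicit
end.to.end <- rep(c(5, 6), times=3)         # 5 6 5 6 5 6

# each and times can be used together: each is applied first
both <- rep(c(5, 6), each=2, times=2)       # 5 5 6 6 5 5 6 6
```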

Combining Vectors

If you have two or more vectors and want to combine them into one, you can use the c() function (short for combine) as shown in a previous chapter and in the code below:

vector1 <- c(5, 3, 2, 9)
vector2 <- -1:-3
vector3 <- seq(3, 12, 3)

c(vector1, vector2, vector3, 88)
# makes the vector 5  3  2  9 -1 -2 -3  3  6  9 12 88

The c() function without any arguments returns NULL, and adding NULL as an argument in c() has no effect on the resulting vector:

c(1, 2, NULL, 5) # returns 1 2 5

There are many instances in coding where adding a single entry to the end or beginning of a vector is desirable. This can be achieved as follows:

my.vector <- c(3, 5, 1)
my.vector <- c(my.vector, 8) # adds an 8 to end of vector
my.vector <- c(7, my.vector) # adds a 7 to start of vector
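
The same idea extends to inserting a value in the middle of a vector. The sketch below shows one way to do it with c() and indexing, and, as an alternative, with the base R function append() and its after argument. The names with.c and with.append are made up for this example.

```r
my.vector <- c(3, 5, 1)

# Insert 99 between the 2nd and 3rd elements using c() and indexing
with.c <- c(my.vector[1:2], 99, my.vector[3:length(my.vector)]) # 3 5 99 1

# append() places the new value after the given index
with.append <- append(my.vector, 99, after=2)                   # 3 5 99 1
```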

Sorting Vectors

It is frequently of interest to sort vectors, either in ascending or descending order. The last two lines of the code below show how this is achieved with the sort() function:

my.vector <- c(10, 2, 5, 3)
sort(my.vector)                  # returns 2 3 5 10
sort(my.vector, decreasing=TRUE) # returns 10 5 3 2

One related function that is used for more complicated data structures is the order() function. It isn't very helpful for vectors alone, but I will introduce it now. It returns a sequence of indices, in contrast to sort(), which returns the vector's elements themselves. order() gives you the indices of vector entries, from the smallest to the largest entry (or the reverse if decreasing=TRUE):

my.vector <- c(10, 2, 5, 3)
order(my.vector)                  # returns 2 4 3 1
order(my.vector, decreasing=TRUE) # returns 1 3 4 2

In the second line above, you can read 2 4 3 1 as saying "the second entry is smallest, then the fourth entry is next, then the third is next, and finally the first entry is largest". This function is helpful for sorting multiple vectors by the order of one of those vectors, as will be seen in the next chapter.
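
As a brief preview of that use, the sketch below puts the result of order() in brackets. The vectors heights and ids and the helper idx are hypothetical names used only for this illustration.

```r
heights <- c(10, 2, 5, 3)
ids     <- c("a", "b", "c", "d")

idx <- order(heights)            # 2 4 3 1
sorted.heights <- heights[idx]   # 2 3 5 10, the same as sort(heights)
sorted.ids     <- ids[idx]       # "b" "d" "c" "a"
```

Indexing a vector by its own order() reproduces sort(), while indexing a second vector by the same indices keeps the two vectors lined up.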

In some cases, you may want to "shuffle" a vector to be in a random order, perhaps in a simulation as discussed in the previous chapter. The sample() function, when given a single vector and nothing else, performs this operation:

my.vector <- c(10, 2, 5, 3)
set.seed(100) # required for reproducibility
sample(my.vector) # returns 2 5 3 10
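
The sketch below checks two properties of shuffling rather than a specific output: setting the same seed before each call gives the same order, and the shuffled vector contains exactly the original elements. The names first.shuffle and second.shuffle are made up for this example.

```r
my.vector <- c(10, 2, 5, 3)

set.seed(100)
first.shuffle <- sample(my.vector)
set.seed(100)
second.shuffle <- sample(my.vector)

identical(first.shuffle, second.shuffle)         # TRUE: same seed, same order
identical(sort(first.shuffle), sort(my.vector))  # TRUE: same elements
```

One caveat: when sample() is given a single number of at least 1 instead of a longer vector, it samples from 1 up to that number, so sample(5) returns a shuffled version of 1:5 rather than just 5.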

Practice

Follow a Series of Instructions

Execute these steps in a new R file. See if your final answer matches what is reported below.

1. Set the seed with 100.
2. Make a sequence of numbers, from 90 to 500, counting by 3. Save it as the variable sequence.vector.
3. Repeat sequence.vector ten times end-to-end. Overwrite sequence.vector to be this new vector.
4. Add a vector of random normal values (mean 0, sd 1) of the same length as sequence.vector to sequence.vector. This is mathematical addition, not combining vectors with c(). Overwrite sequence.vector to be this new vector.
5. Remove all values greater than 200 from sequence.vector (see note below).
6. Shuffle sequence.vector.
7. Combine the first and last three elements of sequence.vector. Overwrite sequence.vector to be this new vector of six elements.
8. Find the maximum element of sequence.vector using the max() function.

If you followed all the steps described above, step 8 should result in the number 199.6023. If you did not get this answer, one of the steps was unsuccessfully implemented. Note that all steps besides the first and last require sequence.vector to be overwritten using sequence.vector <- (this is only implied in steps 5 and 6, since indexing and shuffling require you to save results into the vector variable name to successfully perform the described operations).
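
If you get stuck, the sketch below shows one possible way to write the steps, using only functions from this and earlier chapters. It is not the only valid solution, and noise is a helper name introduced here for illustration. The exact number printed at the end depends on R's random number generator, so it is not repeated in the comments.

```r
set.seed(100)                                               # step 1
sequence.vector <- seq(90, 500, 3)                          # step 2
sequence.vector <- rep(sequence.vector, 10)                 # step 3

# step 4: element-wise addition of random normal values
noise <- rnorm(length(sequence.vector), mean=0, sd=1)
sequence.vector <- sequence.vector + noise

# step 5: keep only the values that are not greater than 200
sequence.vector <- sequence.vector[sequence.vector <= 200]

sequence.vector <- sample(sequence.vector)                  # step 6

# step 7: first three and last three elements
sequence.vector <- c(sequence.vector[1:3], tail(sequence.vector, n=3))

max(sequence.vector)                                        # step 8
```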