08| Conditional Statements

Miles Robertson, 12.24.23 (edited 01.16.24)

Introduction

In many cases, you will want to run code only if a certain condition is met. We use this logic frequently in our lives: if it is raining today, then I will put on a raincoat. If my doorbell rings and I'm expecting a friend, then I will open the door without looking through the peephole. If a child asks me to help them find their mother, I will explain that this is a dog-eat-dog world and that they must learn to fend for themselves. This is incredibly useful in code, and can be accomplished with conditional statements. In this chapter, we will learn about the if, else if, and else statements, and how we can use them in programs.

If-Else Blocks

The if statement is the most basic conditional statement. It allows us to run code only if a certain condition is met. The syntax for an if statement is as follows:

if (condition) {
    # code to run if condition is true
}

The condition is a logical expression that evaluates to either TRUE or FALSE. If the condition is TRUE, then the code inside the curly braces will run. If the condition is FALSE, then it will not. Let's look at an example where a few lines of code are run only if a condition is met:

age <- 18

if (age < 18) {
    cat("You are not old enough to vote.")
}

Here, since age < 18 evaluates to FALSE, the code inside the curly braces will not run. If we change the value of age to 16, then the code will run, since age < 18 is now TRUE.

Say we want the above program to show an affirmation of voting rights for those 18 or older. One option is to add a second if statement as shown below.

age <- 18

if (age < 18) {
    cat("You are not old enough to vote.")
}

if (age >= 18) {
    cat("You can vote!")
}

The code above is effective in achieving what we want. However, in this situation, there is never a case where the value of age is both less than 18 and greater than or equal to 18. In other words, the two conditions are mutually exclusive, and exactly one of them is always TRUE. In this case, we can use an else statement to make our code more concise and efficient, since only one condition needs to be checked. The else statement is used in conjunction with an if statement, and runs code only if the condition of the if statement above it is FALSE:

age <- 18

if (age < 18) {
    cat("You are not old enough to vote.")
} else {
    cat("You can vote!")
}

Note that the else statement does not have a condition. It simply runs code if the condition of the if statement above it is FALSE. An else statement is not required for every if statement, but it is useful in this case. Here, "You can vote!" only appears on the screen if the condition age < 18 is checked and found to be FALSE. If the condition is TRUE, then the code inside the curly braces after if will run, and the else statement will be skipped entirely.

How about if we want to allow for more than two possible outcomes? Perhaps we want to tell the user that they can vote if they are 18 or older, and that they can drink if they are 21 or older. We can use an else if statement to accomplish this:

age <- 18

if (age < 18) {
    cat("You are not old enough to vote or drink.")
} else if (age < 21) {
    cat("You can vote, but you cannot drink.")
} else {
    cat("You can vote and drink.")
}


In the code above, there are if, else if, and else statements. Change the value of age to different values and see what the outcome of the code is. Is there ever a case where multiple statements are run? Why or why not?

As you likely discovered in the box above, there is never a case inside an if-else block (as this structure is called) where more than one branch runs. This is because R checks each condition in order, one at a time, runs the code for the first one that is TRUE, and then skips the rest. If none of the conditions are TRUE, the else statement runs, if present. For this reason, the order of the statements is important. Even though the condition age < 21 is TRUE for age = 17, the else if statement and everything after it is skipped, since the condition of the if statement above it, age < 18, is TRUE.
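To see why the order matters, here is a small sketch that wraps the voting and drinking example in a function and then swaps the first two conditions. The names describeAge() and describeAgeWrongOrder() are hypothetical, chosen for this illustration; returning strings instead of printing them makes the results easy to compare.

```r
# Hypothetical function: the voting/drinking example with conditions in a sensible order
describeAge <- function (age) {
    if (age < 18) {
        return ("You are not old enough to vote or drink.")
    } else if (age < 21) {
        return ("You can vote, but you cannot drink.")
    } else {
        return ("You can vote and drink.")
    }
}

# Same conditions, but age < 21 is checked first. Any age below 18 is also
# below 21, so the age < 18 branch can never be reached.
describeAgeWrongOrder <- function (age) {
    if (age < 21) {
        return ("You can vote, but you cannot drink.")
    } else if (age < 18) {
        return ("You are not old enough to vote or drink.")
    } else {
        return ("You can vote and drink.")
    }
}

describeAge(17)            # "You are not old enough to vote or drink."
describeAgeWrongOrder(17)  # "You can vote, but you cannot drink."  (wrong!)
describeAge(30)            # "You can vote and drink."
```

In the second version, R finds age < 21 to be TRUE for 17, runs that branch, and never looks at age < 18.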

If you don't want to skip the evaluation of another condition, you can use multiple if statements instead of an else if statement, as seen in this example:

age <- 18
is.sunburn.danger <- TRUE

if (age < 18) {
    cat("You are not old enough to vote.")
} else {
    cat("You can vote!")
}

if (is.sunburn.danger) {
    cat("Wear sunscreen if it's sunny on voting day.")
}

Above, you can see a few interesting things:

For one thing, we have two if statements, one after the other. This means that both conditions will be checked, regardless of whether the first one is TRUE or FALSE, since they are two separate if-else blocks.
Additionally, in the second if statement, we check the value of is.sunburn.danger. You might be confused by the syntax here, and might have expected to see something like if (is.sunburn.danger == TRUE). However, this is not necessary. Since is.sunburn.danger is a logical value, it is already either TRUE or FALSE, so we can simply write if (is.sunburn.danger) to run the code when it is TRUE. This is a common shortcut in programming languages. The name of the variable helps indicate that its contents are logical, so if (is.sunburn.danger) reads quite nicely in addition to being good code.
Finally, the second if statement has no else statement. This shows that else is not required; you need not use else or else if when they do not suit your needs.

You are not limited to using only one else if statement. You can use as many as necessary to accomplish your goals. For example, we can add a third condition to the code above to check if the person is old enough to take out a rental car:

age <- 18

if (age < 18) {
    cat("You aren't old enough to vote, drink, or rent a car.")
} else if (age < 21) {
    cat("You can vote, but you can't drink or rent a car.")
} else if (age < 25) {
    cat("You can vote and drink, but you can't rent a car.")
} else {
    cat("You can do basically anything except be president.")
}

In some cases, you may want to check a condition only if another condition is TRUE. For example, if you are under 18, you cannot vote or be drafted, regardless of your gender. However, if you are 18 or older and you are male, then you must sign up for the draft. We can nest if-else blocks to accomplish this:

age <- 18
gender <- "male"

if (age < 18) {
    cat("You can't vote or be drafted.")
} else {
    if (gender == "male") {
        cat("You can vote and must sign up for the draft.")
    } else {
        cat("You can vote but won't be drafted.")
    }
}

Here, you can see that the inner if-else block is only reached if the outer condition, age < 18, is FALSE. This kind of nesting can in principle go on indefinitely, but it is best to avoid going more than two levels deep, since it quickly becomes hard to keep track of the logic. Deeper nesting can usually be avoided, for example by using else if, by combining conditions with logical operators (described below), or by moving some of the logic into a separate function.
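As a sketch of one such workaround, the nested draft example can be flattened into a single if-else block. This works because an else if is only checked after the conditions above it were FALSE, so the gender check does not need to repeat the age check. The code is wrapped in a hypothetical function, draftStatus(), so it is easy to try with different inputs.

```r
# Hypothetical function: the nested draft example rewritten as one flat block
draftStatus <- function (age, gender) {
    if (age < 18) {
        return ("You can't vote or be drafted.")
    } else if (gender == "male") {
        # only reached when age < 18 is FALSE, i.e., age is 18 or older
        return ("You can vote and must sign up for the draft.")
    } else {
        return ("You can vote but won't be drafted.")
    }
}

draftStatus(18, "male")    # "You can vote and must sign up for the draft."
draftStatus(30, "female")  # "You can vote but won't be drafted."
draftStatus(16, "male")    # "You can't vote or be drafted."
```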

As you have seen, anything that evaluates to TRUE or FALSE can go in the parentheses after if. This includes all the logical operators covered earlier in this tutorial. There are many cases where you will want to check multiple conditions at once, which can be done using & and | ("and" and "or"). Inside if conditions you will also often see && and ||. For single TRUE/FALSE values they give the same results as & and |, but they are meant for single values and stop evaluating as soon as the answer is known. For example, if you want to check if a person is old enough and tall enough to ride a roller coaster, you might use the following code:

age <- 20
height <- 50

if (age >= 18 & height >= 60) {
    cat("You can ride the roller coaster!")
} else {
    cat("You can't ride the roller coaster.")
}
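A sketch of why && is handy in conditions: it evaluates from left to right and stops at the first FALSE, so an early check can guard a later comparison that would otherwise cause an error. The function name isSinglePositive() is hypothetical; is.numeric() and length() are base R functions.

```r
# Hypothetical helper: is x a single number greater than zero?
isSinglePositive <- function (x) {
    # && stops at the first FALSE, so x > 0 is only checked once we know
    # x is a single number
    if (is.numeric(x) && length(x) == 1 && x > 0) {
        return (TRUE)
    } else {
        return (FALSE)
    }
}

isSinglePositive(5)           # TRUE
isSinglePositive(-2)          # FALSE
isSinglePositive("ten")       # FALSE: is.numeric("ten") is FALSE, nothing else is checked
isSinglePositive(numeric(0))  # FALSE: with & instead of &&, this call would stop with an error
```

With &, the last call would build the condition TRUE & FALSE & logical(0), which is an empty logical vector, and if() cannot work with an empty condition.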

Example 1: Handling Zeros in Division

Say you are making a function that calculates the percent change between two numbers. Perhaps it could be used to see the change in the density of a species in a certain area between two years. You might write the following code:

getPercentChange <- function (start, end) {
    return (100 * (end - start) / start)
}

This works fine for most cases. If you run getPercentChange(9, 27), you get 200, or a 200% increase. However, if you run getPercentChange(0, 10), the function will try to compute a positive number divided by zero, which is Inf in R. Additionally, if you run getPercentChange(0, 0), then the function will be dividing zero by zero, which is NaN in R. Perhaps you would prefer to have the function return NA if only the start value is zero, and 0 if both values are zero. This can be accomplished with conditional statements.
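You can check these claims directly in the console. The sketch below repeats the unguarded function and uses the base R functions is.infinite() and is.nan() to test for the special values it produces.

```r
getPercentChange <- function (start, end) {
    return (100 * (end - start) / start)
}

getPercentChange(9, 27)   # 200
getPercentChange(0, 10)   # Inf  (a positive number divided by zero)
getPercentChange(0, 0)    # NaN  (zero divided by zero)

is.infinite(getPercentChange(0, 10))  # TRUE
is.nan(getPercentChange(0, 0))        # TRUE
```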

Before we write the code, let's consider what we want to do conceptually. We want our function to return one of three values (0, NA, or the result of 100 * (end - start) / start), depending on the values of start and end. I will label our three situations for reference: (a) if start is zero and end is zero, then we want to return 0, (b) if start is zero but end is not zero, then we want to return NA, and (c) otherwise, we want to return 100 * (end - start) / start. Here is one way we can write that code:

getPercentChange <- function (start, end) {
    if (start == 0 && end == 0) { # (a)
        return (0)
    } else if (start == 0) {      # (b)
        return (NA)
    } else {                      # (c)
        return (100 * (end - start) / start)
    }
}
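Here is a short set of calls covering each of the three situations, plus a case where the value decreases. The function is repeated so the block runs on its own.

```r
getPercentChange <- function (start, end) {
    if (start == 0 && end == 0) { # (a)
        return (0)
    } else if (start == 0) {      # (b)
        return (NA)
    } else {                      # (c)
        return (100 * (end - start) / start)
    }
}

getPercentChange(0, 0)    # 0    situation (a)
getPercentChange(0, 10)   # NA   situation (b)
getPercentChange(9, 27)   # 200  situation (c): a 200% increase
getPercentChange(20, 5)   # -75  situation (c): a 75% decrease
```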

You may have noticed that I left end != 0 out of the condition for the else if statement, despite the requirements for situation (b). However, you would find that the code behaves identically if you include that requirement in the else if condition. This is because the else if statement is only checked if the condition of the if statement above it is FALSE. Therefore, if we have made it past the initial if statement (i.e., if the if condition is evaluated to be FALSE), we know that start is not zero, end is not zero, or both. So, to check the conditions for situation (b), we only need to check whether start is zero, because if it is, end is guaranteed not to be zero. This may be a bit hard to wrap your head around, so take your time to understand what is going on here.

There are other ways we might write this code using conditional statements. You will notice that situation (a) and situation (b) both require that start is zero, and therefore, if start is not zero, we know that situation (c) applies. Additionally, if we do find that start is zero, we can check if end is zero to determine whether situation (a) or situation (b) applies. Therefore, the following code behaves identically to the code above:

getPercentChange <- function (start, end) {
    if (start == 0) {
        if (end == 0) { # (a)
            return (0)
        } else {        # (b)
            return (NA)
        }
    } else {            # (c)
        return (100 * (end - start) / start)
    }
}

If you are feeling confused by any of the code above, try to trace through it with a few different values of start and end, pretending that you are R trying to execute the code. For example, say we run getPercentChange(0, 10), so that start is 0 and end is 10. R would first check the logical value of start == 0, which in this case is TRUE, so it would run the code inside the corresponding curly braces. That means it would check the logical value of end == 0, which in this case is FALSE, so it skips the body of the inner if statement and moves on to its else statement. The contents of that else statement are run, and NA is returned. Tracing through the code with a few different values of start and end can help you get a better grasp of why either way of writing the function is effective.

Example 2: Hill Numbers

In the previous chapter, we defined a function to calculate the Shannon Index of a community. However, the Shannon Index is just one of several diversity indices commonly used in ecology. Many of these indices are closely related to a single family of measures called Hill numbers, which are defined as follows:

$$\text{Hill number of order } q = \Big( \sum p_i^q \Big)^{1/(1-q)}$$

I will spare you the mathematical details. Here, $p_i$ is the proportion of the community made up of species $i$, and $q$ is the order. The important things to know are that the Hill number of order 0 gives the total number of species in a community, and the Hill number of order 1 is the exponential of the Shannon Index. Higher orders, such as order 2, are also used. Thus, we might code a function to calculate Hill numbers as follows:

hillNumber <- function (proportions, order) {
    return (sum(proportions ^ order) ^ (1 / (1 - order)))
}

However, this function does not give the correct result when order is 1, since 1 / (1 - order) would divide by zero. R does not stop with an error in this case (it treats 1 / 0 as Inf), so the function silently returns a meaningless value. Fortunately, taking the limit of the formula as $q$ approaches 1 avoids this issue and gives the following formula for the Hill number of order 1:

$$\text{Hill number of order } 1 = \exp\Big( -\sum p_i \ln (p_i)\Big)$$

So, the initial function we wrote is fine for all orders except 1. We can use conditional statements to account for this:

hillNumber <- function (proportions, order) {
    if (order == 1) {
        return (exp(-sum(proportions * log(proportions))))
    } else {
        return (sum(proportions ^ order) ^ (1 / (1 - order)))
    }
}
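Here is a quick check of the function using made-up proportions for a hypothetical community of three species. For these values, order 0 returns the number of species, order 1 matches the exponential of the Shannon Index computed directly, and order 2 matches 1 / sum(p^2) (the inverse Simpson index). The last part shows a caveat: in R, 0^0 is 1 and 0 * log(0) is NaN, so a species with a proportion of exactly zero breaks orders 0 and 1. The hypothetical hillNumberNonzero() drops zero proportions first, assuming such species should simply be ignored.

```r
hillNumber <- function (proportions, order) {
    if (order == 1) {
        return (exp(-sum(proportions * log(proportions))))
    } else {
        return (sum(proportions ^ order) ^ (1 / (1 - order)))
    }
}

# Hypothetical community: proportions of three species (they sum to 1)
p <- c(0.5, 0.25, 0.25)

hillNumber(p, 0)               # 3: the number of species
shannon <- -sum(p * log(p))    # Shannon Index of p
hillNumber(p, 1)               # about 2.83
exp(shannon)                   # the same value
hillNumber(p, 2)               # about 2.67, the same as 1 / sum(p^2)

# A zero proportion breaks orders 0 and 1
p.with.zero <- c(0.5, 0.25, 0.25, 0)
hillNumber(p.with.zero, 0)     # 4, not 3, because 0^0 is 1
hillNumber(p.with.zero, 1)     # NaN, because 0 * log(0) is NaN

# Hypothetical variant that ignores species with a proportion of zero
hillNumberNonzero <- function (proportions, order) {
    proportions <- proportions[proportions > 0]
    return (hillNumber(proportions, order))
}

hillNumberNonzero(p.with.zero, 0)   # 3
hillNumberNonzero(p.with.zero, 1)   # about 2.83
```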

The `ifelse()` function

Here, I will briefly mention a function in R that is very useful when you need to make variables that depend on the values of other variables. For example, you may have a variable called temperature that contains the temperature reading for a person, and you want to make a new variable called health.status that states the person's health status based on their temperature (e.g., "healthy" or "sick"). Here is how you might do that using regular conditional statements as described above:

temperature <- 98.6

if (temperature > 100) {
    health.status <- "sick"
} else {
    health.status <- "healthy"
}

This can also be done in a single line, which you will sometimes see in code. There are two ways to do this, one of which uses the ifelse() function:

temperature <- 98.6

health.status <- if (temperature > 100) {"sick"} else {"healthy"}
# or equivalently...
health.status <- ifelse(temperature > 100, "sick", "healthy")

As you can see, these two lines have very similar structures. Both check a condition and return one of two values based on it. In R, an if statement is itself an expression that produces a value, so it can be assigned to a variable like this, but it looks unusual on one line, since if statements are almost always spread over multiple lines. The ifelse() function is a built-in R function that takes three arguments: the condition to check (test), the value to return if the condition is TRUE (yes), and the value to return if the condition is FALSE (no). Of the two, ifelse() is more readable and more commonly used in this context. It also works element by element on whole vectors, which a regular if statement cannot do, since its condition must be a single TRUE or FALSE. However, ifelse() is not as flexible as regular conditional statements, and beyond a certain level of complexity, a normal multi-line if statement should be used instead. For simple cases, though, ifelse() is a great tool to be aware of.
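A short sketch of ifelse() labeling several made-up temperature readings at once. It checks each element separately and returns a vector the same length as the condition; a regular if, by contrast, needs a single TRUE or FALSE (in R 4.2.0 and later, a longer condition is an error).

```r
# Made-up temperature readings (degrees Fahrenheit) for four people
temperatures <- c(98.6, 101.2, 99.1, 102.5)

# ifelse() checks each reading separately
health.status <- ifelse(temperatures > 100, "sick", "healthy")
health.status   # "healthy" "sick" "healthy" "sick"

# By contrast, the condition below has four values, so this line would stop
# with an error in R 4.2.0 and later:
# if (temperatures > 100) {"sick"} else {"healthy"}
```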

Conclusion

Conditional statements are incredibly useful in programming, and will make frequent appearances in your code and the code of others. They allow you to run code only if a certain condition is met, and can be used to make your programs more customizable and useful.

One common use of conditional statements is inside user-defined functions to check the inputs provided by a user of the function. This can help make the functions you write more robust against unexpected behavior. For example, if you write a function that takes a person's age as input, you might want to check that the age is a positive integer. Of course, you might make the assumption that the user will input a positive integer, but if you were to publish your code, you'd be surprised how quickly people will find ways to break it. Therefore, it is good practice to check the input of your functions. The best way to do this is with conditional statements.
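As a minimal sketch of this kind of input checking, the hypothetical function below verifies that age is a single, non-missing, positive whole number before using it. It uses base R's stop(), which halts the function with the given error message; returning NA or printing a message with cat() would be gentler alternatives, depending on how you want the function to behave.

```r
# Hypothetical function: years until someone reaches the voting age of 18
yearsUntilVoting <- function (age) {
    # Check the input first. && stops at the first FALSE, so later checks
    # (such as age > 0) only run once the earlier ones have passed.
    is.valid <- is.numeric(age) && length(age) == 1 && !is.na(age) &&
        age > 0 && age == round(age)

    if (!is.valid) {
        stop("age must be a single positive whole number")
    }

    if (age >= 18) {
        return (0)
    } else {
        return (18 - age)
    }
}

yearsUntilVoting(15)    # 3
yearsUntilVoting(30)    # 0
# yearsUntilVoting(-4), yearsUntilVoting(2.5), and yearsUntilVoting("ten")
# all stop with the error message above
```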

In the first part of this chapter, the contents of the if statements were mostly cat() calls. However, as the examples showed, you can put any code you want inside the curly braces, including other function calls, variable assignments, return statements, and more. As you learn concepts like this, it is important to think about how you might use these tools in other contexts.

Practice

Answer Comprehension Questions

Think about the following questions and answer them with what you learned in this chapter and through your own experimentation in an R file.

What is the difference between using two if statements versus using an if and else if statement? E.g., given that car.speed = 40, what's the effect of using the else if statement in the second code chunk in place of the second if statement in the first code chunk?

if (car.speed < 50) {
    cat("You should be in the rightmost lane.")
}

if (car.speed < 60) {
    cat("You can be in the second lane from the right.")
}

if (car.speed < 50) {
    cat("You should be in the rightmost lane.")
} else if (car.speed < 60) {
    cat("You can be in the second lane from the right.")
}

Say you have an if statement followed by several else if statements. If the condition for the if statement is TRUE, will the other conditions be checked? If the condition for the if statement is FALSE, but the first else if statement is TRUE, will the remaining conditions be checked?
Generalizing from the previous question, why does the order of the conditions matter in an if-else if block?

Make an Interactive Script

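In R, the readline() function allows you to get input from the user in the console, which is useful for making interactive scripts. For example, you can ask the user for their name and then use that name within the script. Note that readline() only waits for input in an interactive session, such as the RStudio console; when a script is run non-interactively (e.g., with Rscript), it immediately returns an empty string. Below is an example of how you might use this function: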

name <- readline("What's your name? ")
# prints this prompt and lets user type response in the console
cat("Hello, ", name, "!", sep = "")
# the sep argument gives what character to put between 
# arguments when printing

Make a script that asks the user for their name, and then asks them if they are having a good day. Then, a final message should be printed based on their response. This message should include their name, and should be different whether they are having (1) a good day, (2) a bad day, or (3) if they said something you did not understand. Use conditional statements to accomplish this.

In many cases, users will not type in the exact response you are expecting. For example, if you ask the user if they are having a good day, they might type yes, Yes, or YES. To account for this, you can use the tolower() function to convert the user's response to lowercase within the condition after if, like this:

if (tolower(user.response) == "yes") {
    # code to run if user says yes
}

Write Simple Functions With Conditions

In the practice section of the previous chapter, you wrote several functions to complete the described tasks. Do the same for the tasks below, using conditional statements where necessary.

Write a function called getDifference that takes two positional arguments and a keyword argument called reversed that defaults to FALSE. If reversed is FALSE, the function should return the difference between the first and second argument. If reversed is TRUE, the function should compute the difference in the opposite order. E.g., passing 5 and 3 with reversed=TRUE should return -2, but using reversed=FALSE or omitting the reversed argument entirely should make the function return 2.
Write a function called printInvite that takes a vector of names called people. This vector could have any length, but will contain names in a character vector, e.g., c("Fiona", "Paul", "Bess"). Generally, the function should return a string that looks like this:
```
"Hey, _____! Come to my birthday party on Saturday!"
```
As long as "Stacey" is not in people, the function should return the string above, with the blank filled in with the names in people. Specifically:
- if there is just one name in people, e.g., c("Ed"), the function should simply insert that name, e.g., "Ed"
- if there are two names in people, e.g., c("Ed", "Sue"), the function should insert the first name, followed by " and ", followed by the second name, e.g., "Ed and Sue"
- if there are more than two names in people, e.g., c("Ed", "Sue", "Kim", "Li"), the function should insert the first name, followed by ", ", followed by the second name, and so forth, until the final name that should be preceded by ", and ", e.g., "Ed, Sue, Kim, and Li"
However, if "Stacey" is indeed in people, the function should return "Stacey is not invited." regardless of the number of names in people.
Write a function called getStatistic that takes a vector of numbers called numbers and a keyword argument called statistic that defaults to "mean". If statistic is "mean", the function should return the mean of numbers. If statistic is "median", the function should return the median of numbers. If statistic is "sd", the function should return the standard deviation of the numbers. If statistic is any other value, the function should return the string "Error: unknown statistic requested".