In many cases, you will want to run code only if a certain condition is met.
We use this logic frequently in our lives: if it is raining today,
then I will put on a raincoat. If my doorbell rings and I'm
expecting a friend, then I will open the door without looking through
the peephole. If a child asks me to help them find their mother, I will
explain that this is a dog-eat-dog world and that they must learn to
fend for themselves. This kind of logic is incredibly useful in code, and it can be expressed
with conditional statements. In this chapter, we will learn about the if, else if, and else
statements and how we can use them in programs.
The if statement is the most basic conditional statement. It allows us to run code only if a
certain condition is met. The syntax for an if statement is as follows:
if (condition) {
  # code to run if condition is TRUE
}
The condition is a logical expression that evaluates to a single TRUE or FALSE value. If the
condition is TRUE, then the code inside the curly braces will run. If the condition is FALSE,
then it will not. Let's look at an example where code is run only if a condition is met:
age <- 18
if (age < 18) {
  cat("You are not old enough to vote.")
}
Here, since age < 18 evaluates to FALSE, the code inside the curly braces will not run. If we
change the value of age to 16, then the code will run, since age < 18 is now TRUE.
Say we want the above program to show an affirmation of voting rights for those 18 or older.
One option is to add a second if statement, as shown below.
age <- 18
if (age < 18) {
  cat("You are not old enough to vote.")
}
if (age >= 18) {
  cat("You can vote!")
}
The code above achieves what we want. However, there is never a case where the value of age
is both less than 18 and greater than or equal to 18. In other words, the two conditions are
mutually exclusive. In this case, we can use an else statement to make our code simpler, since
only one condition then needs to be checked. The else statement is used together with an if
statement, and runs its code only if the condition of the if statement above it is FALSE:
age <- 18
if (age < 18) {
  cat("You are not old enough to vote.")
} else {
  cat("You can vote!")
}
Note that the else statement does not have a condition. It simply runs its code if the
condition of the if statement above it is FALSE. An else statement is not required for every
if statement, but it is useful in this case. Here, "You can vote!" only appears on the screen
if the condition age < 18 is checked and found to be FALSE. If the condition is TRUE, then the
code inside the curly braces after if will run, and the else statement will be skipped entirely.
What if we want to allow for more than two possible outcomes? Perhaps we want to tell the user
that they can vote if they are 18 or older, and that they can drink if they are 21 or older.
We can use an else if statement to accomplish this:
age <- 18
if (age < 18) {
  cat("You are not old enough to vote or drink.")
} else if (age < 21) {
  cat("You can vote, but you cannot drink.")
} else {
  cat("You can vote and drink.")
}
In the code above, there are if, else if, and else statements. Change the value of age to
different values and see what the outcome of the code is. Is there ever a case where multiple
statements are run? Why or why not?
As you likely discovered in the exercise above, there is never a case inside an if-else block,
as it is called, where more than one branch is run. This is because R checks each condition in
order, one at a time, runs the code for the first one that is TRUE, and then skips the rest. If
none of the conditions are TRUE, then the else statement will run, if present. For this reason,
the order of the statements is important. Even though the condition age < 21 is TRUE for
age = 17, the else if statement and everything after it is skipped, since the condition of the
if statement above it (age < 18) is TRUE.
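The sketch below makes the point about order concrete. It wraps the voting and drinking checks in two hypothetical functions, ageMessageGood() and ageMessageBad(), which contain the same conditions in a different order. Every age below 18 is also below 21, so when age < 21 is checked first, the age < 18 branch can never be reached.

```r
# Conditions ordered from most to least restrictive, as in the text
ageMessageGood <- function (age) {
  if (age < 18) {
    return ("You are not old enough to vote or drink.")
  } else if (age < 21) {
    return ("You can vote, but you cannot drink.")
  } else {
    return ("You can vote and drink.")
  }
}

# Same conditions, wrong order: any age below 18 is also below 21,
# so the first branch catches it and the age < 18 branch never runs
ageMessageBad <- function (age) {
  if (age < 21) {
    return ("You can vote, but you cannot drink.")
  } else if (age < 18) {
    return ("You are not old enough to vote or drink.")
  } else {
    return ("You can vote and drink.")
  }
}

ageMessageGood(17)  # "You are not old enough to vote or drink."
ageMessageBad(17)   # "You can vote, but you cannot drink."
```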
If you don't want to skip the evaluation of another condition, you can use multiple if
statements instead of an else if statement, as seen in this example:
age <- 18
is.sunburn.danger <- TRUE
if (age < 18) {
  cat("You are not old enough to vote.\n")
} else {
  cat("You can vote!\n")
}
if (is.sunburn.danger) {
  cat("Wear sunscreen if it's sunny on voting day.\n")
}
Above, you can see a few interesting things:
1. There are two if statements, one after the other. This means that both conditions will be
   checked, regardless of whether the first one is TRUE or FALSE, since they belong to two
   separate blocks.
2. In the second if statement, we check the value of is.sunburn.danger.
   You might be confused by the syntax here, and might have expected to see something like
   if (is.sunburn.danger == TRUE). However, this is not necessary. Since is.sunburn.danger is a
   logical value, it is already either TRUE or FALSE. Therefore, we can simply write
   if (is.sunburn.danger) to check whether it is TRUE. This is a common shortcut in many
   programming languages. The name of the variable helps indicate that its contents are logical,
   so if (is.sunburn.danger) reads quite nicely in addition to being good code.
3. The second if statement has no else statement. This shows that else is not required, and
   you need not use it (or else if) if it does not apply to your needs.
You are not limited to using only one else if statement. You can use as many as necessary to
accomplish your goals. For example, we can add a third condition to the voting and drinking
example to check whether the person is old enough to rent a car:
age <- 18
if (age < 18) {
  cat("You aren't old enough to vote, drink, or rent a car.")
} else if (age < 21) {
  cat("You can vote, but you can't drink or rent a car.")
} else if (age < 25) {
  cat("You can vote and drink, but you can't rent a car.")
} else {
  cat("You can do basically anything except be president.")
}
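One way to confirm that each branch of a longer chain is reachable is to wrap the chain in a function and call it with several ages. In the sketch below, the function name rentalMessage() is hypothetical, and sapply() simply calls it once for each element of ages.

```r
rentalMessage <- function (age) {
  if (age < 18) {
    return ("You aren't old enough to vote, drink, or rent a car.")
  } else if (age < 21) {
    return ("You can vote, but you can't drink or rent a car.")
  } else if (age < 25) {
    return ("You can vote and drink, but you can't rent a car.")
  } else {
    return ("You can do basically anything except be president.")
  }
}

ages <- c(16, 19, 23, 40)              # one age for each branch
messages <- sapply(ages, rentalMessage)  # one message per age
cat(messages, sep = "\n")
```

Each age lands in exactly one branch, so the four messages printed are all different.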
In some cases, you may want to check a condition only if another condition is met. For example,
if you are under 18, you cannot vote or be drafted, regardless of your gender. However, if you
are 18 or older and you are male, then you must sign up for the draft. We can nest if-else
blocks to accomplish this:
age <- 18
gender <- "male"
if (age < 18) {
  cat("You can't vote or be drafted.")
} else {
  if (gender == "male") {
    cat("You can vote and must sign up for the draft.")
  } else {
    cat("You can vote but won't be drafted.")
  }
}
Here, you can see that the inner if-else block is only reached if the outer if condition is
FALSE (that is, when age is 18 or older).
This kind of nesting can in theory go on indefinitely, but it is best to avoid going more than
two levels deep, since it becomes hard to keep track of the logic beyond that. There are usually
ways to avoid deeper nesting, such as combining conditions with logical operators or using
else if statements.
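As one example of such a workaround, the nested draft example can be flattened into a single if-else if-else chain. This works because an else if condition is only checked when the conditions above it are FALSE. The sketch wraps the logic in a hypothetical draftMessage() function so it can be tried with different inputs. It is intended to behave the same as the nested version.

```r
draftMessage <- function (age, gender) {
  if (age < 18) {
    return ("You can't vote or be drafted.")
  } else if (gender == "male") {
    # Only reached when age >= 18, like the inner if in the nested version
    return ("You can vote and must sign up for the draft.")
  } else {
    return ("You can vote but won't be drafted.")
  }
}

draftMessage(18, "male")    # "You can vote and must sign up for the draft."
draftMessage(30, "female")  # "You can vote but won't be drafted."
draftMessage(16, "male")    # "You can't vote or be drafted."
```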
As you have seen, anything that evaluates to TRUE or FALSE can go in the parentheses after if.
This includes all the logical operators covered early on in this tutorial. There are many cases
where you will want to check multiple conditions at once, which can be done using & and |
(and and or, respectively). For example, if you want to check whether a person is old enough
and tall enough to ride a roller coaster, you might use the following code:
age <- 20
height <- 50
if (age >= 18 & height >= 60) {
  cat("You can ride the roller coaster!")
} else {
  cat("You can't ride the roller coaster.")
}
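R also provides double forms of these operators, && and ||. They are meant for single TRUE/FALSE values, which is exactly what if needs. They also stop evaluating as soon as the answer is known: && never looks at its right-hand side when the left-hand side is FALSE. The getPercentChange() function later in this chapter uses &&. The sketch below uses a hypothetical canRide() function. Its extra rule, that riders under 18 may ride with an adult (the with.adult argument), is invented purely for illustration.

```r
canRide <- function (age, height, with.adult = FALSE) {
  # && and || combine single TRUE/FALSE values, which is what if() expects
  if (height >= 60 && (age >= 18 || with.adult)) {
    return ("You can ride the roller coaster!")
  } else {
    return ("You can't ride the roller coaster.")
  }
}

canRide(20, 50)        # too short
canRide(12, 62, TRUE)  # tall enough and riding with an adult
canRide(12, 62)        # tall enough, but under 18 and no adult

# Short-circuiting: the left side is FALSE, so sqrt("tall") is never
# evaluated and no error occurs
is.numeric("tall") && sqrt("tall") > 2  # FALSE
```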
Say you are making a function that calculates the percent change between two numbers. Perhaps it could be used to see the change in the density of a species in a certain area between two years. You might write the following code:
getPercentChange <- function (start, end) {
  return (100 * (end - start) / start)
}
This works fine for most cases. If you run getPercentChange(9, 27), you get 200, or a 200%
increase. However, if you run getPercentChange(0, 10), the function will try to divide a
positive number by zero, which gives Inf in R. Additionally, if you run getPercentChange(0, 0),
then the function will be dividing zero by zero, which gives NaN in R. Perhaps you would prefer
to have the function return NA if only the start value is zero, and 0 if both values are zero.
This can be accomplished with conditional statements.
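Here is a quick check of those three cases using the original definition, before any conditional statements are added. is.infinite() and is.nan() are base R functions for detecting these special values. Note that is.na() also returns TRUE for NaN, so is.nan() is the more specific test.

```r
getPercentChange <- function (start, end) {
  return (100 * (end - start) / start)
}

getPercentChange(9, 27)   # 200
getPercentChange(0, 10)   # Inf  (1000 / 0)
getPercentChange(0, 0)    # NaN  (0 / 0)

is.infinite(getPercentChange(0, 10))  # TRUE
is.nan(getPercentChange(0, 0))        # TRUE
```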
Before we write the code, let's consider what we want to do conceptually. We want our function
to return one of three values (0, NA, or 100 * (end - start) / start), depending on the values
of start and end. I will label our three situations for reference:
(a) if start is zero and end is zero, then we want to return 0,
(b) if start is zero but end is not zero, then we want to return NA, and
(c) otherwise, we want to return 100 * (end - start) / start.
Here is one way we can write that code:
getPercentChange <- function (start, end) {
  if (start == 0 && end == 0) {  # (a)
    return (0)
  } else if (start == 0) {       # (b)
    return (NA)
  } else {                       # (c)
    return (100 * (end - start) / start)
  }
}
You may have noticed that I did not include end != 0 in the condition for the else if
statement, despite the requirements for situation (b). Even so, you would find that the code
behaves identically if you include that requirement in the else if condition. This is because
the else if condition is only checked if the if condition above it is FALSE. Therefore, if we
have made it past the initial if statement (i.e., if the if condition evaluates to FALSE), we
know that start is not zero, end is not zero, or both. So, to check the conditions for
situation (b), we only need to check whether start is zero, because if it is, end is guaranteed
not to be zero. This may be a bit hard to wrap your head around, so take your time to
understand what is going on here.
There are other ways we might write this code using conditional statements. You will notice
that situations (a) and (b) both require that start is zero, and therefore, if start is not
zero, we know that situation (c) applies. Additionally, if we do find that start is zero, we
can check whether end is zero to determine whether situation (a) or situation (b) applies.
Therefore, the following code behaves identically to the code above:
getPercentChange <- function (start, end) {
  if (start == 0) {
    if (end == 0) {  # (a)
      return (0)
    } else {         # (b)
      return (NA)
    }
  } else {           # (c)
    return (100 * (end - start) / start)
  }
}
If you are feeling confused by any of the code above, try to trace through it with a few
different values of start and end, pretending that you are R trying to execute the code. For
example, say we run getPercentChange(0, 10), so that start is 0 and end is 10.
Then, R would first check the logical value of start == 0, which in this case is TRUE.
Therefore, it would run the code inside the corresponding curly braces. That means it would
check the logical value of end == 0, which in this case is FALSE, so it skips the code inside
the inner if statement and moves on to the inner else statement. Therefore, the contents of
that else statement are run, and NA is returned. Tracing through the code with a few different
values of start and end can help you get a better grasp of why either way of writing the
function is effective.
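You can also let R confirm that the two versions agree. The sketch below defines both versions under hypothetical names, getPercentChangeA and getPercentChangeB, so they can coexist. It then compares their results for one input from each situation using identical().

```r
# First version: if / else if / else
getPercentChangeA <- function (start, end) {
  if (start == 0 && end == 0) {  # (a)
    return (0)
  } else if (start == 0) {       # (b)
    return (NA)
  } else {                       # (c)
    return (100 * (end - start) / start)
  }
}

# Second version: nested if-else
getPercentChangeB <- function (start, end) {
  if (start == 0) {
    if (end == 0) {  # (a)
      return (0)
    } else {         # (b)
      return (NA)
    }
  } else {           # (c)
    return (100 * (end - start) / start)
  }
}

identical(getPercentChangeA(0, 0), getPercentChangeB(0, 0))    # TRUE (both 0)
identical(getPercentChangeA(0, 10), getPercentChangeB(0, 10))  # TRUE (both NA)
identical(getPercentChangeA(9, 27), getPercentChangeB(9, 27))  # TRUE (both 200)
identical(getPercentChangeA(5, 0), getPercentChangeB(5, 0))    # TRUE (both -100)
```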
In the previous chapter, we defined a function to calculate the Shannon Index of a community. However, the Shannon Index is just one of several diversity indices that are commonly used in ecology. A closely related family of indices is called Hill numbers, which are defined as follows:
$$\text{Hill number of order } q = \Big( \sum p_i^q \Big)^{1/(1-q)}$$
I will spare you the mathematical details, but the important things to know are that the Hill number of order 0 gives the total number of species in a community, and the Hill number of order 1 is the exponential of the Shannon Index. Higher-order Hill numbers are also used; for example, the Hill number of order 2 is the inverse of Simpson's index (1 divided by the sum of the squared proportions). Thus, we might code a function to calculate Hill numbers as follows:
hillNumber <- function (proportions, order) {
  return (sum(proportions ^ order) ^ (1 / (1 - order)))
}
However, you will find that this function does not work when order has the value 1, since the
exponent 1 / (1 - order) would then require dividing by zero. Taking the limit as order
approaches 1 avoids this issue and gives the following formula for the Hill number of order 1:
$$\text{Hill number of order } 1 = \exp\Big( -\sum p_i \ln (p_i)\Big)$$
So, the initial function we wrote is fine for all orders except 1. We can use conditional statements to account for this:
hillNumber <- function (proportions, order) {
  if (order == 1) {
    return (exp(-sum(proportions * log(proportions))))
  } else {
    return (sum(proportions ^ order) ^ (1 / (1 - order)))
  }
}
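As a quick sanity check, here is the function applied to a hypothetical community of three species with proportions 0.5, 0.25, and 0.25. Order 0 should give the number of species (3). Order 1 should match the exponential of the Shannon Index, which works out to 2^1.5 for these proportions. Order 2 should equal 1 divided by the sum of the squared proportions.

```r
hillNumber <- function (proportions, order) {
  if (order == 1) {
    return (exp(-sum(proportions * log(proportions))))
  } else {
    return (sum(proportions ^ order) ^ (1 / (1 - order)))
  }
}

p <- c(0.5, 0.25, 0.25)  # hypothetical species proportions (they sum to 1)

hillNumber(p, 0)  # 3, the number of species
hillNumber(p, 1)  # about 2.828, i.e., exp(-sum(p * log(p)))
hillNumber(p, 2)  # about 2.667, i.e., 1 / 0.375

# Orders very close to 1 use the general formula and land near the order-1 value
hillNumber(p, 0.999)
```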
The ifelse() function
Here, I will briefly mention a function in R that is very useful when you need to make
variables that depend on the values of other variables. For example, you may have a variable
called temperature that contains a person's temperature reading (in degrees Fahrenheit), and
you want to make a new variable called health.status that states the person's health status
based on their temperature (e.g., "healthy" or "sick"). Here is how you might do that using
regular conditional statements as described above:
temperature <- 98.6
if (temperature > 100) {
  health.status <- "sick"
} else {
  health.status <- "healthy"
}
However, this can also be done in a single line, a pattern you will sometimes see in other
people's code. There are two ways to do this, one of them using the ifelse() function:
temperature <- 98.6
health.status <- if (temperature > 100) {"sick"} else {"healthy"}
# or equivalently...
health.status <- ifelse(temperature > 100, "sick", "healthy")
As you can see, these two lines have a very similar structure. Both check a condition and
return one of two values based on it. In R, an if statement is itself an expression that
produces the value of whichever branch runs, so its result can be assigned directly to a
variable. However, writing it on one line looks unusual, given that if statements are almost
always written over multiple lines. The ifelse() function is a built-in function in R that
takes three arguments: the condition to check, the value to return if the condition is TRUE,
and the value to return if the condition is FALSE.
Between the two, the ifelse() function is more readable, and it is more commonly used in this
context. It is useful when you want to make a new variable that depends on the values of other
variables, and you want to do it in a single line. Unlike if, ifelse() is also vectorized: given
a vector of conditions, it returns a vector with one result per element. However, it is not as
flexible as regular conditional statements, and beyond a certain level of complexity a normal
multi-line if statement should be used instead. But for simple cases, the ifelse() function is
a great tool to be aware of.
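To illustrate the vectorized behavior, here is ifelse() applied to a hypothetical vector of several temperature readings. An if statement cannot do this directly, because its condition must be a single TRUE or FALSE. Since R 4.2.0, if() stops with an error when its condition has length greater than one.

```r
temperatures <- c(98.6, 101.2, 99.1)  # hypothetical readings, degrees Fahrenheit

# ifelse() checks the condition for each element and picks a value for each
health.status <- ifelse(temperatures > 100, "sick", "healthy")
health.status  # "healthy" "sick" "healthy"

# By contrast, if (temperatures > 100) { ... } would fail here, because
# temperatures > 100 is a vector of three TRUE/FALSE values, not one.
```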
Conditional statements are incredibly useful in programming, and will make frequent appearances in your code and the code of others. They allow you to run code only if a certain condition is met, and can be used to make your programs more customizable and useful.
One common use of conditional statements is inside user-defined functions to check the inputs provided by a user of the function. This can help make the functions you write more robust against unexpected behavior. For example, if you write a function that takes a person's age as input, you might want to check that the age is a positive integer. Of course, you might assume that the user will always input a positive integer, but if you were to publish your code, you'd be surprised how quickly people would find ways to break it. Therefore, it is good practice to check the inputs of your functions, and conditional statements are the natural tool for doing so.
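Here is a minimal sketch of that idea. It assumes a hypothetical function dogYears() that converts a person's age to "dog years" by multiplying it by 7. The conditional statements at the top check the input. stop(), a base R function, halts the function with an error message when a check fails. The || operator stops as soon as one check is TRUE, so later checks only run on inputs that passed the earlier ones.

```r
dogYears <- function (age) {
  # Check the input before using it (hypothetical example function)
  if (!is.numeric(age) || length(age) != 1 || is.na(age)) {
    stop("age must be a single number")
  } else if (age <= 0 || age != round(age)) {
    stop("age must be a positive whole number")
  }
  return (age * 7)
}

dogYears(3)         # 21
# dogYears(-2)      # stops with "age must be a positive whole number"
# dogYears("ten")   # stops with "age must be a single number"
```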
In this chapter, many of the if statements simply printed a message with cat(). However, you
can put any code you want inside the curly braces, including other function calls, variable
assignments, return() statements, and more. As you learn concepts like this, it is important to
think about how you might use these tools in other contexts.
Think about the following questions and answer them with what you learned in this chapter and through your own experimentation in an R file.
1. What is the difference between using two if statements and using an if and an else if
   statement? E.g., given that car.speed = 40, what's the effect of using the else if statement
   in the second code chunk in place of the second if statement in the first code chunk?
car.speed <- 40
if (car.speed < 50) {
  cat("You should be in the rightmost lane.\n")
}
if (car.speed < 60) {
  cat("You can be in the second lane from the right.\n")
}
if (car.speed < 50) {
  cat("You should be in the rightmost lane.\n")
} else if (car.speed < 60) {
  cat("You can be in the second lane from the right.\n")
}
2. Suppose you have an if statement followed by several else if statements. If the condition
   for the if statement is TRUE, will the other conditions be checked? If the condition for the
   if statement is FALSE, but the condition for the first else if statement is TRUE, will the
   remaining conditions be checked?
3. … if-else if block?
In R, the readline() function allows you to get input from the user in the console. This is
useful for making interactive scripts. For example, you can ask the user for their name and
then use that name within the script. Below is an example of how you might use this function:
name <- readline("What's your name? ")
# prints this prompt and lets user type response in the console
cat("Hello, ", name, "!", sep = "")
# the sep argument gives what character to put between
# arguments when printing
Make a script that asks the user for their name, and then asks them if they are having a good day. Then, a final message should be printed based on their response. This message should include their name, and should be different whether they are having (1) a good day, (2) a bad day, or (3) if they said something you did not understand. Use conditional statements to accomplish this.
In many cases, users will not type the exact response you are expecting. For example, if you
ask the user whether they are having a good day, they might type yes, Yes, or YES. To account
for this, you can use the tolower() function to convert the user's response to lowercase
within the condition after if, like this:
if (tolower(user.response) == "yes") {
  # code to run if user says yes
}
In the practice section of the previous chapter, you wrote several functions that completed the described task. Now, do the same, but use conditional statements where necessary.
1. A function called getDifference that takes two positional arguments and a keyword argument
   called reversed that defaults to FALSE. If reversed is FALSE, the function should return the
   difference between the first and second argument. If reversed is TRUE, the function should
   compute the difference in the opposite order. E.g., passing 5 and 3 with reversed=TRUE should
   return -2, but using reversed=FALSE or omitting the reversed argument entirely should make
   the function return 2.
2. A function called printInvite that takes a vector of names called people. This vector could
   have any length, but will contain names in a character vector, e.g., c("Fiona", "Paul", "Bess").
   Generally, the function should return a string that looks like this:
   "Hey, _____! Come to my birthday party on Saturday!"
   As long as "Stacey" is not in people, the function should return the string above, with the
   blank filled in with the names in people. Specifically:
   (a) If there is one name in people, e.g., c("Ed"), the function should simply insert that
       name, e.g., "Ed".
   (b) If there are two names in people, e.g., c("Ed", "Sue"), the function should insert the
       first name, followed by " and ", followed by the second name, e.g., "Ed and Sue".
   (c) If there are more than two names in people, e.g., c("Ed", "Sue", "Kim", "Li"), the
       function should insert the first name, followed by ", ", followed by the second name, and
       so forth, until the final name, which should be preceded by ", and ", e.g.,
       "Ed, Sue, Kim, and Li".
   If "Stacey" is indeed in people, the function should return "Stacey is not invited."
   regardless of the number of names in people.
3. A function called getStatistic that takes a vector of numbers called numbers and a keyword
   argument called statistic that defaults to "mean". If statistic is "mean", the function
   should return the mean of numbers. If statistic is "median", the function should return the
   median of numbers. If statistic is "sd", the function should return the standard deviation
   of numbers. If statistic is any other value, the function should return the string
   "Error: unknown statistic requested".