There's a little setup required to get R up and running on your computer. In addition, some other programs are very important for collaborating on code with other people. This tutorial will walk you through the process of setting up R and RStudio, as well as Git, and explain the role that each of these programs plays.
R is a programming language with a focus on statistical applications. Generally speaking, a programming language is a defined syntax that, if followed, can then be translated into machine code. In other words, the "R language" is a set of notation rules for us as coders to follow, and if we follow them, our typed-out code is turned into ones and zeros (i.e., machine code) that the computer can execute. R has been around since 1993 (compare to: C in 1972, Python in 1991, Java in 1995, Julia in 2012). It was designed by Ross Ihaka and Robert Gentleman, with heavy inspiration from the S programming language (started 1975). As of October 2023, R is ranked as the 17th most popular coding language by the TIOBE index, with Python, C and C++ leading the pack.
R is a popular language because it is free, open-source (which means that anyone can legally change the "background" code that makes R work), and well-established amongst many data scientists. It has myriad packages relating to all realms of biology, which have been developed by academics throughout the world.
Despite its positive qualities, R has some shortcomings that make it a difficult first language to learn. For example, R has inconsistent naming conventions: several different patterns are used for naming variables and functions, which makes those names hard to remember and distinguish. Contrary to common belief, the messy nature of R is not because it is an open-source language (e.g., Python is an open-source language and has very consistent naming conventions), but instead because R conventions are poorly defined and enforced. Additionally, those who design R packages often try to make sure that complete statistical analyses can be computed with just a few lines of code. This is good in that you'll almost always get the output you need, but it might be buried in a lot of results that you don't need. However, the fact that R is so common in data science means that there are many resources available to help overcome problems.
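To make the last two points concrete, here is a small sketch that uses only functions and data shipped with R. The built-in mtcars data set and the variable names mixed_styles, fit, and slope are chosen purely for illustration. The first part lists base functions written in several naming styles; the second fits a regression in one line and then pulls a single number out of the much larger result.

```r
# Part 1: several naming styles coexist in base R.
mixed_styles <- c(
  "read.csv",   # words separated by dots
  "seq_len",    # words separated by underscores
  "colMeans",   # camelCase
  "Sys.time",   # capitalised prefix plus a dot
  "nchar"       # all lowercase, no separator
)
# Confirm that each of these names really is a function.
all_functions <- all(sapply(mixed_styles, exists, mode = "function"))
all_functions

# Part 2: a full analysis takes only a few lines...
fit <- lm(mpg ~ wt, data = mtcars)  # mtcars is a built-in example data set
summary(fit)                        # prints far more than a single number

# ...but the one value you want often has to be extracted explicitly.
slope <- coef(fit)[["wt"]]
slope
```

Here summary(fit) prints residuals, coefficients, standard errors, and several fit statistics, while coef(fit) returns just the estimated coefficients.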
"Downloading R" has the effect of telling your computer how to interpret and run .R
files. This is different from the purpose of RStudio, which is discussed in the next section. Instructions for getting R onto your computer are provided in the box below.
Click the base link, and then click the top link that looks like Download R-#.#.# for Windows, whatever the version number may be.
RStudio is distinct from R in that it is an integrated development environment (IDE). An IDE is a graphical program that allows you to write code, run code, and see the output of your code all in one place. RStudio is a popular IDE for R, and it is free to download and use. It is not necessary to use RStudio to write R code, but it is used by most R users. In fact, RStudio has recently pivoted to also support editing of Python code.
Instructions to install RStudio are provided in the box below.
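Once both programs are installed, a quick check like the one below can show the difference between R (the language) and RStudio (the program you type it into). This is only a sketch: the version string depends on your installation, and the RSTUDIO environment variable is a convention of RStudio rather than part of R, so treat the second result as a hint. The variable name running_in_rstudio is made up for this example.

```r
# Which version of R is installed? (The exact text depends on your install.)
R.version.string

# RStudio sets the RSTUDIO environment variable to "1" for R sessions it starts.
# Outside RStudio (for example, R run from a terminal) the variable is empty.
running_in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
running_in_rstudio
```

The same code runs in either place, because R does the work and RStudio only provides the interface.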
Git is a version control system (VCS): a program that tracks changes to files. It is most commonly used for code, but can be used for any type of file. Git is popular and free to download and use. You do not need Git to write code, but most programmers use it. Git is a command-line program, which means that it is run from the terminal (more info on that later). Beyond tracking changes, it lets you revert to previous versions of files and collaborate with others on code. It is important to note that Git is not the same as GitHub, which will be discussed later.
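Luckily for Mac users, Git usually needs no separate setup: it either comes pre-installed or macOS offers to install it (as part of Apple's command line developer tools) the first time you run git in the terminal. Windows users will have to follow the instructions in the box below to install Git.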
Now we've gone over the purpose of R, RStudio, and Git. Below, you'll be able to practice using R and RStudio. Using Git and the terminal will be handled in future sections.
When you open RStudio for the first time, you'll see three windows. The left window is the console, which is where you can type code and see the output of that code. When you create or open a file, this left side is then broken into two windows, one for the file and one for the console. The top right window is the environment, which is where you can see all of the variables that are currently defined. The bottom right window is used for many purposes, but is frequently used to look at help documents, plot outputs and files.
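To see each window react, you could type a few lines like the following into the console, one at a time. This is a minimal sketch; the variable names heights_cm and mean_height and the numbers are invented for illustration.

```r
# Creating variables makes them appear in the environment window (top right).
heights_cm <- c(150, 162, 171)
mean_height <- mean(heights_cm)

# ls() lists the variables that are currently defined.
ls()

# These two lines only make sense in an interactive session such as RStudio.
if (interactive()) {
    plot(heights_cm)  # the plot is drawn in the bottom right window
    help("mean")      # the help document opens in the bottom right window
}
```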
There are a few settings to be aware of. Go to Tools > Global Options in the top ribbon. In the window that pops up, click on the Code tab on the left. It should automatically have the Editing tab open. Change Tab width from 2 to 4. This sets how many spaces are inserted into your code when you hit Tab. The default of 2 is far too small for code readability. Then, click on the Display tab at the top. Make sure that the Highlight R function calls and Use rainbow parentheses boxes are checked. Both options will help you see what is happening in your code more clearly. Click Apply to save your changes.
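As a rough illustration of why these settings matter, the snippet below is indented with 4 spaces per level and ends with three nested function calls, which is where coloured parentheses and highlighted function names are most useful. The variable names and values are invented for this example.

```r
readings <- c(4, -9, 16)
for (value in readings) {
    if (value > 0) {
        # Three nested calls: each matching pair of parentheses gets its own colour.
        print(round(sqrt(abs(value)), digits = 1))
    }
}
```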
If desired, you can also change the theme of RStudio. This can be done by clicking Appearance in the same Global Options page, and then selecting a theme from the dropdown menu. Click Apply to save your changes.
As stated above, the console window in RStudio can run R code. However, it does not save the code that you type. In order to save code, you need to create an R script. To do this, click File > New File > R Script in the top ribbon. This will create a new, unsaved, untitled file, which opens above the console in a new fourth window. In this window, type the following code:
print("Hello world!")
Place your cursor on that line of code and press Ctrl + Enter (or Cmd + Return for Mac). You'll see your line in the console window, followed by the output [1] "Hello world!" (the [1] simply labels the first element of the result). Now, press Ctrl + S (or Cmd + S for Mac) to save your new file under a name and folder of your choice.
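A script can hold more than one line, and each line can be run the same way. The sketch below extends the example slightly; the file name hello.R and the variable name greeting are hypothetical choices, not requirements.

```r
# Contents of a hypothetical script saved as hello.R
greeting <- "Hello world!"   # store the text in a variable
print(greeting)              # [1] "Hello world!"
print(nchar(greeting))       # [1] 12  (number of characters in the greeting)
```

To run the whole file at once instead of line by line, one option is to type source("hello.R", echo = TRUE) in the console, assuming hello.R is saved in the current working directory.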
Return to the previous chapter and try out all 14 keyboard shortcuts introduced there. Pick two or three to practice until you have them memorized.