I am happy that you are interested in learning how to code in R! Although this tutorial is focused on the R language, I hope to convey important coding concepts that can help you understand how coding works more generally. This will prepare you to adapt to the inevitable changes that occur in every coding language over time. The idea that you will never "finish" learning any coding language might scare you, but in reality, a firm grasp of coding does not require proficiency in every obscure topic. Much like spoken languages, coding languages have a base set of rules and concepts that give you a working understanding of how to achieve your goals. That base set is what I hope to give to you in this tutorial.
In this chapter, I will explain how I hope you will use this tutorial and introduce some basic concepts about writing code. These concepts are rather unrelated, so this chapter will feel like it's jumping around a lot, but future chapters will flow more smoothly.
There is a lot of reading and learning to do in this tutorial. Everyone wants the easiest way to learn a topic, but generally, learning just takes some amount of unavoidable work. I believe that hesitancy to learn new things is largely based on the fear of failure. However, I want to assure you that with some hard work, you are completely capable of learning coding.
There are a lot of code snippets in this tutorial. They will look something like this:
message <- "Hello, world!"
cat(message)
Every time you see a code snippet like this as you are reading, you should type it into the console or an R file (how to do this is explained later). I want to emphasize that you should type it in yourself, instead of copying and pasting. In fact, I have disabled copying for code snippets unless otherwise stated (please don't hate me). Typing out code is different from reading it: you catch little nuances in code when you type that you completely miss if your eyes just skim the structure. In real-world coding, those little nuances are pivotal to your success.
Similarly, when you are curious about how something works, try it out! Coding is fun to learn in part because you can test out new concepts and get immediate feedback. Only a few things in this world offer opportunities for learning in this way.
When you run across a piece of code that seems really complicated, one helpful approach is to read the code inside-out, just as a computer does. I will give you a made-up example:
C(B(A(1), param=TRUE))
To be clear, code like this is generally bad for readability. However, if you happen across something like it, start with the innermost piece of code, which in this case is A(1). Once you figure out what it means and what data it gives, you will know what the function B() is receiving, along with param=TRUE, so you can determine what value B() gives. That, in turn, tells you what C() is receiving.
Although I do not expect you to know all the terms I used in discussing this example (all of which are explained in later chapters), the main concept is to break down complex code into pieces that you can understand, which often requires this "inside-out" approach.
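To make the inside-out idea concrete, here is a small sketch that uses real base R functions in the same C(B(A(...), param=TRUE)) shape. The choice of functions is mine, for illustration only: abs() plays the role of A(), sum() plays B() with its na.rm parameter, and print() plays C(). The second half unpacks the same computation one layer at a time, which is exactly how you would read the nested version.

```r
# Nested version, shaped like C(B(A(...), param=TRUE))
print(sum(abs(c(-2, 3, NA)), na.rm = TRUE))

# The same computation, unpacked from the inside out
absolute.values <- abs(c(-2, 3, NA))          # innermost: gives 2, 3, NA
total <- sum(absolute.values, na.rm = TRUE)   # middle: adds 2 + 3, skipping the NA
print(total)                                  # outermost: displays the result
```

Both versions print [1] 5. When a nested line confuses you, rewriting it into steps like this is a quick way to check what each piece gives.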
Generally, when people start coding, they approach coding projects like I approached French class in high school: with blind repetition and without a drive to understand underlying concepts. However, I hope to introduce coding topics in a way that allows you to understand what you are doing every step of the way, so that you can build on what you already know.
When starting a coding project, it is important to begin with a plan. Such a plan should break the overall goal down into detailed, manageable steps. A plan written out this way is called pseudocode. For example, if your end goal is to print all the even numbers in a long list, the pseudocode for the project might look like this:
import the data, and save it to a variable called "long.data.vector"
for each number in long.data.vector:
    check if the number is divisible by 2. If it is:
        print the number
Although you might not immediately know how to convert every line in your pseudocode into real code, you have now broken the more complex problem into bite-sized pieces. Then, you can research the lines you do not know how to code. Typing or writing out pseudocode is a terrific start to any coding project.
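For comparison, here is one way the pseudocode above might translate into R. It is a sketch, not the only correct version: instead of importing real data, it assumes a short made-up vector stands in for long.data.vector, and it checks divisibility with R's remainder operator, %%. Each line of pseudocode is kept as a comment directly above the R code that carries it out.

```r
# import the data, and save it to a variable called "long.data.vector"
# (assumption: a short made-up vector stands in for the imported data)
long.data.vector <- c(7, 12, 3, 18, 25, 40)

# for each number in long.data.vector:
for (number in long.data.vector) {
  # check if the number is divisible by 2 (remainder of 0). If it is:
  if (number %% 2 == 0) {
    # print the number
    print(number)
  }
}
```

Running this prints 12, 18, and 40, each on its own line.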
When it comes time to write code, always remember to write it neatly and readably. The same working line of code can be written in many ways, as seen here:
a=sort(c(4,2,3,6),decreasing=T)
SV<- sort (c(4, 2, 3,6), decreasing=TRUE ) #SV = Sorted Vector
sorted.vector <- sort(c(4, 2, 3, 6), decreasing=TRUE)
All the above lines successfully sort the vector c(4, 2, 3, 6) in descending order and save it to a variable. However, some are more readable than others. Here are some notes about the different coding conventions, or the lack thereof, shown here.
- What does "a" mean? No one, including you, will understand what you mean. Alternatively, you might remember initially that SV stands for "sorted vector", but if you have fifty variables, all of them acronyms, you will be confused the next time you look at your code. Someone else reading your code will pull out their hair trying to understand what your acronym variables mean. Just use non-abbreviated, clear variable names!
- In the second line, the space after sort makes it almost look like a variable name instead of a function name, and in both of the first two lines, it is very unclear where things start and end. The last line is best: it has consistent spacing, and it is clear where things start and end.
When a function only has two arguments in it like the code above, a single line of code might be fine for readability. However, if it has many more than that, you really should break up the line into a few lines, and use tabs to clearly show that the arguments are part of the function. Although it is not necessary for the code above, here is how you could apply that style to this same line of code:
sorted.vector <- sort(
  c(4, 2, 3, 6),
  decreasing = TRUE
)
You might hate the solo right parenthesis that is hanging out on the fourth line above. Honestly, I felt that way too when I first started coding. However, it is a terrific way to clearly visually indicate the end of a function. From the name sorted.vector, you can just look straight down and see exactly where the function ends.
Similarly, if you have long equations to type in, you might want to break up the line by making a new line after the operators (e.g., +, *, etc.), and adding a tab. That way, you can see the whole line of code on your screen at once. Here is how that might look (note that there is consistent spacing around operators):
long.calculation <- variable1 * variable2 /
  (variable3 + variable4 - variable5) +
  variable6 * variable7 * variable8
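Putting the operator at the end of the line, rather than at the start of the next one, matters in R specifically. R stops reading an expression at the end of a line if what it has read so far is already complete, so a trailing operator is what tells R to keep reading. Here is a small sketch with made-up numbers and hypothetical variable names showing both placements:

```r
# Operator at the END of the first line: "1 + 2 +" is incomplete,
# so R keeps reading onto the next line. total.after becomes 6.
total.after <- 1 + 2 +
  3

# Operator at the START of the next line: "total.before <- 1 + 2" is
# already complete, so R assigns 3 and then evaluates "+ 3" as a
# separate expression that is not part of the assignment.
total.before <- 1 + 2
  + 3
```

No error appears in the second case; total.before is simply 3 instead of 6, which is why the trailing-operator habit is worth building.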
In this tutorial, I will introduce you to good conventions for R coding as their relevant topics are discussed. For now, here is a list of some conventions I recommend. If you disagree with some of them, I will not lose sleep. Whatever rules you choose to follow, please, be consistent. I beg you.
- Assign values to variables with <-, as opposed to =. Note that = is required to set the value of a parameter in a function.
- Name variables with lowercase words separated by periods, like some.variable.name. Alternatively, you can use this pattern: some_variable_name, but pick one and stick to it! The latter style is called snake case, and it is much more common in other coding languages, many of which do not allow periods in variable names.
- Name your own functions like someFunctionName. This is called lower camel case, because the first letter of every word is capitalized except for the first word.
- Put spaces around operators (e.g., +, -, etc.).
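Here is a short sketch that puts these conventions together in one place. The function countEvenNumbers() and all the variable names are made up for illustration; none of them are built into R.

```r
# Function name in lower camel case, assigned with <-
countEvenNumbers <- function(number.vector) {
  # Variable names use periods; spaces go around operators
  even.flags <- number.vector %% 2 == 0
  # = is used only to set a parameter's value inside a function call
  sum(even.flags, na.rm = TRUE)
}

sample.numbers <- c(4, 9, 10, 15, 22)
even.count <- countEvenNumbers(number.vector = sample.numbers)
print(even.count)
```

This prints [1] 3, since 4, 10, and 22 are the even numbers in sample.numbers.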
Unless you have made a concerted effort to learn keyboard shortcuts (i.e., pressing a few keys to perform certain actions), you likely do not use them frequently. However, you may be surprised to learn that using the mouse can be many times slower than using keyboard shortcuts. Although a single instance of moving the mouse to a button and clicking is negligible, that time adds up fast when you need to do it hundreds of times to edit code. The downside to keyboard shortcuts is that you have to learn and remember them. That might not seem like an attractive trade-off, but trust me when I say that you will be glad to learn even a few keyboard shortcuts. I want to introduce you to a few keyboard shortcuts that are particularly time-saving. I hope you pick two or three to learn and try to use them while coding.
| Action | Windows Shortcut | MacOS Shortcut |
|---|---|---|
| Save File | Ctrl + S | Cmd + S |
| Select All Text | Ctrl + A | Cmd + A |
| Undo Last Action | Ctrl + Z | Cmd + Z |
| Cut Selected Text | Ctrl + X | Cmd + X |
| Copy Selected Text | Ctrl + C | Cmd + C |
| Paste Text from Clipboard | Ctrl + V | Cmd + V |
| Action | Windows Shortcut | MacOS Shortcut |
|---|---|---|
| Move Cursor to Beginning of Line | Fn + LeftArrowKey or Home | Cmd + LeftArrowKey |
| Move Cursor to End of Line | Fn + RightArrowKey or End | Cmd + RightArrowKey |
| *Select Character to the Left | Shift + LeftArrowKey | Shift + LeftArrowKey |
| *Select Character to the Right | Shift + RightArrowKey | Shift + RightArrowKey |
| Select Text from Cursor to Beginning of Line | Fn + Shift + LeftArrowKey or Shift + Home | Cmd + Shift + LeftArrowKey |
| Select Text from Cursor to End of Line | Fn + Shift + RightArrowKey or Shift + End | Cmd + Shift + RightArrowKey |
* For these shortcuts, you can hold shift and keep pressing the arrow key to select several characters.
These two shortcuts work with several text editors, including RStudio.
| Action | Windows Shortcut | MacOS Shortcut |
|---|---|---|
| Move Current Line Up One Line | Alt + UpArrowKey | Option + UpArrowKey |
| Move Current Line Down One Line | Alt + DownArrowKey | Option + DownArrowKey |
| Surf Through Options in Suggestion List | Tab | Tab |
There are many online resources that can help you learn coding. Remember that learning to code is difficult, so you will often need to put in some work to understand what you find online. However, these resources can be very helpful. No matter which you use, try to understand what you find on the internet instead of blindly copying it. Here are some of my favorite resources:
I want to make you aware that R has several oddities that other languages simply do not have. I will talk about many of them throughout this tutorial, but a few make learning coding with R particularly difficult. First, R is too lenient and does not give errors often enough. Very small changes in code can leave the program still running but no longer doing what you want it to do, which makes it difficult to find where the initial error is. Second, when R does give errors, the messages are nearly always unhelpful for identifying the actual problem. Finally, R is very inconsistent and unhelpful in how it names things. It does not follow any standard conventions for naming functions whatsoever. In addition, some functions have unintuitive names, and totally different functions can differ by only one character (e.g., line() and lines(), where the first is a statistical analysis function and the second is for graphing).
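Here are two small, well-known examples of that leniency, chosen by me for illustration; both run without any error. The first relies on vector recycling, where R silently repeats the shorter vector to match the longer one. The second passes numbers to mean() in a way that looks reasonable, but mean() treats only its first argument as the data and matches the extra values to its other parameters (trim and na.rm).

```r
# Recycling: the shorter vector c(10, 20) is silently repeated as
# c(10, 20, 10, 20); no warning, because 4 is a multiple of 2
recycled.sum <- c(1, 2, 3, 4) + c(10, 20)
print(recycled.sum)

# Only 1 is treated as the data here; 2 and 3 are matched to the
# trim and na.rm parameters, so the result is 1, not the expected 2
wrong.mean <- mean(1, 2, 3)
print(wrong.mean)

# The intended version wraps the numbers in c() to make one vector
right.mean <- mean(c(1, 2, 3))
print(right.mean)
```

In both cases R quietly gives an answer, so the only way to catch the mistake is to check whether the result makes sense.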
In reality, many data scientists are moving on from R in favor of languages like Julia and especially Python. However, some scientific circles have held on to R, continuing to build libraries of functions that have enabled, and will continue to enable, significant scientific advancements. Thus, learning R is still a valuable use of your time, even if it is slow and has notable peculiarities.
You should feel comforted by the fact that once you learn one coding language well, you are significantly more capable of learning other coding languages. Even if your field starts to migrate to a different language part-way through your career, you will not need to put nearly as much effort into learning the new language.
Remember that coding is an incredible tool and is worth your attention. It is empowering and will help you in scientific investigation. I have made this tutorial in hopes of providing you with a resource to help you learn coding with R from scratch, which is not easy. However, with diligent effort, I believe everyone can get a grasp on these topics. Have fun and explore your curiosities!