00| Introduction to R Tutorial

Miles Robertson, 12.27.23 (edited 01.10.24)

Introduction

I am glad that you are interested in learning how to code in R! Although this tutorial is focused on the R language, I hope to convey important coding concepts that can help you learn how coding works more generally. This will prepare you to adapt to the inevitable changes that occur in every coding language over time. It might scare you to think that you will never "finish" learning all of any coding language, but in reality, a firm grasp on coding does not require proficiency in every obscure topic. Much like spoken languages, coding languages have a base set of rules and concepts that give you a working understanding of how to achieve your goals. That base set is what I hope to give to you in this tutorial.

In this chapter, I will tell you how I hope you use this tutorial and introduce some basic concepts about using code. These concepts are rather unrelated, so this chapter will feel like it's jumping around a lot, but future chapters will be more fluid.

How to Use This Tutorial

There is a lot of reading and learning to do in this tutorial. Everyone wants the easiest way to learn a topic, but generally, learning just takes some amount of unavoidable work. I believe that hesitancy to learn new things is largely based on the fear of failure. However, I want to assure you that with some hard work, you are completely capable of learning coding.

There are a lot of code snippets in this tutorial. They will look something like this:

message <- "Hello, world!"
cat(message)

Every time you see a code snippet like this as you are reading, you should type it into the console or an R file (how to do this is explained later). I will emphasize that you should type it in by yourself, instead of copying and pasting. In fact, I have disabled copying for code snippets unless otherwise stated (please don't hate me). Typing out code is different from reading it: you catch little nuances in code when you type that you completely miss if your eyes just skim the structure. In real life coding experiences, those little nuances are pivotal for your success.

Similarly, when you get curious about how something works, try it out! Coding is fun to learn in part because you can test out new concepts and get immediate feedback. Only a few things in this world give opportunities for learning in this way.

When you run across a piece of code that seems really complicated, one helpful approach is to read the code inside-out, just as a computer does. I will give you a made-up example:

C(B(A(1), param=TRUE))

To be clear, code like this is generally bad for readability. However, if you happen across something like it, start with the innermost piece of code, which in this case is A(1). Once you figure out what it means and what data it gives, you can know what the function B() is receiving, along with param=TRUE, in order to determine what B()'s value is. That helps you know what C() is receiving. Although I do not expect you to know all the terms I used in discussing this example (all of which are explained in later chapters), the main concept is to break down complex code into pieces that you can understand, which often requires this "inside-out" approach.
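To make the inside-out approach concrete, here is a small sketch that swaps the made-up A(), B(), and C() for real base R functions. The functions abs(), round(), and sqrt() and the number -16.4 are my own choices for illustration; any nested call can be unpacked the same way.

```r
# A nested call with the same shape as C(B(A(1), param=TRUE)).
nested.result <- sqrt(round(abs(-16.4), digits = 0))

# The same computation, unpacked from the innermost call outward.
innermost.value <- abs(-16.4)                        # 16.4
middle.value <- round(innermost.value, digits = 0)   # 16
outermost.value <- sqrt(middle.value)                # 4

# Both approaches give the same answer.
identical(nested.result, outermost.value)            # TRUE
```

Saving each intermediate result to its own clearly named variable is also a handy way to rewrite nested code so that it is easier to read.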

Workflow and Conventions

Generally, when people start coding, they approach coding projects like I approached French class in high school: with blind repetition and without a drive to understand underlying concepts. However, I hope to introduce coding topics in a way that allows you to understand what you are doing every step of the way, so that you can build on what you already know.

When starting a coding project, it is important to begin with a plan. Such a plan should break down the overall goal into small, manageable steps. A plan written out this way, in plain language rather than real code, is called pseudocode. For example, if your end goal is to print all the even numbers out of a long list, the pseudocode for the project might look like this:

import the data, and save it to a variable called "long.data.vector"

for each number in long.data.vector:
    check if the number is divisible by 2. If it is:
        print the number

Although you might not immediately know how to convert every line in your pseudocode into real code, you have now digested the more complex problem into bite-sized pieces. Then, you can do research for the lines you do not know how to code. Typing or writing out pseudocode is a terrific start to any coding project.
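As a rough sketch, the pseudocode above could be turned into R along these lines. No data file is named in this example, so the import step is replaced by a small made-up vector. Do not worry if the syntax is unfamiliar for now; the point is that each line of pseudocode maps onto a line or two of real code.

```r
# Stand-in for the imported data (made-up values for illustration).
long.data.vector <- c(7, 12, 3, 8, 10, 5)

# For each number in long.data.vector:
for (number in long.data.vector) {
    # Check if the number is divisible by 2, meaning that the remainder
    # after dividing by 2 (given by the %% operator) is zero. If it is:
    if (number %% 2 == 0) {
        # Print the number.
        print(number)
    }
}
```

With these made-up values, the loop prints 12, 8, and 10.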

When it comes time to write code, always remember to do so neatly and readably. A working line of code can be done many ways, as seen here:

a=sort(c(4,2,3,6),decreasing=T)
SV<- sort (c(4, 2,  3,6), decreasing=TRUE ) #SV = Sorted Vector
sorted.vector <- sort(c(4, 2, 3, 6), decreasing=TRUE)

All the above lines are successful in sorting the vector c(4, 2, 3, 6) in descending order, and saving it to a variable. However, some are more readable than others. Here are some notes about the different coding conventions, or the lack thereof, shown here.

Start with variable names: what the heck does a mean? No one, including you, will understand what you mean. Alternatively, you might remember initially that SV stands for "sorted vector", but if you have fifty variables, all of them acronyms, you will be confused the next time you look at your code. Someone else reading your code will pull out their hair trying to understand what your acronym variables mean. Just use non-abbreviated, clear variable names!
How about the use of spaces? The first line almost looks like gibberish. The second is so bad that sort almost looks like a variable name instead of a function name, and it is very unclear where things start and end. The last line is best: it has consistent spacing, and it is clear when things start and end.

When a function only has two arguments in it like the code above, a single line of code might be fine for readability. However, if it has many more than that, you really should break up the line into a few lines, and use tabs to clearly show that the arguments are part of the function. Although it is not necessary for the code above, here is how you could apply that style to this same line of code:

sorted.vector <- sort(
    c(4, 2, 3, 6), 
    decreasing = TRUE
)

You might hate the solo right parenthesis that is hanging out on the fourth line above. Honestly, I felt that way too when I first started coding. However, it is a terrific way to clearly visually indicate the end of a function. From the name sorted.vector, you can just look straight down and see exactly where the function ends.

Similarly, if you have long equations to type in, you might want to break up the line by making a new line after the operators (e.g., +, *, etc.), and adding a tab. That way, you can see the whole line of code on your screen at once. Here is how that might look (note that there is consistent spacing around operators):

long.calculation <- variable1 * variable2 /
    (variable3 + variable4 - variable5) +
    variable6 * variable7 * variable8
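Here is a small sketch of why the line break goes after the operator rather than before it. When a line already forms a complete expression, R treats it as finished and reads the next line as a separate statement. The variable names below are made up for illustration.

```r
# Operator at the end of the line: R knows that the expression continues.
total.with.trailing.operator <- 1 +
    2

# Operator at the start of the next line: the first line is already complete,
# so R assigns 1 and then evaluates "+ 2" as its own, separate statement.
total.with.leading.operator <- 1
    + 2

total.with.trailing.operator  # 3
total.with.leading.operator   # 1
```

Notice that the second version does not produce an error, so this mistake is easy to miss.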

Recommended Coding Style Guide

In this tutorial, I will introduce you to good conventions for R coding as their relevant topics are discussed. However, I will provide a list of some conventions I recommend. If you disagree with some of them, I will not lose sleep. Whatever rules you choose to follow, please, be consistent. I beg you.

To assign variables, use <-, as opposed to =. Note that = is required to set the value of a parameter in a function.
Variable and function names should not include abbreviated words, and must be descriptive of their contents, or of what they do in the case of functions. It is okay if the names are long, but you can usually figure out a good name that uses one to four words.
Name variables using this pattern: some.variable.name. Alternatively, you can use this pattern: some_variable_name, but pick one and stick to it! The latter style is called snake case, and is much more popular across other coding languages, as R is unusual in allowing periods in its variable names.
Name functions using this pattern: someFunctionName. This is called lower camel case, meaning that the first letter of every word is capitalized except for the first word.
Break up long lines into multiple lines, as described above. Use a tab to show one line is continued on the following one.
Place spaces after commas and on either side of operators (e.g., +, -, etc.).
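Here is a short sketch that applies the conventions above all at once. The variable names, the function name, and the plant height values are all made up for illustration.

```r
# Variable name: descriptive words separated by periods, assigned with <-.
plant.heights.centimeters <- c(12.5, 30.1, 22.8)

# Function name: lower camel case, describing what the function does.
convertCentimetersToInches <- function(lengths.centimeters) {
    lengths.centimeters / 2.54
}

# Inside a function call, = sets the value of a parameter.
# The long call is broken across lines, with a space after the comma.
plant.heights.inches <- round(
    convertCentimetersToInches(plant.heights.centimeters),
    digits = 1
)

plant.heights.inches  # 4.9 11.9 9.0
```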

Keyboard Shortcuts

Unless you have made a concerted effort to learn keyboard shortcuts (i.e., pressing a few keys to perform certain functions), you likely do not use them frequently. However, you may be surprised to learn that using the mouse can be many times slower than using keyboard shortcuts. Although the time spent on a single instance of moving a mouse to a button and clicking is negligible, that time adds up fast when you need to do it hundreds of times to edit code. The downside to keyboard shortcuts is that you have to learn and remember them. That might not seem like an attractive trade-off, but trust me when I say that you will be glad to learn even a few keyboard shortcuts. I want to introduce you to a few keyboard shortcuts that are particularly time-saving. I hope you pick two or three to learn and try to use them while coding.

The Basics

Action	Windows Shortcut	MacOS Shortcut
Save File	`Ctrl + S`	`Cmd + S`
Select All Text	`Ctrl + A`	`Cmd + A`
Undo Last Action	`Ctrl + Z`	`Cmd + Z`
Cut Selected Text	`Ctrl + X`	`Cmd + X`
Copy Selected Text	`Ctrl + C`	`Cmd + C`
Paste Text from Clipboard	`Ctrl + V`	`Cmd + V`

Move Cursor, Select Text

Action	Windows Shortcut	MacOS Shortcut
Move Cursor to Beginning of Line	`Fn + LeftArrowKey` or `Home`	`Cmd + LeftArrowKey`
Move Cursor to End of Line	`Fn + RightArrowKey` or `End`	`Cmd + RightArrowKey`
*Select Character to the Left	`Shift + LeftArrowKey`	`Shift + LeftArrowKey`
*Select Character to the Right	`Shift + RightArrowKey`	`Shift + RightArrowKey`
Select All Text to the Left of the Cursor	`Fn + Shift + LeftArrowKey` or `Shift + Home`	`Cmd + Shift + LeftArrowKey`
Select All Text to the Right of the Cursor	`Fn + Shift + RightArrowKey` or `Shift + End`	`Cmd + Shift + RightArrowKey`

* For these shortcuts, you can hold Shift and keep pressing the arrow key to select several characters.

Text Editing

These shortcuts work with several text editors, including RStudio.

Action	Windows Shortcut	MacOS Shortcut
Move Current Line Up One Line	`Alt + UpArrowKey`	`Option + UpArrowKey`
Move Current Line Down One Line	`Alt + DownArrowKey`	`Option + DownArrowKey`
Cycle Through Options in Suggestion List	`Tab`	`Tab`

Online Resources

There are many online resources that can help you learn coding. Remember that learning coding is difficult, so you will often need to put in some work to understand what you find online. However, these resources can be very helpful. No matter which you use, try to understand what you find on the internet instead of blindly copying it. Here are some of my favorite resources:

Search Engines: You can always just google your question to find what others have to say about it. This can be particularly useful when you get an error message that you do not understand. More broadly, you can search things like "how to add row to data frame R" and you will find many resources that can help you.
Stack Overflow: This is a forum where people ask questions about coding, and other people answer them. People have asked all sorts of questions about R on there. Remember to read several answers to a question, as some answers are better than others. You can ask your own questions as well on the site, but moderation is strict and somewhat arbitrary, so prepare for some degree of frustration.
ChatGPT, Gemini, etc.: Most modern AI chatbots are very good at answering questions about coding. You can ask them questions about R, and they will often give you a good answer, or at least push you in the right direction. However, the more complex the topic, the less likely they are to give you a good (or correct) answer.

Some Warnings about R

I want to make you aware that R has several oddities that other languages simply do not have. I will talk about many of them throughout this tutorial, but there are a few that make learning coding with R particularly difficult. First, R is too lenient, and does not give errors often enough. Very small changes in code can leave the program still running, but no longer doing what you want it to do. This makes it difficult to know where the initial error is. Second, when R does give errors about code, the message is nearly always unhelpful for identifying the problem. Finally, R is very inconsistent and unhelpful in how it names things. It does not follow any standard conventions for naming functions whatsoever. In addition, many functions have nonsensical names, and totally different functions can differ in name by only one character (e.g., line() and lines(), where the first is a statistical analysis function and the second is for graphing).
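To illustrate the first warning, here are a few base R behaviors where you might expect an error but R quietly returns a result instead. These particular examples are my own picks for illustration, and the variable names are made up.

```r
# Adding vectors of different lengths: R silently repeats ("recycles") the
# shorter vector. There is no warning here because 4 is a multiple of 2.
recycled.sum <- c(1, 2, 3, 4) + c(10, 20)
recycled.sum  # 11 22 13 24

# Asking for an element that does not exist returns NA instead of an error.
short.vector <- c(4, 2, 3, 6)
missing.element <- short.vector[10]
missing.element  # NA

# A misspelled variable name in an assignment creates a second variable
# instead of updating the first one.
sorted.vector <- sort(short.vector)
sorted.vectr <- rev(sorted.vector)  # typo: sorted.vector is left unchanged
sorted.vector  # 2 3 4 6
```

In each case the code runs without complaint, so the mistake only shows up later, often far from where it was made.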

In all reality, many data scientists are moving on from R in favor of languages like Julia and especially Python. However, some scientific circles have held on to R, continuing to make more libraries of functions that have made and will make significant scientific advancements. Thus, learning R is still a valuable use of your time, even if it is slow and has notable peculiarities.

You should feel comforted by the fact that once you learn one coding language well, you are significantly more capable of learning other coding languages. Even if your field starts to migrate to a different language part-way through your career, you will not need to put in nearly as much effort into learning the new language.

Conclusion

Remember that coding is an incredible tool, and is worth your attention. It is empowering and will help you in scientific investigation. I have made this tutorial in the hope of providing you with a resource to help you learn coding with R from scratch, which is not easy. However, with diligent effort, I believe everyone can get a grasp on these topics. Have fun and explore your curiosities!