Exercise 1: Introduction to R and RStudio

This exercise is designed to familiarize you with the RStudio environment and basic operations in R. For a more comprehensive introduction to R, you can refer to the official R documentation at CRAN R Manuals or the tutorials available on the RStudio Education website.

The RStudio Interface

Go ahead and open RStudio. It should look something like this:

[Figure: screenshot of the RStudio interface. Image source: RStudio User Guide Website]

The R console (bottom left) is where the R code is actually executed. You can interact with R directly by typing commands and seeing their output. However, the code you write will not be saved, so it is not recommended to write all your code here.
The source editor (top left) is where you should write and edit your R scripts and markdown documents. Files written here can be saved and returned to later. To run a line of code from the R script, place the cursor on it (or select it) and press Ctrl + Enter (Windows) or Cmd + Enter (macOS). This sends the code to the R console to be executed.
The environment (top right) shows objects that are currently in your workspace (e.g., data frames, variables, functions). These can be objects that you created from scratch or imported.
The output pane (bottom right) contains several tabs. The Plots tab displays plots generated by your R code. The Files tab allows you to browse the files in your project directory. The Packages tab allows you to manage R packages. The Help tab provides access to R documentation and help files. If you are unsure how to use a function or package, simply type its name into the search box of the Help tab to receive detailed usage information.
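The Help tab can also be reached from code. As a minimal sketch (sqrt is used here only as an example topic), either line below opens the documentation for sqrt():

```r
# Open the help page for the sqrt() function
help("sqrt")

# Shorthand for the same thing
?sqrt
```

In RStudio, the page appears in the Help tab of the bottom right pane.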

Getting started

It is important to keep all files organised for efficient data management and project workflow. To get started, create a new folder named “data-handling-in-r”. This will be our working directory for this tutorial.
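If you want to check from code which folder R is currently working in, base R provides getwd(). The path in the commented setwd() line is a hypothetical placeholder, not a real location:

```r
# Show the current working directory
getwd()

# Change it if needed (replace the placeholder with the path to your own folder)
# setwd("path/to/data-handling-in-r")
```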

As mentioned above, although you can use the R console for writing code, it is better to write code in R Scripts in the source panel. To create a new R Script, go to the first icon in the top left corner of RStudio, then select “R Script”.

Name your R Script “exercise-1.R” and save it to your working directory.

An alternative way of documenting code is by using R Markdown. R Markdown is useful for integrating code, text, and figures within a single document, making it easier to share and reproduce work.

To create a new R Markdown file, go to the same icon in the top left corner, this time selecting “R Markdown”.

A dialog will appear, allowing you to set the title, author, date, and default output format of your R Markdown document. Enter a descriptive title for your project (note that this is different from the file name). Then, click OK. Save the file to your working directory as “exercise-1.Rmd”.

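Within your R Markdown document, you can use Markdown syntax for formatting text and LaTeX for mathematical expressions. R code is embedded within code chunks: each chunk starts with a line of three backticks followed by {r} and ends with a line of three backticks. For example, a chunk containing the following line produces the output shown below it: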

print("Hello world!")

[1] "Hello world!"

You can use the keyboard shortcut Ctrl + Alt + I (Windows) or Cmd + Option + I (macOS) to quickly insert an R code chunk.

Click the play button at the top right of the code chunk to run the code, or select the line of code and press Ctrl + Enter (Windows) or Cmd + Enter (macOS).

Arithmetic Operators in R

You can use R to do basic operations that you would do on a calculator. For example, open up “exercise-1.R” and type the following lines of code:

5 + 5 # addition

[1] 10

4 * 3 # multiplication

[1] 12

10 / 2 # division

[1] 5

2^5 # exponent

[1] 32

13 %% 4 # modulus

[1] 1

10 %/% 3 # integer division

[1] 3

When you click run, you’ll see the outputs of these calculations in the console.

Note: Using # allows you to write comments which are not interpreted by the R console.
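The integer division and modulus operators are two halves of the same calculation: the quotient times the divisor, plus the remainder, gives back the original number. A small sketch reusing the numbers 13 and 4 from above:

```r
13 %/% 4                      # integer division: 4 goes into 13 three times
13 %% 4                       # modulus: with 1 left over
(13 %/% 4) * 4 + (13 %% 4)    # 3 * 4 + 1 gives back 13

10 %% 2                       # a remainder of 0 means 10 is divisible by 2
```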

Try the following examples:

7 * (4 - 2)

[1] 14

sqrt(100)

[1] 10

abs(-6)

[1] 6

abs(7)

[1] 7

Assignment Operators in R

In R, you can use objects to store information. We use the assignment operator <- to assign values to objects. Try out the following:

x <- log(2^3)

Here, we are assigning to the object x the result of the operation \(\log(2^3)\).

We can see the actual value by calling the object x:

x

[1] 2.079442

You can then use the object to do subsequent computations, e.g.,

x*5

[1] 10.39721
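Note that log() computes the natural logarithm unless told otherwise, which is why log(2^3) is about 2.08 rather than 3. A short sketch of the alternatives available in base R:

```r
log(2^3)            # natural logarithm (base e): about 2.079442
log2(2^3)           # base-2 logarithm: 3
log10(1000)         # base-10 logarithm: 3
log(2^3, base = 2)  # any base can be given through the base argument: 3
```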

If you assign a different value to the same object name, you will replace the original object and its value will be lost. So, be careful in naming your objects!
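To see this overwriting behaviour without touching x, here is a sketch using a hypothetical object called total:

```r
total <- 10           # total holds 10
total <- total + 5    # the old value is used once, then replaced
total                 # 15; the original 10 is gone

total <- "fifteen"    # even the type can change on reassignment
class(total)          # "character"
```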

Note: R is case sensitive, so object x is not the same as object X. You will get an error if you use the wrong case:

X

## Error in eval(expr, envir, enclos): object 'X' not found

You can assign objects a value of any type, not just numbers. For example, you can store a string of characters by enclosing it in quotation marks:

course <- "Data Handling in R"
course

[1] "Data Handling in R"

However, you can’t mix types when performing arithmetic operations:

x + course

## Error in x + course: non-numeric argument to binary operator

You can ask R what type a certain object’s value is by using the class() function:

class(x)

[1] "numeric"

class(course)

[1] "character"

class(sqrt)

[1] "function"

Comparison Operators in R

Comparison operators are used to compare values. They are also called conditions.

Try the following examples:

5 > 2 # greater than

[1] TRUE

6 < 4 # less than

[1] FALSE

11 >= 15 # greater than or equal to

[1] FALSE

10 <= 10 # less than or equal to

[1] TRUE

2^3 == 8 # equal to

[1] TRUE

6/2 != 4 # not equal to

[1] TRUE
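One caveat worth knowing when using == on decimal numbers: most decimals cannot be stored exactly, so two values that look equal may differ in their last digits. A minimal sketch using base R's all.equal(), which compares with a small tolerance:

```r
0.1 + 0.2 == 0.3                    # FALSE, because of rounding in the stored values
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE, equal within a small tolerance
```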

You can assign conditions to an object:

op <- 2^3 == 8
class(op)

[1] "logical"

As you can see, the outputs of these operations are all TRUE/FALSE (boolean) values. In R, these objects are of class logical.

If you try to perform arithmetic operations on logicals, TRUE becomes 1 and FALSE becomes 0.

TRUE + 10

[1] 11

FALSE - 10

[1] -10
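Because TRUE counts as 1 and FALSE as 0, adding up conditions tells you how many of them hold. A small sketch reusing three comparisons from above (n_true is a hypothetical name):

```r
# TRUE + FALSE + TRUE: two of the three conditions hold
n_true <- (5 > 2) + (6 < 4) + (10 <= 10)
n_true   # 2
```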

Logical Operators in R

You can use logical operators to combine conditional statements.

For example, x & y returns TRUE if both x is TRUE and y is TRUE. If either x or y is FALSE, the operation will return FALSE. This operator & is called the element-wise logical AND operator.

In contrast, x | y returns TRUE if either x is TRUE or y is TRUE. Therefore, the operation will only return FALSE if both x and y are FALSE. This operator | is called the element-wise logical OR operator.

x <- TRUE
y <- FALSE

x & y # logical AND operator

[1] FALSE

x | y # logical OR operator

[1] TRUE
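Logical operators are most often used to combine comparisons. The sketch below uses a hypothetical object age, and also shows the NOT operator !, which flips TRUE and FALSE. The last two lines illustrate what element-wise means, using vectors built with c() (introduced later in this exercise):

```r
age <- 25                  # hypothetical value for illustration

age >= 18 & age < 65       # TRUE: both conditions hold
age < 18 | age >= 65       # FALSE: neither condition holds
!(age >= 18)               # FALSE: ! negates a logical value

# Element-wise: each position is combined with the same position in the other vector
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE
c(TRUE, TRUE, FALSE) | c(TRUE, FALSE, FALSE)   # TRUE TRUE FALSE
```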

Libraries and Packages

Base R contains many useful tools for data analysis (such as those seen so far in this tutorial), but there are many additional functionalities that users might need, such as advanced data visualization, specialized statistical methods, or handling specific types of data. There are packages available in R that contain collections of functions, data, and compiled code that enhance the functionality of base R, making our life a bit easier.

Thousands of packages are available on CRAN (Comprehensive R Archive Network) and other repositories, each designed for a specific task.

In this tutorial, we will be using tidyverse, which is a collection of packages designed for data science. These include:

readr: used for reading rectangular data into R (e.g. csv, tsv and fwf)
tibble: a user-friendly way to use data frames
dplyr: provides functions for data manipulation
ggplot2: used for creating data visualizations

We will be coming back to these later on. For now, we need to first install and load the tidyverse package. Write the following code in your R script:

# Install tidyverse
install.packages("tidyverse")

# Load tidyverse
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

When you load tidyverse, it automatically loads the core packages listed in the output above, including readr, tibble, dplyr, and ggplot2.

Alternatively, you can install a package through the RStudio interface: in the output pane (bottom right), go to the “Packages” tab, then click “Install” in its top left corner. In the pop-up, type “tidyverse” under “Packages” (it should come up as you are typing), then click “Install”. You still need to load the package using library(tidyverse) in the R script as above.

You can follow these steps to install other packages in the future, but we’ll just stick with tidyverse for now.

Note: you only need to install a package onto your local computer once, but you need to load the package every time you want to use it.

By convention, you should load all required packages at the top of the R script, before any other code.
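Since installation is only needed once, it can be handy to check whether a package is already installed before calling install.packages(). A minimal sketch using base R's requireNamespace(), which returns TRUE or FALSE without attaching the package (tidyverse_installed is a hypothetical name):

```r
# TRUE if tidyverse is installed on this computer, FALSE otherwise
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("tidyverse is already installed; load it with library(tidyverse)")
} else {
  message("tidyverse is missing; run install.packages(\"tidyverse\") once")
}
```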

Data Frames

A data frame is a list of vectors, all of the same length. Data frames in R are similar to spreadsheets, where each column can contain different types of data (numeric, character, factor, etc.), and each row represents an instance or observation.

We can create a data frame by first creating vectors then combining them.

In R, vectors are basic data structures that hold elements of the same type. We use the function c() to create vectors, where the “c” stands for combine.

Go ahead and create two vectors called “year” and “hours_sleep_per_night”, each containing a series of ten values:

year <- c(2021, 2012, 2020, 2009, 2010, 2022, 2014, 2023, 2016, 2008)
hours_sleep_per_night <- c(6.5, 8.1, 7.7, 7.9, 7.5, 6.9, 7.8, 7.4, 5.6, 7.1)
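Once created, a vector can be inspected as a whole or element by element. The sketch below repeats the two vectors so that it runs on its own; length(), mean(), and square-bracket indexing are base R features that go slightly beyond the text above:

```r
year <- c(2021, 2012, 2020, 2009, 2010, 2022, 2014, 2023, 2016, 2008)
hours_sleep_per_night <- c(6.5, 8.1, 7.7, 7.9, 7.5, 6.9, 7.8, 7.4, 5.6, 7.1)

length(year)                   # number of elements: 10
hours_sleep_per_night[1]       # first element: 6.5
mean(hours_sleep_per_night)    # average of all ten values: 7.25
class(year)                    # "numeric"
```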

Next, we can combine these vectors into a data frame using the function data.frame():

sleep_info <- data.frame(year, hours_sleep_per_night)

When the data frame is displayed, the description df [10 x 2] means that it has 10 rows and 2 columns.
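The size and structure of a data frame can also be checked from code. A self-contained sketch using base R functions (dim(), names(), str(), and the $ operator are additions not mentioned in the text above):

```r
year <- c(2021, 2012, 2020, 2009, 2010, 2022, 2014, 2023, 2016, 2008)
hours_sleep_per_night <- c(6.5, 8.1, 7.7, 7.9, 7.5, 6.9, 7.8, 7.4, 5.6, 7.1)
sleep_info <- data.frame(year, hours_sleep_per_night)

dim(sleep_info)      # 10 2 (rows, then columns)
names(sleep_info)    # "year" "hours_sleep_per_night"
str(sleep_info)      # compact summary of each column's type and first values
sleep_info$year      # extract a single column as a vector
```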

An alternative way of creating a data frame is to use the tibble package, which is part of tidyverse. Tibbles are a modern re-imagining of the data frame. They offer more user-friendly printing methods, which make them easier to use with large datasets containing complex objects.

Let’s go ahead and convert the sleep_info data frame into a tibble using as_tibble():

sleep_info <- as_tibble(sleep_info)

Print the newly generated tibble to see it displayed:

year	hours_sleep_per_night
2021	6.5
2012	8.1
2020	7.7
2009	7.9
2010	7.5
2022	6.9
2014	7.8
2023	7.4
2016	5.6
2008	7.1

We will be using sleep_info again in exercise 3. You can also create a new tibble from column vectors with tibble():

eg_tibble <- tibble(x = 1:5, y = 1)

Print eg_tibble to see it displayed:

x	y
1	1
2	1
3	1
4	1
5	1

Here 1:5 means a sequence of numbers from 1 to 5.

Once a data frame has been created, you can add or transform its columns. This is performed using the mutate() function from the dplyr package (also part of tidyverse).

Try adding a column \(z = x^2 + y\) to eg_tibble:

eg_tibble <- eg_tibble %>%
  mutate(z = x^2 + y)

x	y	z
1	1	2
2	1	5
3	1	10
4	1	17
5	1	26

The pipe operator (%>%) takes the value on its left and passes it as the first argument to the function on its right. In this case, our data frame eg_tibble is passed as the first argument to mutate(). We will use the pipe operator more in the following exercises, as it helps make operations more readable and concise.
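To make the pipe's behaviour concrete, the sketch below builds the same column twice, once with the pipe and once with an ordinary function call, and checks that the results match. It assumes the dplyr package is installed (it comes with tidyverse); piped and nested are hypothetical names:

```r
library(dplyr)  # provides mutate(), tibble(), and the %>% pipe

eg_tibble <- tibble(x = 1:5, y = 1)

piped  <- eg_tibble %>% mutate(z = x^2 + y)   # eg_tibble becomes the first argument
nested <- mutate(eg_tibble, z = x^2 + y)      # the same call written without the pipe

identical(piped, nested)   # TRUE
```

The benefit of the piped form shows up once several steps are chained, since each step then reads from left to right instead of being nested inside the previous one.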

Exporting Data Frames to CSV

We will often need to export data frames to CSV files after working with them. This is easily done in R using write.csv(). Let’s try an example and export sleep_info to a CSV file.

First, create a subdirectory in your working directory called “data”:

dir.create("data")

If the “data” folder already exists, dir.create() leaves it untouched and issues a warning:

## Warning in dir.create("data"): 'data' already exists

Then, export the sleep_info data frame to a CSV file, saving it to the “data” subdirectory:

write.csv(sleep_info, file = "data/sleep_info.csv")

You should now be able to see “sleep_info.csv” in the “data-handling-in-r/data” subdirectory.
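A quick way to confirm that the export worked is to read the file back in. The sketch below is self-contained and makes two choices that are assumptions rather than part of the steps above: it passes row.names = FALSE so that write.csv() does not add a column of row numbers, and it uses showWarnings = FALSE so that dir.create() stays quiet if the folder already exists. sleep_check is a hypothetical name:

```r
year <- c(2021, 2012, 2020, 2009, 2010, 2022, 2014, 2023, 2016, 2008)
hours_sleep_per_night <- c(6.5, 8.1, 7.7, 7.9, 7.5, 6.9, 7.8, 7.4, 5.6, 7.1)
sleep_info <- data.frame(year, hours_sleep_per_night)

# Create the subdirectory (no warning if it is already there) and export
dir.create("data", showWarnings = FALSE)
write.csv(sleep_info, file = "data/sleep_info.csv", row.names = FALSE)

# Read the file back with base R and check its shape
sleep_check <- read.csv("data/sleep_info.csv")
names(sleep_check)   # "year" "hours_sleep_per_night"
nrow(sleep_check)    # 10
```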