Welcome to this tutorial on Data Handling in R! In this tutorial, you will learn the basics of managing data in R through a series of four exercises:

  1. RStudio Navigation and Project Setup
  2. Data Cleaning and Preprocessing
  3. Data Transformation
  4. Data Visualization and Plotting

Each exercise is provided as a separate document. By the end of this tutorial, you should be confident in using the RStudio interface, handling data, performing transformations, creating visualizations, and producing reproducible reports with R Markdown.

Before starting this tutorial, you should be familiar with:

  - Basic programming concepts: loops, conditionals and functions.
  - Data structures in R: vectors and data frames.
  - Data types: numeric, integer, logical/boolean, character/string and list.
  - Descriptive statistics: mean, median, standard deviation and variance.
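If you want to check yourself against the prerequisites above, the short R snippet below touches each of them: the basic data types, a vector and a data frame, the four descriptive statistics, and a function, a loop and a conditional. The values are made up for illustration (they are not taken from SHeS), and `describe_scores` is a hypothetical helper name, not part of any package. If every line makes sense to you, you are ready to start.

```r
# Made-up example values (NOT from the SHeS dataset)
scores <- c(2.5, 3.0, 4.5)          # numeric vector
n_people <- 3L                      # integer (the L suffix makes it an integer)
is_adult <- TRUE                    # logical/boolean
label <- "SHeS"                     # character/string
info <- list(year = 2022L, source = "statistics.gov.scot")  # list

# A data frame is a table whose columns are vectors of equal length
df <- data.frame(id = 1:3, score = scores)

class(scores)    # "numeric"
class(n_people)  # "integer"
class(df)        # "data.frame"

# Descriptive statistics on one column of the data frame
mean(df$score)    # 3.333333
median(df$score)  # 3
sd(df$score)      # standard deviation
var(df$score)     # variance

# A function, a loop and a conditional
describe_scores <- function(v) {
  c(mean = mean(v), median = median(v))
}
for (i in seq_len(nrow(df))) {
  if (df$score[i] > 3) {
    print(df$id[i])  # prints 3, the only id with a score above 3
  }
}
describe_scores(df$score)
```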

Installing R and RStudio

Before starting the exercises, make sure that both R and RStudio are installed.

To install R:

  1. Download the R installer:
     - For Windows here
     - For Mac here
  2. Open the installer and follow the steps.
  3. Voila! You have now installed R on your local computer.

To install RStudio:

  1. Verify that you have installed R and that you can launch the R application.
  2. Download the RStudio Desktop installer here.
  3. Open the installer and follow the steps.
  4. Voila! You have now installed RStudio on your local computer.

Update R and RStudio regularly so that you have the latest versions: newer releases fix bugs, and some R packages require a recent version of R to install or run.
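Once both are installed, a quick way to confirm that R runs, and to see which version you have when deciding whether to update, is to type the following into the R console or the RStudio Console pane. The exact version text depends on your installation, and the `"4.0.0"` threshold is only an example, not a requirement of this tutorial. The RStudio check assumes that RStudio sets the `RSTUDIO` environment variable to `"1"` in its R sessions; treat it as a convenience check rather than an official API.

```r
# Print the installed R version, e.g. "R version 4.x.y (date)"
R.version.string

# The same version as an object that can be compared with other versions
v <- getRversion()
v >= "4.0.0"   # TRUE if this R is 4.0.0 or newer (example threshold only)

# Assumption: RStudio sets RSTUDIO to "1" in the R sessions it starts
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
in_rstudio

# Summary of the R version, operating system and attached packages
sessionInfo()
```

If the first line prints a version string, R is installed correctly; running the same lines inside RStudio should also show `in_rstudio` as `TRUE`.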

R vs RStudio

You might be wondering what the difference is between R and RStudio.

R is a programming language and software environment widely used for statistical analysis, data manipulation, and visualization.

RStudio is an Integrated Development Environment (IDE) for R: a set of tools that makes R easier to use and adds extra functionality. RStudio combines a source code editor, build automation tools, and a debugger. It runs your code using the R installation on your computer, which is why R must be installed first. RStudio is what we'll be using to write our code.

Dataset

In this tutorial, we will use a modified version of the Scottish Health Survey (SHeS) dataset, which we provide on the MANTRA website. SHeS is an openly available dataset published by the Scottish Government that provides indicators of population health and related risk factors. More details are available on the Scottish Government website.

Scottish Government. (2022). Scottish Health Survey-Scotland level data: Indicators of population health and related risk factors from the Scottish Health Survey (2008-2022) [Data set]. statistics.gov.scot. https://statistics.gov.scot/data/scottish-health-survey-scotland-level-data