Data Handling in Python: Introduction

Welcome to this tutorial on Data Handling in Python! In this tutorial, you will learn the basics of managing and handling data in Python through a series of four exercises:

  1. Basic Operations in Python
  2. Data Cleaning and Preprocessing
  3. Data Transformation
  4. Data Visualization and Plotting

Each exercise is provided as a separate document. By the end of this tutorial, you should be confident in using Python for data handling, performing transformations, creating visualizations, and producing reproducible analyses with Jupyter Notebook or other Python IDEs.

Before starting this tutorial, you should have:

  • Basic understanding of Python programming: variables, data types, functions, loops, and conditionals
  • Basic knowledge of data science
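As a rough self-check for the Python prerequisites, the short function below uses variables, lists, a loop, a conditional, and a function with a keyword argument. The function name `classify` and the numbers are made up for illustration. If you can follow what it does, you likely have the Python background this tutorial assumes.

```python
def classify(values, threshold):
    """Split a list of numbers into those below the threshold and those at or above it."""
    below, above = [], []
    for v in values:
        if v < threshold:
            below.append(v)
        else:
            above.append(v)
    return below, above

# Made-up example values
low, high = classify([4, 11, 7, 15], threshold=10)
print(low, high)  # prints "[4, 7] [11, 15]"
```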

Installing Python

Before we get started on the exercises, we need to ensure that everything is installed.

Check if you already have Python installed:

  1. In the terminal, type python --version (on macOS and Linux, you may need python3 --version instead).
  2. If an error comes up, or the version shown starts with 2, follow the steps below to install Python 3.

To install Python:

  1. Go to the Python download website: https://www.python.org/downloads/
  2. Click the yellow button saying "Download Python ..."
  3. Once the download is complete, open the installer file.
  4. Click Continue until the installation is complete. On Windows, also tick the option to add Python to PATH on the first screen of the installer.

Keep Python up to date so that you have the latest bug fixes and security patches, and to avoid compatibility errors with newer packages.
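Once Python is installed, you can also check its version from inside Python using the standard `sys` module. The sketch below compares the running version against a minimum; `MINIMUM_VERSION` and the helper name `python_is_recent_enough` are assumptions for illustration, since this tutorial does not state a required version.

```python
import sys

# Assumed minimum for illustration only; the tutorial does not specify one.
MINIMUM_VERSION = (3, 8)

def python_is_recent_enough(minimum=MINIMUM_VERSION):
    """Return True if the running interpreter's (major, minor) version is at least `minimum`."""
    return sys.version_info[:2] >= minimum

print("Running Python", sys.version.split()[0])
if not python_is_recent_enough():
    print(f"Consider upgrading: this check assumes Python {MINIMUM_VERSION[0]}.{MINIMUM_VERSION[1]} or newer.")
```

Comparing `sys.version_info[:2]` as a tuple means (3, 10) is correctly treated as newer than (3, 8), which a plain string comparison would get wrong.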

Python IDE

I strongly recommend using Jupyter Notebook to complete this tutorial. Jupyter Notebook is free, open-source software that combines Python code with text, which makes it well suited to writing reports and conducting data analysis. It lets you write code, run it, and view its output (including plots) within the same document. This makes it an ideal platform for following along with the tutorial, as you can experiment with the code and immediately see the results of your exercises.

Getting Started with Jupyter Notebook

1. Install Jupyter Notebook

You can install Jupyter Notebook using pip, the Python package installer. Open your terminal or command prompt and run the following command:

pip install notebook
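To confirm that the installation worked, you can run the following in a Python session (for example, one started by typing python in the terminal). It is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical. If `notebook` is reported as missing even though pip succeeded, pip probably installed it for a different Python; running python -m pip install notebook ties the install to the interpreter that python starts.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

nb_version = installed_version("notebook")
if nb_version is None:
    print("Jupyter Notebook is not installed for this Python interpreter.")
else:
    print("Jupyter Notebook version:", nb_version)
```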

2. Launch Jupyter Notebook

After installation is complete, navigate to your project directory and launch Jupyter Notebook by running:

jupyter notebook

This opens the Jupyter dashboard in your web browser.

3. Create a New Notebook

In the Jupyter dashboard, click the "New" button on the right-hand side and select "Python 3" (listed as "Python 3 (ipykernel)" in recent versions) from the dropdown menu. This opens a new notebook where you can start writing and running Python code.
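Because pip installs packages for a specific Python interpreter, it can help to check which interpreter your new notebook is actually using. Running this in the notebook's first cell prints the interpreter path and environment prefix via the standard `sys` module; the helper name `kernel_interpreter` is hypothetical.

```python
import sys

def kernel_interpreter():
    """Return the path of the running Python interpreter and its environment prefix."""
    return {"executable": sys.executable, "prefix": sys.prefix}

for key, value in kernel_interpreter().items():
    print(f"{key}: {value}")
```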

new-notebook.png

4. Using Code Blocks

In a Jupyter Notebook, you write and execute Python code in code cells. To add a new code cell, click the + button in the toolbar at the top left of the notebook.

new-cell.png

To execute the code in a cell, select the cell and click "Run" in the toolbar, or press Shift + Enter.
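For a first cell to try, type the following and press Shift + Enter. The numbers are made up and unrelated to the tutorial's dataset. The `print` call always shows its output, while a bare expression on the last line of a cell (here `mean`) is displayed automatically by Jupyter but not when the same code runs as a plain Python script.

```python
# Made-up example values
values = [3, 5, 8, 12]
total = sum(values)
mean = total / len(values)
print("Total:", total)  # prints "Total: 28"
mean  # in Jupyter, the last expression is displayed as the cell's output: 7.0
```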

run-cell.png

5. Using Markdown for Text

Markdown cells let you add formatted text, images, and links to your notebook. To create a Markdown cell, select a cell and choose "Markdown" from the cell-type dropdown menu in the toolbar.

markdown-cell.png

You can then type your Markdown text and press Shift + Enter or click "Run" to render it.

To learn more about Jupyter Notebook, see the official documentation: https://jupyter-notebook.readthedocs.io/en/stable/

In this tutorial, we will use a modified version of the Scottish Health Survey (SHeS) dataset, which we provide on the MANTRA website. SHeS is an open dataset published by the Scottish Government that provides indicators of population health and related risk factors. More details are available on the Scottish Government website.

Scottish Government. (2022). Scottish Health Survey-Scotland level data: Indicators of population health and related risk factors from the Scottish Health Survey (2008-2022) [Data set]. statistics.gov.scot. https://statistics.gov.scot/data/scottish-health-survey-scotland-level-data
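Once you have downloaded the data, a quick first look is to print its column names and first few rows. The sketch below assumes the dataset is provided as a CSV file and uses only the standard library's `csv` module; the filename `shes_data.csv`, the UTF-8 encoding, and the helper name `preview_csv` are assumptions, so adjust them to match the file you download from the MANTRA website.

```python
import csv
from itertools import islice
from pathlib import Path

def preview_csv(lines, n_rows=5):
    """Return the header row and up to n_rows data rows.

    `lines` can be an open text file or any iterable of CSV-formatted strings.
    """
    reader = csv.reader(lines)
    header = next(reader, [])  # empty list if there are no rows at all
    rows = list(islice(reader, n_rows))
    return header, rows

# Hypothetical filename: replace with the name of the file you downloaded.
DATA_FILE = Path("shes_data.csv")
if DATA_FILE.exists():
    with DATA_FILE.open(newline="", encoding="utf-8") as f:
        header, rows = preview_csv(f)
    print("Columns:", header)
    for row in rows:
        print(row)
```

Taking an iterable rather than a path keeps the helper easy to try on a few hand-written lines before pointing it at the real file.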