Overview

This module covers the basics of working with data: the benefits, challenges, and ethical issues of working with open data; the different types of data available; and hands-on practice loading and manipulating data with pandas in preparation for analysis.

The module is structured into two parts:

  • Part 1: Getting and loading data

    • Open data and data sources

    • Licensing, ethics, and security

    • Pandas primer

    • Data formats (CSV, database, API, image, …)

      • How to load them into Python (mostly with pandas)

  • Part 2: Exploring and wrangling data

    • Loading a dataset for the first time (sanity checks, data parsing issues, …)

    • Manipulating different types of data (text, dates, categorical, images)

    • Feature engineering

    • Missing data

    • Privacy and anonymisation

References are given at the end of each subsection.