Overview
This module covers the basics of working with data. We cover the challenges and ethical issues of working with open data (as well as the benefits!), the different types of data available, and the practical experience of loading and manipulating data (with pandas) in preparation for analysis.
The module is structured into two parts:
Part 1: Getting and loading data
Open data and data sources
Licensing, ethics and security
pandas intro/primer
Data formats (CSV, database, API, image, …)
How to load these formats into Python (mostly with pandas)
Part 2: Exploring and wrangling data
Loading a dataset for the first time (sanity checks, data parsing issues, …)
Manipulating different types of data (text, dates, categorical, images)
Feature engineering
Missing data
Privacy and anonymisation
References are given at the end of each subsection.
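To give a flavour of the loading and sanity-checking steps listed above, here is a minimal sketch using pandas. The inline CSV, the column names (`station`, `date`, `temperature_c`) and the helper functions `load_readings` and `summarise` are all hypothetical, made up for illustration; they are not taken from the module's own datasets or code.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real open-data file.
# The second row deliberately has a missing temperature value.
RAW_CSV = """station,date,temperature_c
A,2021-01-01,3.5
A,2021-01-02,
B,2021-01-01,4.0
B,2021-01-02,5.5
"""


def load_readings(source):
    """Load the CSV, asking pandas to parse the 'date' column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])


def summarise(df):
    """Collect a few first-look sanity checks for a freshly loaded DataFrame."""
    return {
        "n_rows": len(df),                           # how many records were read
        "n_missing": df.isna().sum().to_dict(),      # missing values per column
        "dtypes": df.dtypes.astype(str).to_dict(),   # the type pandas inferred per column
    }


df = load_readings(io.StringIO(RAW_CSV))
summary = summarise(df)
print(summary)
```

With a real dataset, `io.StringIO(RAW_CSV)` would typically be replaced by a file path or URL. Checking row counts, inferred types and missing values straight after loading is one common way to catch parsing problems early.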