{ "cells": [ { "cell_type": "markdown", "id": "61534958", "metadata": {}, "source": [ "# 10.5 Larger datasets - beyond pandas and CSV" ] }, { "cell_type": "markdown", "id": "6e4ba9e2", "metadata": {}, "source": [ "*Estimated time for this notebook: 20 minutes.*" ] }, { "cell_type": "markdown", "id": "f0bc6f4c", "metadata": {}, "source": [ "Much of the data we deal with can be represented in tabular form and handled with data structures such as the *pandas DataFrame*.\n", "We have already (briefly) seen how to read and write CSV files with pandas, and pandas also provides methods for reading the results of SQL queries into _pandas DataFrames_.\n", "\n", "However, for very large datasets (millions of rows), or when we need fast, computationally intensive processing of these tables, _pandas_ may not be the best choice." ] }, { "cell_type": "markdown", "id": "2d194647", "metadata": {}, "source": [ "## Row-wise vs column-wise\n", "\n", "Let's read a CSV file containing international men's football results into a _pandas DataFrame_:" ] }, { "cell_type": "code", "execution_count": 1, "id": "6d7e016b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Unnamed: 0 | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1872-11-30 | Scotland | England | 0 | 0 | Friendly | Glasgow | Scotland | False |
| 1 | 1 | 1873-03-08 | England | Scotland | 4 | 2 | Friendly | London | England | False |
| 2 | 2 | 1874-03-07 | Scotland | England | 2 | 1 | Friendly | Glasgow | Scotland | False |
| 3 | 3 | 1875-03-06 | England | Scotland | 2 | 2 | Friendly | London | England | False |
| 4 | 4 | 1876-03-04 | Scotland | England | 3 | 0 | Friendly | Glasgow | Scotland | False |