{
"cells": [
{
"cell_type": "markdown",
"id": "36098caa-a551-4580-95c9-0082403965bb",
"metadata": {},
"source": [
"# 2.2.2 Modifying Columns and Indices\n",
"\n",
"## Dropping Named Columns and Indices\n",
"\n",
"We may wish to drop columns or indices entirely from a Pandas DataFrame.\n",
"\n",
"This may be where we’ve already done some analysis and know the labels\n",
"of the column(s) to drop. We use the\n",
"[`DataFrame.drop`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html)\n",
"method:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "92ed98ca",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"original dataframe:\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_b | \n",
" col_c | \n",
" col_d | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" aaa | \n",
" AAA | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" bbb | \n",
" BBB | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" ccc | \n",
" CCC | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 4.0 | \n",
" ddd | \n",
" None | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_b col_c col_d\n",
"0 1.0 aaa AAA NaN\n",
"1 2.0 bbb BBB NaN\n",
"2 NaN ccc CCC NaN\n",
"3 4.0 ddd None NaN"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"drop ‘col_b’:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_c | \n",
" col_d | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" AAA | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" BBB | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" CCC | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 4.0 | \n",
" None | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_c col_d\n",
"0 1.0 AAA NaN\n",
"1 2.0 BBB NaN\n",
"2 NaN CCC NaN\n",
"3 4.0 None NaN"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import display\n",
"\n",
"import pandas as pd \n",
"import numpy as np\n",
"\n",
"# construct df with null values\n",
"\n",
"df = pd.DataFrame.from_dict({ \"col_a\": [1, 2, np.nan, 4], \"col_b\": [\"aaa\", \"bbb\", \"ccc\", \"ddd\"], \"col_c\": [\"AAA\", \"BBB\", \"CCC\", None],\"col_d\": [np.nan, np.nan, np.nan, np.nan], })\n",
"\n",
"# show the dataframe\n",
"print(\"original dataframe:\") \n",
"display(df)\n",
"\n",
"print(\"drop ‘col_b’:\") \n",
"display(df.drop(columns=[\"col_b\"]))"
]
},
{
"cell_type": "markdown",
"id": "af44c1ff",
"metadata": {},
"source": [
"We can also use this method to drop indices:\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ee0bcd9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"drop rows 1 and 3\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_b | \n",
" col_c | \n",
" col_d | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" aaa | \n",
" AAA | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" ccc | \n",
" CCC | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_b col_c col_d\n",
"0 1.0 aaa AAA NaN\n",
"2 NaN ccc CCC NaN"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# show the dataframe\n",
"print(\"drop rows 1 and 3\")\n",
"display(df.drop(index=[1,3]))"
]
},
{
"cell_type": "markdown",
"id": "ebc50d4b",
"metadata": {},
"source": [
"\n",
"## Dropping Columns and Indices with Nulls\n",
"\n",
"It may be that we wish to drop columns according to whether they have\n",
"missing data. Here, we use the\n",
"[`DataFrame.dropna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)\n",
"method:\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3e3a8bd2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"original dataframe recap:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_b | \n",
" col_c | \n",
" col_d | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" aaa | \n",
" AAA | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" bbb | \n",
" BBB | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" ccc | \n",
" CCC | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 4.0 | \n",
" ddd | \n",
" None | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_b col_c col_d\n",
"0 1.0 aaa AAA NaN\n",
"1 2.0 bbb BBB NaN\n",
"2 NaN ccc CCC NaN\n",
"3 4.0 ddd None NaN"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"drop columns where *all* values are NA:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_b | \n",
" col_c | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" aaa | \n",
" AAA | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" bbb | \n",
" BBB | \n",
"
\n",
" \n",
" 2 | \n",
" NaN | \n",
" ccc | \n",
" CCC | \n",
"
\n",
" \n",
" 3 | \n",
" 4.0 | \n",
" ddd | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_b col_c\n",
"0 1.0 aaa AAA\n",
"1 2.0 bbb BBB\n",
"2 NaN ccc CCC\n",
"3 4.0 ddd None"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"drop columns where *any* values are NA:\n",
"note that None is treated like NaN.\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_b | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" aaa | \n",
"
\n",
" \n",
" 1 | \n",
" bbb | \n",
"
\n",
" \n",
" 2 | \n",
" ccc | \n",
"
\n",
" \n",
" 3 | \n",
" ddd | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_b\n",
"0 aaa\n",
"1 bbb\n",
"2 ccc\n",
"3 ddd"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# show the dataframe \n",
"print(\"original dataframe recap:\")\n",
"display(df)\n",
"\n",
"print(\"drop columns where *all* values are NA:\")\n",
"display(df.dropna(axis=\"columns\", how=\"all\"))\n",
"\n",
"print(\"drop columns where *any* values are NA:\")\n",
"print(\"note that None is treated like NaN.\") \n",
"display(df.dropna(axis=\"columns\", how=\"any\"))"
]
},
{
"cell_type": "markdown",
"id": "668a2fcc",
"metadata": {},
"source": [
"\n",
"Once again, we can use this method to drop indices:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "03c20977",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"drop rows where any of 'col_a', 'col_b', 'col_c' are NA:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" col_a | \n",
" col_b | \n",
" col_c | \n",
" col_d | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" aaa | \n",
" AAA | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" bbb | \n",
" BBB | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" col_a col_b col_c col_d\n",
"0 1.0 aaa AAA NaN\n",
"1 2.0 bbb BBB NaN"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(\"drop rows where any of 'col_a', 'col_b', 'col_c' are NA:\")\n",
"\n",
"df.dropna(\n",
" axis=\"index\",\n",
" how=\"any\",\n",
" subset=[\"col_a\", \"col_b\", \"col_c\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2b962b5f",
"metadata": {},
"source": [
"\n",
"## Relabelling Columns and Indices\n",
"\n",
"Sometimes we’ll be dealing with data that is inconveniently named.\n",
"\n",
"Pandas provides an easy way to rename columns:\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bf125f64",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"original df:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" long_column_name | \n",
" short | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" long_column_name short\n",
"0 1 4\n",
"1 2 5\n",
"2 3 6"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"renamed df:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" long | \n",
" short | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" long short\n",
"0 1 4\n",
"1 2 5\n",
"2 3 6"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df = pd.DataFrame({\"long_column_name\": [1, 2, 3], \"short\": [4, 5, 6]}) \n",
"print(\"original df:\") \n",
"display(df) \n",
"\n",
"df = df.rename(columns={\"long_column_name\": \"long\"})\n",
"print(\"renamed df:\")\n",
"display(df)"
]
},
{
"cell_type": "markdown",
"id": "9da59c73",
"metadata": {},
"source": [
"\n",
"We can also modify indices in this fashion.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f4664bce",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"original df:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" long | \n",
" short | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" long short\n",
"0 1 4\n",
"1 2 5\n",
"2 3 6"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"renamed df:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" long | \n",
" short | \n",
"
\n",
" \n",
" \n",
" \n",
" 10 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" long short\n",
"10 1 4\n",
"1 2 5\n",
"2 3 6"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(\"original df:\")\n",
"display(df)\n",
"\n",
"df = df.rename(index={0: 10})\n",
"print(\"renamed df:\")\n",
"display(df)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
},
"vscode": {
"interpreter": {
"hash": "bfb6be6a1e2928ff2d2ab7904db3c9037c4b5ca4f399bd5e7064365a54a3f091"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}