{ "cells": [ { "cell_type": "markdown", "id": "36098caa-a551-4580-95c9-0082403965bb", "metadata": {}, "source": [ "# 2.2.2 Modifying Columns and Indices\n", "\n", "## Dropping Named Columns and Indices\n", "\n", "We may wish to drop columns or indices entirely from a Pandas DataFrame.\n", "\n", "This may be where we’ve already done some analysis and know the labels\n", "of the column(s) to drop. We use the\n", "[`DataFrame.drop`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html)\n", "method:\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "92ed98ca", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original dataframe:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_bcol_ccol_d
01.0aaaAAANaN
12.0bbbBBBNaN
2NaNcccCCCNaN
34.0dddNoneNaN
\n", "
" ], "text/plain": [ " col_a col_b col_c col_d\n", "0 1.0 aaa AAA NaN\n", "1 2.0 bbb BBB NaN\n", "2 NaN ccc CCC NaN\n", "3 4.0 ddd None NaN" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "drop ‘col_b’:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_ccol_d
01.0AAANaN
12.0BBBNaN
2NaNCCCNaN
34.0NoneNaN
\n", "
" ], "text/plain": [ " col_a col_c col_d\n", "0 1.0 AAA NaN\n", "1 2.0 BBB NaN\n", "2 NaN CCC NaN\n", "3 4.0 None NaN" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display\n", "\n", "import pandas as pd \n", "import numpy as np\n", "\n", "# construct df with null values\n", "\n", "df = pd.DataFrame.from_dict({ \"col_a\": [1, 2, np.nan, 4], \"col_b\": [\"aaa\", \"bbb\", \"ccc\", \"ddd\"], \"col_c\": [\"AAA\", \"BBB\", \"CCC\", None],\"col_d\": [np.nan, np.nan, np.nan, np.nan], })\n", "\n", "# show the dataframe\n", "print(\"original dataframe:\") \n", "display(df)\n", "\n", "print(\"drop ‘col_b’:\") \n", "display(df.drop(columns=[\"col_b\"]))" ] }, { "cell_type": "markdown", "id": "af44c1ff", "metadata": {}, "source": [ "We can also use this method to drop indices:\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "7ee0bcd9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "drop rows 1 and 3\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_bcol_ccol_d
01.0aaaAAANaN
2NaNcccCCCNaN
\n", "
" ], "text/plain": [ " col_a col_b col_c col_d\n", "0 1.0 aaa AAA NaN\n", "2 NaN ccc CCC NaN" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# show the dataframe\n", "print(\"drop rows 1 and 3\")\n", "display(df.drop(index=[1,3]))" ] }, { "cell_type": "markdown", "id": "ebc50d4b", "metadata": {}, "source": [ "\n", "## Dropping Columns and Indices with Nulls\n", "\n", "It may be that we wish to drop columns according to whether they have\n", "missing data. Here, we use the\n", "[`DataFrame.dropna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)\n", "method:\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "3e3a8bd2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original dataframe recap:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_bcol_ccol_d
01.0aaaAAANaN
12.0bbbBBBNaN
2NaNcccCCCNaN
34.0dddNoneNaN
\n", "
" ], "text/plain": [ " col_a col_b col_c col_d\n", "0 1.0 aaa AAA NaN\n", "1 2.0 bbb BBB NaN\n", "2 NaN ccc CCC NaN\n", "3 4.0 ddd None NaN" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "drop columns where *all* values are NA:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_bcol_c
01.0aaaAAA
12.0bbbBBB
2NaNcccCCC
34.0dddNone
\n", "
" ], "text/plain": [ " col_a col_b col_c\n", "0 1.0 aaa AAA\n", "1 2.0 bbb BBB\n", "2 NaN ccc CCC\n", "3 4.0 ddd None" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "drop columns where *any* values are NA:\n", "note that None is treated like NaN.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_b
0aaa
1bbb
2ccc
3ddd
\n", "
" ], "text/plain": [ " col_b\n", "0 aaa\n", "1 bbb\n", "2 ccc\n", "3 ddd" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# show the dataframe \n", "print(\"original dataframe recap:\")\n", "display(df)\n", "\n", "print(\"drop columns where *all* values are NA:\")\n", "display(df.dropna(axis=\"columns\", how=\"all\"))\n", "\n", "print(\"drop columns where *any* values are NA:\")\n", "print(\"note that None is treated like NaN.\") \n", "display(df.dropna(axis=\"columns\", how=\"any\"))" ] }, { "cell_type": "markdown", "id": "668a2fcc", "metadata": {}, "source": [ "\n", "Once again, we can use this method to drop indices:\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "03c20977", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "drop rows where any of 'col_a', 'col_b', 'col_c' are NA:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col_acol_bcol_ccol_d
01.0aaaAAANaN
12.0bbbBBBNaN
\n", "
" ], "text/plain": [ " col_a col_b col_c col_d\n", "0 1.0 aaa AAA NaN\n", "1 2.0 bbb BBB NaN" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"drop rows where any of 'col_a', 'col_b', 'col_c' are NA:\")\n", "\n", "df.dropna(\n", " axis=\"index\",\n", " how=\"any\",\n", " subset=[\"col_a\", \"col_b\", \"col_c\"]\n", ")" ] }, { "cell_type": "markdown", "id": "2b962b5f", "metadata": {}, "source": [ "\n", "## Relabelling Columns and Indices\n", "\n", "Sometimes we’ll be dealing with data that is inconveniently named.\n", "\n", "Pandas provides an easy way to rename columns:\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "bf125f64", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original df:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
long_column_nameshort
014
125
236
\n", "
" ], "text/plain": [ " long_column_name short\n", "0 1 4\n", "1 2 5\n", "2 3 6" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "renamed df:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
longshort
014
125
236
\n", "
" ], "text/plain": [ " long short\n", "0 1 4\n", "1 2 5\n", "2 3 6" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df = pd.DataFrame({\"long_column_name\": [1, 2, 3], \"short\": [4, 5, 6]}) \n", "print(\"original df:\") \n", "display(df) \n", "\n", "df = df.rename(columns={\"long_column_name\": \"long\"})\n", "print(\"renamed df:\")\n", "display(df)" ] }, { "cell_type": "markdown", "id": "9da59c73", "metadata": {}, "source": [ "\n", "We can also modify indices in this fashion.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "f4664bce", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original df:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
longshort
014
125
236
\n", "
" ], "text/plain": [ " long short\n", "0 1 4\n", "1 2 5\n", "2 3 6" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "renamed df:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
longshort
1014
125
236
\n", "
" ], "text/plain": [ " long short\n", "10 1 4\n", "1 2 5\n", "2 3 6" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"original df:\")\n", "display(df)\n", "\n", "df = df.rename(index={0: 10})\n", "print(\"renamed df:\")\n", "display(df)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "vscode": { "interpreter": { "hash": "bfb6be6a1e2928ff2d2ab7904db3c9037c4b5ca4f399bd5e7064365a54a3f091" } } }, "nbformat": 4, "nbformat_minor": 5 }