3.3 Plotting with Matplotlib

Estimated time to complete this notebook: 25 minutes

3.3.1 Importing Matplotlib

We import the pyplot module from Matplotlib, which provides an interface for making figures. We usually abbreviate it to plt:

from matplotlib import pyplot as plt
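pyplot is a module, and many codebases spell the same import as `import matplotlib.pyplot as plt`. The quick check below, assuming a standard matplotlib installation, confirms that both spellings bind the same module object. The name `plt_alias` is just for illustration, and the Agg backend is selected only so the sketch also runs outside a notebook.

```python
import types

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook

import matplotlib.pyplot as plt_alias  # the other common spelling of the import
from matplotlib import pyplot as plt  # the form used in these notes

# Python caches imported modules, so both names refer to one module object.
same_module = plt is plt_alias
is_module = isinstance(plt, types.ModuleType)
print(same_module, is_module)  # True True
```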

3.3.2 Notebook magics

When we write:

%matplotlib inline

We tell the Jupyter notebook to show the figures we generate alongside the code that created them, rather than in a separate window. Lines beginning with a single percent sign are “line magics”: they are not Python code, but control how the notebook deals with Python code.

Lines beginning with two percent signs are “cell magics”, which tell the Jupyter notebook how to interpret the whole cell; we’ve seen %%writefile and %%bash, for example.

3.3.3 A basic plot

When we write:

from math import cos, pi, sin

myfig = plt.plot([sin(pi * x / 100.0) for x in range(100)])

Like any function, plt.plot returns a value: a list containing the Line2D objects it drew (so, despite its name, myfig holds lines rather than a figure). The notebook then displays the current figure.
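A short sketch to inspect that return value directly. It reuses the sine data from above and assumes only a standard matplotlib installation (the Agg backend is selected so it runs without a display).

```python
from math import pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D

myfig = plt.plot([sin(pi * x / 100.0) for x in range(100)])

# plt.plot returns a list with one Line2D per plotted curve, not a Figure.
print(type(myfig).__name__, len(myfig))  # list 1
print(isinstance(myfig[0], Line2D))  # True

# The figure itself can be reached from the line, or as pyplot's current figure.
figure = myfig[0].figure
print(figure is plt.gcf())  # True
```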

To add a title, axis labels and so on, we need to get hold of the figure and manipulate it. For convenience, matplotlib also lets us do this just by issuing commands that change the “current figure”:

plt.plot([sin(pi * x / 100.0) for x in range(100)])
plt.title("Hello")

But this requires us to keep all our commands together in a single cell, and relies on a single, “global” current plot. That is convenient for quick exploratory sketches, but cumbersome otherwise. If we want to produce publication-quality plots from our notebook, matplotlib defines some types we can use to treat individual figures as variables, and manipulate these.
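The hidden “current figure” state that these commands rely on can be inspected with plt.gcf() (get current figure) and plt.gca() (get current axes). A small sketch, using arbitrary illustrative data points:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

plt.figure()  # start a fresh figure; it becomes pyplot's "current figure"
plt.plot([0, 1, 2], [0, 1, 4])  # arbitrary illustrative data
plt.title("Hello")  # acts on whichever axes are current right now

# The same global state, reached as explicit objects:
current_figure = plt.gcf()
current_axes = plt.gca()
print(current_axes.get_title())  # Hello
print(current_axes.figure is current_figure)  # True
```

Every plt.* call above silently targets these two objects, which is why the commands must stay together.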

3.3.4 Figures and Axes

We often want multiple graphs in a single figure (e.g. for figures which display a matrix of graphs of different variables for comparison).

So Matplotlib divides a figure object up into axes: each pair of axes is one ‘subplot’. To make a boring figure with just one pair of axes, we can simply ask for a default new figure with brand new axes. The relevant function, plt.subplots(), returns a (figure, axes) pair, which we can unpack with parallel assignment.
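A sketch of what plt.subplots() hands back, assuming a standard matplotlib installation. With no arguments it returns a single Axes; asking for a grid (the 2 × 3 layout here is arbitrary) returns a NumPy array of Axes instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# One figure with a single pair of axes: a (Figure, Axes) tuple.
sine_graph, sine_graph_axes = plt.subplots()
print(isinstance(sine_graph, Figure), isinstance(sine_graph_axes, Axes))  # True True
print(sine_graph_axes.figure is sine_graph)  # True

# A grid of subplots comes back as an array of Axes, one per subplot.
grid_figure, grid_axes = plt.subplots(2, 3)
print(grid_axes.shape)  # (2, 3)
```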

sine_graph, sine_graph_axes = plt.subplots()

Once we have some axes, we can plot a graph on them:

sine_graph_axes.plot([sin(pi * x / 100.0) for x in range(100)], label="sin(x)")

We can add a title to a pair of axes, and label each axis:

sine_graph_axes.set_title("My graph")
sine_graph_axes.set_ylabel("f(x)")
sine_graph_axes.set_xlabel("100 x")
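As a possible shortcut, Axes.set accepts several properties as keyword arguments and applies the matching set_* methods in one call. A minimal sketch reproducing the same title and labels:

```python
from math import pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

sine_graph, sine_graph_axes = plt.subplots()
sine_graph_axes.plot([sin(pi * x / 100.0) for x in range(100)], label="sin(x)")

# Equivalent to calling set_title, set_xlabel and set_ylabel separately.
sine_graph_axes.set(title="My graph", xlabel="100 x", ylabel="f(x)")

print(sine_graph_axes.get_title())  # My graph
print(sine_graph_axes.get_xlabel(), "|", sine_graph_axes.get_ylabel())  # 100 x | f(x)
```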

Now we need to actually display the figure. As always with the notebook, if the last line of a code cell evaluates to a variable, that variable gets displayed:

sine_graph

We can add another curve:

sine_graph_axes.plot([cos(pi * x / 100.0) for x in range(100)], label="cos(x)")

A legend will help us distinguish the curves:

sine_graph_axes.legend()
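The legend is built from the `label=` each curve was given when it was plotted. A sketch that rebuilds both curves and reads the legend entries back:

```python
from math import cos, pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

sine_graph, sine_graph_axes = plt.subplots()
sine_graph_axes.plot([sin(pi * x / 100.0) for x in range(100)], label="sin(x)")
sine_graph_axes.plot([cos(pi * x / 100.0) for x in range(100)], label="cos(x)")

# legend() collects the labels in plotting order.
legend = sine_graph_axes.legend()
labels = [text.get_text() for text in legend.get_texts()]
print(labels)  # ['sin(x)', 'cos(x)']
```

By default matplotlib chooses the legend position itself (`loc="best"`); passing e.g. `loc="upper right"` fixes it.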

3.3.5 Saving figures

We need to be able to save figures to disk in order to use them in papers. This is really easy:
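A minimal sketch using Figure.savefig, assuming a hypothetical filename `my_graph.png` written to a temporary directory. The format is inferred from the file extension, and `dpi` sets the resolution of raster output.

```python
import os
import tempfile
from math import pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

sine_graph, sine_graph_axes = plt.subplots()
sine_graph_axes.plot([sin(pi * x / 100.0) for x in range(100)])

# Hypothetical output location: a temporary directory keeps the example tidy.
output_dir = tempfile.mkdtemp()
png_path = os.path.join(output_dir, "my_graph.png")  # hypothetical filename

# The extension (.png, .pdf, .svg, ...) selects the output format.
sine_graph.savefig(png_path, dpi=150)

with open(png_path, "rb") as handle:
    header = handle.read(8)
print(header == b"\x89PNG\r\n\x1a\n")  # True: the file starts with a PNG signature
```

For papers, a vector format such as `.pdf` or `.svg` usually scales better than a raster `.png`.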


In order to be able to check that it worked, we need to know how to display an arbitrary image in the notebook.

The programmatic way is like this:

# Use the notebook's own library for manipulating itself.
from IPython.display import Image
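A sketch of displaying a saved image with IPython.display.Image, assuming IPython is installed (it is in any Jupyter environment). The figure, its data and the filename `my_graph.png` are hypothetical; in a notebook, making `preview` the last expression of a cell would display the picture.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from IPython.display import Image
from matplotlib import pyplot as plt

# Save a small figure first so there is an image file to show.
figure, axes = plt.subplots()
axes.plot([0, 1, 2], [0, 1, 4])  # arbitrary illustrative data
image_path = os.path.join(tempfile.mkdtemp(), "my_graph.png")  # hypothetical name
figure.savefig(image_path)

# Image reads the file; the notebook renders it when it is a cell's last value.
preview = Image(filename=image_path)
print(preview.format)  # png
```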


3.3.6 Subplots

We might have wanted the \(\sin\) and \(\cos\) graphs on separate axes:

double_graph = plt.figure()
sin_axes = double_graph.add_subplot(2, 1, 1)  # 2 rows, 1 column, 1st subplot
cos_axes = double_graph.add_subplot(2, 1, 2)  # 2 rows, 1 column, 2nd subplot
sin_axes.plot([sin(pi * x / 100.0) for x in range(100)])
sin_axes.set_ylabel("sin(x)")
cos_axes.plot([cos(pi * x / 100.0) for x in range(100)])
cos_axes.set_ylabel("cos(x)")
cos_axes.set_xlabel("100 x")
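The same two-row layout can also be requested in one call with plt.subplots. The sketch below adds `sharex=True`, a choice not made in the original example, which links the two x axes so they always show the same range.

```python
from math import cos, pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

# 2 rows, 1 column; the returned array of Axes is unpacked directly.
double_graph, (sin_axes, cos_axes) = plt.subplots(2, 1, sharex=True)
sin_axes.plot([sin(pi * x / 100.0) for x in range(100)])
sin_axes.set_ylabel("sin(x)")
cos_axes.plot([cos(pi * x / 100.0) for x in range(100)])
cos_axes.set_ylabel("cos(x)")
cos_axes.set_xlabel("100 x")

print(len(double_graph.axes))  # 2
print(sin_axes.get_xlim() == cos_axes.get_xlim())  # True: the x axes are shared
```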

3.3.7 Versus plots

When we specify a single list to plot, the x-values are just the array index number. We usually want to plot something more meaningful:

double_graph = plt.figure()
sin_axes = double_graph.add_subplot(2, 1, 1)
cos_axes = double_graph.add_subplot(2, 1, 2)
sin_axes.plot(
    [x / 100.0 for x in range(100)], [sin(pi * x / 100.0) for x in range(100)]
)
cos_axes.plot(
    [x / 100.0 for x in range(100)], [cos(pi * x / 100.0) for x in range(100)]
)
cos_axes.set_xlabel("x")
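A sketch confirming that, with two arguments, the first list supplies the x values: the plotted line stores them instead of the indices 0, 1, 2, ...

```python
from math import pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt

xs = [x / 100.0 for x in range(100)]
ys = [sin(pi * x) for x in xs]

figure, axes = plt.subplots()
(only_line,) = axes.plot(xs, ys)  # x values first, then y values

# Convert to plain floats so the printout does not depend on the container type.
first_x = [float(value) for value in only_line.get_xdata()[:3]]
print(first_x)  # [0.0, 0.01, 0.02]
```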

3.3.8 Sunspot Data

We can incorporate what we have learned in the sunspots example to produce graphs of the data.

import pandas as pd

df = pd.read_csv(
    ...,  # path or URL of the sunspot CSV file (not given in this copy)
    names=["year", "month", "date", "mean", "deviation", "observations", "definitive"],
)
df.head()

   year  month      date   mean  deviation  observations  definitive
0  1749      1  1749.042   96.7       -1.0            -1           1
1  1749      2  1749.123  104.3       -1.0            -1           1
2  1749      3  1749.204  116.7       -1.0            -1           1
3  1749      4  1749.288   92.8       -1.0            -1           1
4  1749      5  1749.371  141.7       -1.0            -1           1
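To show what `names=` does without the original data source, the sketch below feeds the five rows printed above through io.StringIO as a header-less, comma-separated file. The comma delimiter is an assumption of this sketch; the real file's format is not stated here.

```python
from io import StringIO

import pandas as pd

# The five rows printed above, written as a header-less CSV (illustration only).
raw_rows = StringIO(
    "1749,1,1749.042,96.7,-1.0,-1,1\n"
    "1749,2,1749.123,104.3,-1.0,-1,1\n"
    "1749,3,1749.204,116.7,-1.0,-1,1\n"
    "1749,4,1749.288,92.8,-1.0,-1,1\n"
    "1749,5,1749.371,141.7,-1.0,-1,1\n"
)

# Because the file has no header row, `names=` supplies the column names.
df = pd.read_csv(
    raw_rows,
    names=["year", "month", "date", "mean", "deviation", "observations", "definitive"],
)
print(df.shape)  # (5, 7)
print(df["mean"].max())  # 141.7
```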

We can plot all the data in the dataframe separately, but that isn’t always useful!

array([<Axes: >, <Axes: >, <Axes: >, <Axes: >, <Axes: >, <Axes: >,
       <Axes: >], dtype=object)
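With `subplots=True`, pandas draws each column on its own Axes and returns them as a NumPy array, which is why the full sunspot frame yields seven panels. A sketch using two columns copied from the rows printed above:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import pandas as pd

# Two columns taken from the first three rows shown above.
df = pd.DataFrame(
    {"date": [1749.042, 1749.123, 1749.204], "mean": [96.7, 104.3, 116.7]}
)

# One Axes per column; the row index is used as the x axis for every panel.
axes = df.plot(subplots=True)
print(len(axes))  # 2
```

Note that even `date` is plotted against the row index here, which is part of why this view is not very useful.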

Let’s produce some more meaningful and useful visualisations by accessing the dataframe directly.

We start by discarding any rows with an invalid (negative) standard deviation.

df = df[df["deviation"] > 0]
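The filter works in two steps: the comparison builds a boolean Series, and indexing the frame with it keeps only the rows marked True. A sketch using the `mean` and `deviation` values of the five rows printed above, all of which have a negative (invalid) deviation:

```python
import pandas as pd

# mean and deviation from the five rows shown above.
df = pd.DataFrame(
    {
        "mean": [96.7, 104.3, 116.7, 92.8, 141.7],
        "deviation": [-1.0, -1.0, -1.0, -1.0, -1.0],
    }
)

mask = df["deviation"] > 0  # a boolean Series, one True/False per row
print(mask.tolist())  # [False, False, False, False, False]

df = df[mask]  # keep only the rows where the mask is True
print(len(df))  # 0: none of these early rows has a valid deviation
```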

Next we use the dataframe to construct some useful lists.

deviation = df["deviation"].tolist()  # Get the dataframe column (series) as a list
observations = df["observations"].tolist()
mean = df["mean"].tolist()
date = df["date"].tolist()
fig = plt.figure(figsize=(15, 10))  # Figure 15 inches wide and 10 inches high

ax1 = fig.add_subplot(2, 2, 1)  # 2 rows, 2 columns, 1st subplot
ax1.errorbar(
    df["date"],  # Date on the x axis
    df["mean"],  # Mean on the y axis
    yerr=df["deviation"],  # Use the deviation for the error bars
    color="orange",  # Plot the sunspot (mean) data in orange
    ecolor="black",  # Show the error bars in black
)
ax1.set_title("From Dataframe")

ax2 = fig.add_subplot(2, 2, 2)  # 2 rows, 2 columns, 2nd subplot
ax2.scatter(df["date"], df["observations"], color="grey", marker="+")
ax2.set_ylabel("Number of Observations")
ax2.set_title("From Dataframe")

ax3 = fig.add_subplot(2, 2, 3)  # 2 rows, 2 columns, 3rd subplot
ax3.errorbar(date, mean, yerr=deviation, color="pink", ecolor="black")
ax3.set_title("From List")

ax4 = fig.add_subplot(2, 2, 4)  # 2 rows, 2 columns, 4th subplot
ax4.scatter(date, observations, color="red", marker="o")
ax4.set_ylabel("Number of Observations")
ax4.set_title("From List")
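Axes.errorbar styles the data line with `color` and the error bars with `ecolor`, and returns an ErrorbarContainer holding both. A sketch with arbitrary sine data and a made-up constant error of 0.1:

```python
from math import pi, sin

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
from matplotlib import pyplot as plt
from matplotlib.container import ErrorbarContainer

xs = [x / 10.0 for x in range(10)]
ys = [sin(pi * x) for x in xs]
errors = [0.1] * len(xs)  # arbitrary constant error, for illustration only

figure, axes = plt.subplots()
container = axes.errorbar(xs, ys, yerr=errors, color="orange", ecolor="black")

print(isinstance(container, ErrorbarContainer))  # True
# container.lines is (data line, cap lines, bar line collections).
data_line, caplines, barlinecols = container.lines
print(data_line.get_color())  # orange
print(len(barlinecols))  # 1: vertical bars only, since only yerr was given
```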

In this example we plot both columns of the pandas DataFrame (each a Series) and plain lists, to show that the plotting methods accept either. NumPy arrays can also be used.
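A sketch of that claim: the same three (date, mean) pairs from the rows printed earlier, passed as a list, a pandas Series and a NumPy array, produce identical scatter points.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for running outside a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# The first three (date, mean) pairs shown earlier, in three container types.
dates_list = [1749.042, 1749.123, 1749.204]
means_list = [96.7, 104.3, 116.7]
means_series = pd.Series(means_list)
means_array = np.array(means_list)

figure, axes = plt.subplots()
for means in (means_list, means_series, means_array):
    axes.scatter(dates_list, means, marker="+")

# Each scatter call adds one collection; compare their (x, y) point positions.
offsets = [collection.get_offsets() for collection in axes.collections]
print(len(offsets))  # 3
print(all(np.allclose(o, offsets[0]) for o in offsets))  # True
```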

3.3.9 Learning More

There’s so much more to learn about matplotlib: pie charts, bar charts, heat maps, 3-D plotting, animated plots, and so on. You can learn all of this via the Matplotlib website. Try to get comfortable with it, so please use some time in class, or at home, to work your way through a selection of the examples.