3.1 Figures gone wrong

The most common way for data scientists to convey and present their findings and ideas is through graphs and figures. Modern software libraries have made figure generation as easy as it has ever been. However, this has made it all together too easy for data scientists to:

  1. Make misleading figures

  2. Choose a wrong presentation format

  3. Use overly complex, but attractive, designs which muddle the message

and so, creating good plots is hard.

Here we’ve collated examples of figures that do a poor job of communicating the data. As we go through these examples we will have a brief discussion about each figure (you can find many more examples at @GraphCrimes on Twitter).

Example 1

Source

Example 1 notes This figure is a classic example of misleading information. The axes does not start at zero so the effect is visually exaggerated, and we are not given the uncertainty around these averages. The reader is therefore unable to interpret the practical significance of any differences between groups.

Example 2

Source

Example 2 notes

Here data has been selectively plotted to exaggerate an idea. There are a number of issues with this plot:

  • We only have two data points for each trend, and it is practically impossible that the trend would be a perfect straight line. Rather, any trend would fluctuate year-on-year, this proposed difference between 2008 and 2013 may be within the bounds of a noisy signal.

  • If you look at the numbers there are clearly two axes at play, or at least no attempt has been made to represent the magnitude of the change realistically.

  • Plotting these two data sources together here suggests there is a relationship between them (i.e., that the company Planned Parenthood has redistributed resources from cancer screening to abortions). It is an ethical responsibility for any figure creator to ensure that your figure is not implying a conclusion that isn’t supported by your data.

To illustrate how misleading this original figure is, here is an example of the same data visualised in a more responsible manner.

Source

Example 3

Example 3 notes Example 3 is very confusing because the scale of the x-axis is altered mid-plot. Even after reading the caption it is very difficult to get a feel for the data because the grid-lines give a powerful signal of uniformity.

Example 4

Example 4 notes The phenomenon that this xkcd comic is getting at is Normalisation. Normalisation is where you alter a scale to be between zero and one (usually). Failure to normalise is when different measurements that themselves have different scales are plotted on the same scale. The comic shows the classic failure to normalise measurements by population. A variable that was consistently 10% of the population would appear to vary across the map.

Example 5

Example 5 notes

Both these plots are examples from Factfulness by Hans Rosling, they show that using only averages means that the figure does not communicate all the information, such as the spread of the distribution. The kernel density estimates (more on those in Section 3.3) below show a slice of one particular year, offering complementary information that allows us to interpret the averages in context.

Example 6

Source

Example 6 notes This figure does not appear to have been created to clearly communicate data. There is far too much information on this figure to be able to intuitively grasp the message. A reader should be able to understand a figure quickly.

Example 7

Source

Example 7 notes This figure was presented quickly in a UK Government Covid briefing. In such a situation rapid and clear comprehension of figures is paramount. Here this figure is guilty of over-plotting, which is attempting to squeeze too much information into one figure, ultimately rendering it difficult to comprehend.

Example 8

Source

Example 8 notes Although you can see the relative pattern of the two signals, there is no y-axis! One can therefore not assess the magnitude of the difference.

Example 9

Source

Example 9 notes Though this pie chart represents the data well there are confusing aesthetic choices. We have redundant information: there is really no need for the legend since all the slices of the pie chart are labelled anyway.

Additionally, pie charts are almost always the worst%20are%20tricky%20to%20show.) form of presenting data and should be avoided at all costs.

Example 10

Source

Example 10 notes Similarly to Example 10, the aesthetics of this graph detract from communicating the data. The y axis could be represented on a scale of millions (i.e., from 0 to \$500), and there is no need to note all the data points. If one wants to write out the data, use a table!