3.1 Figures gone wrong
The most common way for data scientists to convey and present their findings and ideas is through graphs and figures. Modern software libraries have made figure generation easier than it has ever been. However, this has made it altogether too easy for data scientists to:
Make misleading figures
Choose a wrong presentation format
Use overly complex, but attractive, designs which muddle the message
As a result, creating good plots remains hard.
Here we’ve collated examples of figures that do a poor job of communicating the data. As we go through these examples, we will briefly discuss each figure (you can find many more examples at @GraphCrimes on Twitter).
Example 1
Example 2
Here data has been selectively plotted to exaggerate an idea. There are a number of issues with this plot:
We only have two data points for each trend, and it is practically impossible that the trend would be a perfect straight line. Rather, any trend would fluctuate year-on-year, so the proposed difference between 2008 and 2013 may be within the bounds of a noisy signal.
If you look at the numbers, there are clearly two axes at play, or at least no attempt has been made to represent the magnitude of the change realistically.
Plotting these two data sources together here suggests there is a relationship between them (i.e., that the organisation Planned Parenthood has redistributed resources from cancer screening to abortions). It is the ethical responsibility of any figure creator to ensure that their figure does not imply a conclusion that the data does not support.
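The second issue, hidden or mismatched axes, can be made concrete with a little arithmetic. The sketch below is a minimal illustration under stated assumptions: the two series and their values are hypothetical placeholders (not the numbers from the chart above), and `height_fraction` is a helper invented here. It compares how much of the plot height each change covers when every line is stretched to its own axis and when both share one axis starting at zero.

```python
# Hypothetical (start, end) values for illustration only; they are NOT the
# numbers from the chart discussed above.
series = {
    "series A": (2_000_000, 1_000_000),
    "series B": (300_000, 330_000),
}


def height_fraction(start, end, axis_min, axis_max):
    """Fraction of the axis height covered by the change from start to end."""
    return abs(end - start) / (axis_max - axis_min)


# Separate axes: each line is stretched to its own min-max range.
own_axis = {
    name: height_fraction(start, end, min(start, end), max(start, end))
    for name, (start, end) in series.items()
}

# One shared axis starting at zero and ending at the largest value.
shared_max = max(max(pair) for pair in series.values())
shared_axis = {
    name: height_fraction(start, end, 0, shared_max)
    for name, (start, end) in series.items()
}

for name in series:
    print(f"{name}: own axis {own_axis[name]:.1%}, shared axis {shared_axis[name]:.1%}")
```

With these placeholder values, both changes fill the full plot height when each line gets its own axis, so they look equally dramatic. On the shared axis the second change covers only about 1.5% of the height, which is the information the two-axis presentation hides.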
To illustrate how misleading this original figure is, here is an example of the same data visualised in a more responsible manner.
Example 3
Example 4
Example 5
Both these plots are examples from Factfulness by Hans Rosling. They show that using only averages means that the figure does not communicate all the information, such as the spread of the distribution. The kernel density estimates (more on those in Section 3.3) below show a slice of one particular year, offering complementary information that allows us to interpret the averages in context.
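To see why an average alone can mislead, consider the following minimal sketch. It assumes two small synthetic samples (invented here, not taken from Factfulness) and a hand-rolled Gaussian kernel density estimate; `gaussian_kde` and the bandwidth of 5 are illustrative choices, and in practice a plotting or statistics library would normally do this for you.

```python
import math
import statistics

# Synthetic samples for illustration only (not data from Factfulness).
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per observation, then normalise.
        return sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        ) / norm

    return density


narrow_density = gaussian_kde(narrow, bandwidth=5)
wide_density = gaussian_kde(wide, bandwidth=5)

for name, sample, density in [
    ("narrow", narrow, narrow_density),
    ("wide", wide, wide_density),
]:
    print(
        f"{name}: mean={statistics.mean(sample)}, "
        f"stdev={statistics.stdev(sample):.1f}, "
        f"density at mean={density(50):.4f}"
    )
```

Both samples have the same mean of 50, so a chart of averages would draw them identically. The standard deviations and the estimated densities differ considerably, and that difference is what a distribution plot adds.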
Example 6
Example 7
Example 8
Example 9
Additionally, pie charts are almost always the worst form of presenting data and should be avoided at all costs.