Thursday, September 20, 2012

bar charts must have a zero baseline

This is one rule of data visualization that I see broken too often: when it comes to bar charts, the y-axis must begin at zero.

When our eyes interpret bar charts, we are comparing the relative heights of the bars. When we cut the height off at something greater than zero, it skews this visual comparison, over-emphasizing the difference between the bars in a way that simply isn't honest. Most recently, I saw this in a visual that was forwarded by a friend of a colleague. The offender: Fox News.

There are a number of things that bother me about this visual. Beyond the unnecessary visual clutter of tiny gridlines and strange chart borders, the y-axis isn't labeled (I think it's Top Tax Rate, as noted by the subtitle, but this would be a lot clearer if the axis itself were labeled). It is also placed on the right-hand side of the visual, so it's the last thing I see as my eyes scan from left to right. That makes it even less likely that I notice the biggest issue with the graphic: the y-axis starts at 34%. This makes the difference between Now (35%) and Jan 1, 2013 (39.6%) appear to be way bigger than it actually is.

How big of an issue is this? Let's do some math to find out. The way it's graphed, the heights of the bars are 1 (35-34) and 5.6 (39.6-34). This represents a visual increase of 460% ((5.6-1)/1). If we graph the bars with a zero baseline so that the heights are accurately represented (35 and 39.6), we get a visual (and actual) increase of 13% ((39.6-35)/35). Perhaps that is still significant, and that is the point that Fox News was attempting to make. That's fine, but I wish they had done it without this visual misrepresentation of the truth.
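For anyone who wants to check this math against their own charts, here is a minimal Python sketch of the same arithmetic. The function name `visual_increase` and its `baseline` parameter are my own illustrative choices, not part of any charting library; the only inputs are the two tax rates and the axis starting point described above.

```python
def visual_increase(old, new, baseline=0.0):
    """Percent growth in drawn bar height when the y-axis starts at `baseline`.

    Hypothetical helper for illustration: the drawn height of a bar is its
    value minus wherever the axis begins.
    """
    old_height = old - baseline
    new_height = new - baseline
    if old_height <= 0:
        raise ValueError("baseline must be below the smaller value")
    return (new_height - old_height) / old_height * 100


# Top tax rates from the graphic: Now vs. Jan 1, 2013
now, jan_2013 = 35.0, 39.6

truncated = visual_increase(now, jan_2013, baseline=34.0)  # axis starts at 34%
honest = visual_increase(now, jan_2013)                    # axis starts at zero

print(f"34% baseline:  bar appears to grow {truncated:.0f}%")
print(f"zero baseline: bar grows {honest:.0f}%")
```

Swapping in other values for `baseline` shows how quickly the apparent change balloons as the axis start creeps toward the smaller bar.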

A couple of related things to consider (and I have my own opinion on each of these that I'll of course make clear):
  • I've heard the argument that if you're graphing something that has a sort of "natural" baseline of something greater than zero, then it might be appropriate to start with that. For example, if we consider the baseline unemployment rate to be 5%, then the argument goes that you could use this 5% as the baseline. I don't like it. For me, it isn't a valid visual comparison, so if that were the case, I'd use a different way to show it (perhaps plot the entirety of the bars but then also highlight a horizontal line at 5% and label it in a way that makes it clear how to use it for comparison).
  • When it comes to line graphs, the zero baseline rule does not hold. In other words, you can get away with a non-zero baseline in a line graph. With line graphs, we compare the lines to each other more than their height from the x-axis. Still, you need to be careful. I would advise making it clear to your audience that you're using a non-zero baseline so they interpret the information correctly (one approach: label the y-axis and put the minimum value in bold to draw attention to the fact that it's something other than zero). And you need to be careful about zooming in too much and making a minor change look big - this gets you back into the visual misrepresentation territory that we want to avoid.
My advice to Fox News (and to those communicating with data in general) would be to first determine the story you want to tell. Then determine what data will best support this story. Don't compel your audience with visual misrepresentations; rather, convince them with accurately displayed data that backs up the point you are trying to make.

