Wednesday, April 2, 2014

US prison population revisualized

The following graph caught my eye recently in my Twitter feed:

I've been debating whether to post about it (and finally decided that I couldn't resist).

I don't want to rip it apart.

Well, that's not entirely true.

I do want to rip it apart, but not in an effort to be mean. The above visual breaks pretty much every best practice out there when it comes to effective graph design. It's simple data, so probably not much is lost in our ability to interpret it despite this less-than-stellar data viz. But the specifics of the design choices (or lack thereof) drive me batty, so much so that I can't help but comment and show what it has the potential to be.

First, let me list the main components that get under my skin (I should note that some of these may be constraints of the tool through which the above visual was published, which I have not used directly):
  • No meaningful ordering to the data (rather, the categories are shown in reverse alphabetical order... not so helpful);
  • Lack of axis labels (sure, we can infer, but why should we have to?);
  • Diagonal text on x-axis (avoid, avoid, avoid!); and
  • Grey background, white vertical gridlines, and black bar outlines add unnecessary clutter (eliminate!).
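To make the first fix concrete, here is a minimal Python sketch of ordering the categories by value rather than by name before plotting. The category names and counts are hypothetical placeholders, not the figures from the original chart's data source.

```python
# Hypothetical placeholder data: category -> count (NOT the real prison statistics)
offenses = {
    "Category A": 120,
    "Category B": 45,
    "Category C": 300,
    "Category D": 80,
}

def order_by_value(data, descending=True):
    """Return (category, value) pairs sorted by value, largest first by default,
    so the bar order carries meaning instead of following the alphabet."""
    return sorted(data.items(), key=lambda item: item[1], reverse=descending)

ordered = order_by_value(offenses)
for name, value in ordered:
    print(f"{name:<12}{value:>6}")
```

Many charting tools (Excel included) draw the first category of a horizontal bar chart at the bottom, so you may need to reverse this list or the category axis to put the largest bar on top.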

I think the only positives I have to say about the original visual are: 1) a horizontal bar chart is a good choice here because we're dealing with categorical data with long category names, 2) the graph title is good and descriptive, and 3) it makes me happy to see the data source listed (both as good general graph hygiene and because it allows me to get to the source data to remake the visual).

Speaking of remaking the visual, here's what it could look like when we tackle the above issues:

If we just want to show the data, we could proceed with the above. But the original visual labels a single bar with its value (Drug Offenses), which hints that there may be a story there worth highlighting. If that's the case, our visual might look something like the following:
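One way to sketch this emphasis step: push every bar to grey and give the one category that carries the story an accent color. The hex colors and the two placeholder category names are assumptions for illustration; only "Drug Offenses" comes from the original chart.

```python
GREY = "#BFBFBF"    # assumed muted color for the context bars
ACCENT = "#1F77B4"  # assumed accent color for the bar that carries the story

def bar_colors(categories, highlight):
    """Return one color per category: ACCENT for the highlighted one, GREY otherwise."""
    return [ACCENT if name == highlight else GREY for name in categories]

# Hypothetical category list; only "Drug Offenses" appears in the original chart
categories = ["Drug Offenses", "Hypothetical Category X", "Hypothetical Category Y"]
colors = bar_colors(categories, highlight="Drug Offenses")
print(colors)
```

The resulting list can be passed to most plotting tools as per-bar fill colors, so the eye goes straight to the highlighted bar while the rest stay available for context.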

Meta-lesson: if you're going to go through the effort of visualizing data, take the time to be thoughtful about your design choices!

If you're interested in the Excel version of the above makeovers, you can download it here.


  1. Excellent! Thanks for the post.

  2. Well done, Cole! You make my brain see graphs differently. I love your simple take on graphs. We tend to think more in terms of design. Now I think clean, and about what story we are trying to tell. Thank you for your insights!

  3. Cole, just a quick note to say how valuable I find these posts and want to thank you for your generosity in always providing the relevant files where available.

  4. Great remake of this graph and a great learning opportunity for me! I tried to guess which components you would change and how you would remake them (I'm not a new reader of your blog, after all!). I was pleased I got most of them right, but I did miss a few! For example, I didn't notice the bars were outlined in the original. My question is this: what is your take on eliminating axes in favor of adding data labels to each bar, either inside or outside the bar? I like to add them inside the end if all bars are long enough to accommodate the label; otherwise, I go outside. Is the choice, however, more about whether the reader needs to know the exact data point vs. just getting the big picture in terms of comparisons?

    1. Hi Sheila,

      Great question on retaining the axis and labels vs. labeling the data directly. I think it depends on how you want your audience to interact with the data. If the numbers really aren't what you want to draw attention to, but rather the relative comparison, I'd keep the axis but de-emphasize it (make it grey, push it to the background). If the numbers are key to understanding, on the other hand, I like labeling the bars directly.

      When it comes to labeling within or outside of the bar, for me the decision is usually driven by clutter. Or, more specifically, by trying to reduce the perception of clutter. Every single component we add to our visual adds some amount of cognitive load for our audience: it takes brain power for them to process. If we label outside the bars, the labels and bars are distinct elements, whereas if we label within the bar, the label actually becomes part of the bar element. We haven't changed the amount of information on the graph, but the perceived cognitive load for our audience (whether conscious or not, probably the latter in this case) is reduced.

      That said, some of it just comes down to personal preference as well.

      Thanks for your comment!

    2. Great rework of the chart, Cole! Regarding labels inside/outside the bar, I have also seen a technique where the labels are moved all the way to the right and lined up vertically. This made it very easy to scan down the list of numbers and took some of the clutter away from the bars. As you said, it definitely depends on personal preference and the other elements in the chart.

      Thanks again!

  5. 'Cognitive load.' That is the best description I've heard of what is arguably the most common data viz problem. I love it, Cole!