Wednesday, June 4, 2014

alternatives to pies

My disdain for pie charts is well documented. While opinions on their usefulness run the gamut, I am certainly not alone in my contempt. In my workshops, I sometimes get the question, "In what situation would you recommend a pie chart?" For me, the answer is never.* There are a number of alternatives, each with their own benefits. It's these alternatives that I'll focus on in this post.

*Full disclosure: There was once a situation at Google where we wanted to share some diversity stats on gender breakdown but didn't want to show the specific values. In this case, the fact that it's tough for people to attribute accurate value to 2-dimensional space worked to our advantage and we leveraged a pie chart absent of any value labels. Though, now that Google is sharing their diversity stats publicly (I'll resist the urge to comment on the ill-chosen donut graphs they are using to do so) it seems even this has become a moot need.

The following is an example that I often use in my workshops (based on a real example, but modified a bit to preserve confidentiality). By way of context: imagine you just completed a pilot summer learning program on science aimed at improving perceptions of the field among 2nd and 3rd grade elementary children. You conducted a survey going into the program and at the end of the program and have visualized the resulting data in the following set of graphs.


I believe the above data demonstrates that, on the basis of improved sentiment towards science, the pilot program was a great success. Going into the program, the biggest segment of students (40%, the green slice in the left pie) felt just "OK" about science - perhaps they hadn't made up their minds one way or the other. After the program (pie on the right), that 40% in green shrinks down to 14%. Bored (blue) and Not great (red) went up a percentage point each, but the majority of the change was in a positive direction: after the program, nearly 70% of kids (purple + teal segments) expressed some level of interest towards science.

The above visual does this story a great disservice. Yes, you can get there, but you have to first overcome the annoyance of trying to compare slices across two pies. There's no need for this annoyance: choose a different type of visual!

Let's take a look at four alternatives using the above data.

Alternative #1: Show the Number(s) Directly
If the improvement in positive sentiment is the big thing we want to communicate, we can consider making that the only thing we communicate:
Too often, we think we have to include all of the data and overlook the simplicity and power of communicating with just one or two numbers directly, as in the above. That said, if you feel you need to show more, look to one of the following alternatives.

Alternative #2: Simple Bar Graph
When you want to compare two things, you typically want to put those two things as close together as possible and align them along a common baseline to make this comparison easy. The simple bar graph does this. This is the "after" version that I typically use in my workshops (which is why you see more narrative integrated into the following visual than the other alternatives).

Alternative #3: 100% Stacked Horizontal Bar Graph
When the part-to-whole concept is a must-have (something you don't get with either of the above solutions), the stacked 100% horizontal bar graph achieves this. Note that you get a consistent baseline to use for comparison both at the left and at the right of the graph, which can be useful in cases such as this, allowing the audience to easily compare both the negative segments at the left and the positive segments at the right across the two bars. Because of this, I find this to be a useful way to visualize survey data in general.

In the above version I chose to retain the x-axis labels rather than put data labels on the bars directly. I tend to do it this way when leveraging 100% stacked bars so that you can use the scale at the top to read either from left to right (which in this case allows us to attribute numbers to the change from Before to After on the negative end of the scale) or from right to left (to do the same for the positive end of the scale). In the simple bar graph shown previously, I chose to omit the axis and label the bars directly. This illustrates how different views of your data may lead you to different design choices. Always think about how you want your audience to use the graph and make your design choices accordingly - different choices will make sense in different situations.
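For anyone who wants to check the left-to-right and right-to-left readings numerically, here is a minimal Python sketch. It is only an illustration, not how the graph was built (the makeovers were done in Excel). The function names are hypothetical, and only the OK values (40% before, 14% after) are quoted exactly in this post; the other percentages are assumed values chosen to agree with the figures mentioned above (a one-point rise in Bored and Not great, nearly 70% positive after the program).

```python
# Survey shares in percent, ordered from the negative end to the positive end.
# Only OK (40 -> 14) is quoted exactly in the post; every other value is an
# illustrative assumption consistent with the figures the post mentions.
CATEGORIES = ["Bored", "Not great", "OK", "Kind of interested", "Excited"]
BEFORE = [11, 5, 40, 25, 19]
AFTER = [12, 6, 14, 30, 38]


def read_from_left(shares, n):
    """Sum of the first n segments: where the n-th segment ends on the axis."""
    return sum(shares[:n])


def read_from_right(shares, n):
    """Sum of the last n segments: the same reading taken from the right edge."""
    return sum(shares[len(shares) - n:])


if __name__ == "__main__":
    for label, shares in (("Before", BEFORE), ("After", AFTER)):
        negative = read_from_left(shares, 2)   # Bored + Not great
        positive = read_from_right(shares, 2)  # Kind of interested + Excited
        print(f"{label}: negative end {negative}%, positive end {positive}%")
```

With these assumed numbers, the negative end reads 16% before and 18% after from the left edge, and the positive end reads 44% before and 68% after from the right edge. Those are the two comparisons the shared baselines make easy.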

Alternative #4: Slopegraph
The final alternative we'll consider today is a slopegraph (I've blogged about slopegraphs previously here, here, and here). As was the case with the simple bar chart, you don't get a clear sense of there being a whole and thus pieces-of-a-whole in this view (in the way that you do with the initial pie, or with the 100% horizontal stacked bar). Also, if it is important to have your categories ordered in a certain way, a slopegraph won't always be ideal, since the various categories are placed according to their respective data values. In the following, on the right hand side, you do get the positive end of the scale at the top, but note that Bored and Not great at the bottom are switched relative to how they'd appear in an ordinal scale because of the values that correspond with these points. If you need to dictate the category order, use the simple bar graph or the 100% stacked bar graph, where you can control this.

One thing you do get with the slopegraph is the visual percent change from Before to After for each category via the slope of the respective line. It's easy to see quickly that the category that increased the most was Excited (and the category that decreased markedly was OK). The slopegraph also provides clear visual ordering of categories from greatest to least (via their respective points in space from top to bottom on the left and on the right sides of the graph).
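The two things a slopegraph encodes, the change for each category and the top-to-bottom ordering on each side, can be sketched in a few lines of Python. This is a rough illustration under assumptions: the helper names are hypothetical, and apart from OK (40% to 14%) the percentages are assumed values chosen to agree with the figures quoted in this post, not the survey data itself.

```python
# Shares in percent. Only OK (40 -> 14) is quoted exactly in the post; the
# remaining values are illustrative assumptions consistent with its figures.
BEFORE = {"Bored": 11, "Not great": 5, "OK": 40,
          "Kind of interested": 25, "Excited": 19}
AFTER = {"Bored": 12, "Not great": 6, "OK": 14,
         "Kind of interested": 30, "Excited": 38}


def point_changes(before, after):
    """Percentage-point change per category: what each line's slope encodes."""
    return {category: after[category] - before[category] for category in before}


def top_to_bottom(values):
    """Categories ordered from largest to smallest share, as one side of the
    slopegraph places them vertically."""
    return sorted(values, key=values.get, reverse=True)


if __name__ == "__main__":
    changes = point_changes(BEFORE, AFTER)
    for category in top_to_bottom(changes):
        print(f"{category}: {changes[category]:+d} points")
    print("Left side: ", top_to_bottom(BEFORE))
    print("Right side:", top_to_bottom(AFTER))
```

Sorting by value is also why Bored ends up above Not great on the right-hand side: the position comes from the number, not from the survey's ordinal scale.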

Any of these alternatives might be the best choice given the specific situation, how you want your audience to interact with the information, and what point(s) of emphasis you want to make. The meta-lesson here is that you have a number of alternatives to pies that can be more effective for getting your point across.

I should note that I had a couple specific sources of inspiration for this post. I recently completed some long overdue reading that included Jon Schwabish's An Economist's Guide to Visualizing Data. In it, Jon discusses a number of data viz best practices through examples of common mistakes and some nice makeovers, including a section focused on alternatives to pies. I highly recommend checking out this paper. Andy Kriebel recently posted a nice makeover of a particularly annoying "data visualization" that tried to combine pie graphs with faces (you have to see it to believe it). There are a few things that are worse than a pie graph: a 3D exploding pie graph, having to compare segments across two pie graphs, and - a recent (and unexpected) addition to the list - the face-pie.

The Excel workbook with the above makeovers can be downloaded here.

Are there other alternatives to pies that should be added to this list? Which one do you favor in this situation? Leave a comment with your thoughts!

22 comments:

  1. I vote for the horizontal stacked bar chart.

  2. I think the slope graph is less effective than the stacked bar. In the stacked bar, the elements are stacked in a logical order (bored to excited) which aids in understanding and gives it an edge over the slope graph.

    Replies
    1. Jon, I agree with you in this case, but I think that it works only because there are only three categories of interest ("don't like," "like" and "meh"). If there were more categories, such as religious preference or spending categories, then we'd have two categories that we could compare at the ends and a bunch of categories in the middle that would get muddled. I think the slope graph generally performs better.

  3. Hi Cole,

    Nice post about the limitations of pie charts. I particularly liked the comment of using both edges of the horizontal bar chart. While comparisons may be hard to make for the segments in the middle, I think it still conveys the overall message that interest in the sciences increased.

    On a separate note, have you ever come across an effective use of pie charts? In particular, one that focuses on part to whole relationships rather than multiple comparisons.

    I came across this blog post on Nathan Yau's Flowing Data a while back, and I'm wondering what your thoughts were on the WSJ's use of the pie chart.

    Nathan Yau's post - http://flowingdata.com/2012/05/19/good-use-of-pie-charts/

    WSJ Image - http://online.wsj.com/news/interactive/FACEBOOKIPOPROFITS3_JPG20120516_pg_2?ref=SB10001424052702303448404577407774136362662

    I'm going to try to reimagine the visualization this weekend and would love to hear your thoughts.

    Replies
    1. Hi Chris,

      Thanks for your note and the link to Yau's post. I'd agree that this is a less-offensive use of pies compared to many out there, but for me personally: any pie is a bad pie.

      At first glance, I thought the numbers themselves might be enough. But after looking more closely, if you really want people to be able to compare previous to current on the left (and see the lack of change for those listed on the right), a slopegraph may also work well here.

      I'd love to see your makeover!

    2. I've got a makeover using bar charts and slope graphs. I don't think the slope graph is as effective given crowding and some overlapping data values.

      What's the best way to share the Excel file?

    3. Hi Chris,

      You can upload to Google Docs or Dropbox and put a link to download the file in the comments, or if you want to email the Excel file to me directly, I can also do this.

    4. Here's a link to the Excel file in dropbox. Let me know if you have any trouble accessing the data.

      https://dl.dropboxusercontent.com/u/29116501/wsjPieMakeover.xlsx

      Do you have any advice or resources for learning about typefaces and choosing appropriate fonts? I know a little about serif and sans-serif fonts but am often uncertain about appropriate font choices.

    5. Hi Chris,

      Thanks for sharing your makeover! I like the slopegraph version.

      Funny you should ask - I've just been starting to do some research about typefaces and fonts, but don't have much to share on that front yet. Let me know if you come across any good books in this area.

    6. I asked our design team. They offered the following advice.

      For data visualizations:
      "Use sans serif, clean, and no decorative fonts. Neutrality is the goal. Personal preference is toward narrower fonts so the data has more room to shine.

      Another note on font selection: pay attention to how the numerals are written. It's rarer for sans serif, but some fonts will have numbers that dip down or stick out like Georgia. It's pretty, but you probably want to avoid that for labeling visualizations."

      I usually go for Gill Sans, and sometimes use a serif to make text labels more readable and different from the numerical labels. I think I need to do more research and experiment more. I suspect fonts like Georgia draw the eye toward axes labels rather than the encoded data. These types of fonts also create misalignment among numerical labels which could be distracting.

      I'm still on the hunt for books related to typeface and data vis.

      best,
      Chris

  4. Cole, have you ever discussed pie charts with nonprofits who use them to show the distribution of expenses (and sometimes income as well), particularly to illustrate the ratio of program expenses to admin and fundraising? The arguments I've heard in favor of pie charts are 1) they are effective in showing proportions when there are only 3 or 4 components, and 2) everyone is using them. Thoughts or ideas?

    Replies
    1. Hi Michele,

      Thanks for your comment. Yes, I've had this discussion before. People tend to be hesitant to move away from what they've "always used," but that doesn't mean there isn't a better approach! It's part of the change management process. Sometimes, showing the alternate view that's as good (or better) can help convince those who are resistant.

      While the concept of a pie chart and the part-to-whole idea you get with them is pervasive (and why I imagine people think they are effective), our eyes don't do a good job attributing quantitative value to 2-dimensional space, which means that they actually are hard to read. This is true even when you have only 3-4 segments. If values are very different, you can pick out the big and small pieces, but it's nearly impossible to say how much bigger or smaller they are than the other segments (something that becomes even more challenging when the segments are close in value).

      In this case, I think the horizontal 100% stacked bar chart could work well. If you'd be willing to share a specific example, I'd be happy to take a look at it - you can email me directly at cole.nussbaumer@gmail.com.

  5. Great post!

    I tend to favor a stacked horizontal over a pie if I'm making comparisons. The slopegraph could work if your intention is to just show how much something changed. Haven't really played with it before though.

  6. I love this example because it highlights what is so wrong with so many data presentations. Those of us that create charts every day for work (especially via ppt) believe that we need to show our work to gain credibility and prove the conclusion of our analysis. Your first example, Alternative #1: Show the Number(s) Directly, tells the whole story clearly and concisely. Your format of the content makes the relevant portions pop and it would do well on a ppt presentation. But some believe you need to fill a page with data points, colored lines, circles, bars, and connectors when a single bold statement sells the whole concept. KISS. When comparative analysis is necessary, a well executed chart speaks volumes (as Cole has shown in numerous posts).

    In addition, if you require deeper analysis of the data, you would require all the inputs. It is important that you clearly communicate the conclusions drawn from the data, that you can empirically defend those conclusions if challenged, and that your reader or observer easily understands and draws the same conclusions from your presentation. You can always provide alternative charts as backup or even the raw data to those interested in analyzing it themselves. If you can stand behind your analysis, you shouldn't have to show your work to prove your point. I think in this case, simplicity rules! Well stated.....

  7. Yes, I'm with you on the pie charts, though Nathan Yau's post shows some nice examples. As far as the above data goes, there are probably some other layouts that would communicate the data faster. Showing stacked area line charts without the "OK" data shows the movement nicely, which is lost in simple stacked bars.

    https://flic.kr/p/nQtdW7

  8. Great discussion. I prefer the 100% bar stacked chart. I liked the use of the same colours for "good" and "bad" categories rather than the typical separate colour for each.

  9. I personally always reach for the horizontal bar charts. The closer physical association between the two bars allows the viewer to recognise the pattern of conformance (or not) to the other data set. Much visually easier than your eyes having to flick from one object to the next and 'losing' the relationship.

    I've also used slope graphs, particularly when the disparity between the absolute values of the data is great. The distortion of the slopes is not so critical if all you are trying to say is "This one up a bit, this one down a lot" etc. I always include the value points in such a case for accuracy, as Cole has done in her example above.

  10. Cole, thanks for coming up with one example where pie charts reign: when you want to obfuscate rather than clarify the ratios. I agree that it is difficult to come up with another where a different type of chart would not work better to tell the story.

    Mathematician Alexey Krylov said, "Statistics should consist not only in filling the register the size of a double bed sheet with useless numbers but in reducing these numbers to a quarter of a page and in relating them to one another, making it possible to see what happened and to anticipate what is going to happen."

  11. Apropos of the above: I just came across this today. It has totally different orders of magnitude in the data set, but it works for me. I would have labelled the lines and deleted the key for better clarity, but the fudging of the vertical scales doesn't bother me in this context. Others' thoughts?

    http://twitpic.com/e69uft/full

  12. I think that the profligate use of pie charts has somehow convinced us that plots of percentages should visually show those parts summing to 100%. We should re-think this. We tend to make bad trade-offs in design. If it matters, then a simple note would suffice, and in cases where we want to show the change in the sum, we can just plot the sum as a separate graph above and mirroring the plot of parts.

    Still, if we really want the parts to sum to a whole, the mosaic plot is a nice alternative to the bar chart and the stacked bar chart. I particularly like interactive versions for exploring data.

    I'm somewhat surprised that the dot plot didn't make the list (Cleveland's version; not the one described by Wilkinson or that implemented in Minitab). Since our brains are hard-wired to measure linear distances but not angles or areas, dot plots provide a more accurate representation of the data than pie charts or mosaic plots (hence the phenomenon that fast-moving objects appear slower when far away and fast when close). Dot plots also scale well when there are many categories (i.e. more than in the sample data Cole uses) and when comparing categories across other groupings (e.g. "before" and "after").

  13. Tom makes a good point about dotplots. If you use R, look for Trellis graphics (they are in the style of William Cleveland's graphs from "The Elements of Graphing Data" and "Visualizing Data").

    It's hard to format in this small box, but here are a few inline approaches that would benefit from some scales, alternate fonts/characters, and color.
    Excited o------------------+
    Kinda o----+
    OK +--------------------------o
    Bored -+
    NotGreat -+

    Deltas (after - before) from Excited (top) through Not Great (bottom). Using dashes to fill in for blank spaces.
    ----------------------------------------------------[][][][][][][][][][][][][][][][][][][]
    ----------------------------------------------------[][][][][]
    [][][][][][][][][][][][][][][][][][][][][][][][][][]
    ----------------------------------------------------[]
    ----------------------------------------------------[]

    Deltas for Excited and Kind of Interested (+24) vs. Bored and Not Great (+2). The OK group's decline is reflected in the increase in the Excited and Kinda Interested groups:
    [][][][][][][][][][][][][][][][][][][][][][][][] -- Excited, Kind of Interested
    [][] -- Bored, Not great

    Thanks,
    Joe
