Thursday, March 14, 2013

strategies for avoiding the spaghetti graph

It seems that I have a distaste for any chart type that has food in its title. My hatred of pie charts is well documented. Donuts are even worse. Here's another to add to the list: the spaghetti graph. Haven't seen one before? Oh, but surely, you have. They look something like this:


They are referred to as spaghetti graphs (by me, at least) because they look like someone took a handful of uncooked spaghetti noodles and threw them on the ground. And they are about as informative as such an action would be...

...which is to say

not at all.

There are a few strategies for taking the would-be spaghetti graph and making more visual sense of the data. Two that I've employed (there are certainly more) are 1) separating the lines spatially and 2) using preattentive attributes to emphasize one line at a time, while still leaving the others there for comparison. A third strategy could be a combination of these first two. I'll discuss all three approaches below. Caveat: the second and third approaches repeat some of the same information across graphs, but it's not clear to me that's necessarily a bad thing (though if this bothers you a great deal, you may want to stop reading here).

Let's look at an example of each of these approaches.

Separating spatially
We can pull the lines apart vertically and give each its own graph (but mash the graphs together so they still appear to be a single visual):


It's important in the above example that my y-axis minimum and maximum are the same for each graph so that the reader can compare the relative position of each line/point within the given area.

Note that this approach assumes that being able to see the trend for a given category is more important than comparing it to the other categories - you can still do this latter comparison, but it isn't as easy visually because of the way the lines have been separated.
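
The examples in this post were built in Excel, but the same idea can be scripted. Here is a minimal sketch in Python, assuming matplotlib is installed. The function names `shared_y_limits` and `plot_separated` and the sample numbers are made up for illustration and are not the data behind the graphs above. The key step is computing a single y-axis range once and applying it to every panel.

```python
# Made-up numbers purely for illustration (not the data from the post)
years = [2009, 2010, 2011, 2012]
data = {
    "Cat 1": [10, 12, 15, 14],
    "Cat 2": [30, 28, 35, 40],
    "Cat 3": [20, 22, 21, 25],
}


def shared_y_limits(series_by_category, pad=0.05):
    """Return one (ymin, ymax) pair that covers every series, plus a little padding."""
    values = [v for series in series_by_category.values() for v in series]
    lo, hi = min(values), max(values)
    margin = (hi - lo) * pad or 1.0  # fall back to 1.0 so flat data still gets a visible axis
    return lo - margin, hi + margin


def plot_separated(years, series_by_category):
    """Draw one panel per category, stacked vertically, all on the same y-axis range."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no plotting dependency

    ymin, ymax = shared_y_limits(series_by_category)
    fig, axes = plt.subplots(
        nrows=len(series_by_category), ncols=1, sharex=True, squeeze=False,
        gridspec_kw={"hspace": 0},  # no gap, so the panels read as a single visual
    )
    for ax, (name, values) in zip(axes[:, 0], series_by_category.items()):
        ax.plot(years, values, color="steelblue")
        ax.set_ylim(ymin, ymax)  # identical limits make positions comparable across panels
        ax.set_ylabel(name, rotation=0, ha="right", va="center")
    return fig
```

Calling `plot_separated(years, data)` returns a figure you can save with `fig.savefig(...)`. Setting the limits explicitly, rather than letting each panel autoscale, is what keeps the relative position of each line meaningful.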

Emphasizing one line at a time
Another approach would be to have multiple graphs, where you plot all of the data on each but highlight a single trend at a time. Here's an example (note that this and the following example graph different data than above):


In this case, you can see each trend on its own, but also have the others there in the background for reference. Here, I've emphasized the 2012 figures by including a marker and the data label and organized the charts from highest to lowest 2012 budget.
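
For anyone who would rather script this than build it by hand, here is a rough sketch of the same approach in Python, again assuming matplotlib is installed. The helper names (`panel_styles`, `order_by_latest`, `plot_emphasized`), the colors, and the budget numbers are all hypothetical choices for illustration. Every panel plots every line; only the styling changes from panel to panel.

```python
# Made-up budget figures purely for illustration (not the data from the post)
years = [2010, 2011, 2012]
budgets = {
    "Cat 1": [10, 12, 14],
    "Cat 2": [30, 35, 40],
    "Cat 3": [20, 21, 25],
}


def panel_styles(categories, highlight, accent="tab:blue", muted="lightgray"):
    """Style every line for one panel: the highlighted category stands out, the rest recede."""
    return {
        name: {
            "color": accent if name == highlight else muted,
            "linewidth": 2.5 if name == highlight else 1.0,
            "zorder": 3 if name == highlight else 1,  # draw the highlighted line on top
        }
        for name in categories
    }


def order_by_latest(series_by_category):
    """Order categories from highest to lowest final value (here, the 2012 figure)."""
    return sorted(series_by_category, key=lambda name: series_by_category[name][-1], reverse=True)


def plot_emphasized(years, series_by_category):
    """Draw one panel per category, each showing all lines with a single one emphasized."""
    import matplotlib.pyplot as plt  # imported here so the helpers above have no plotting dependency

    ordered = order_by_latest(series_by_category)
    fig, axes = plt.subplots(1, len(ordered), sharey=True, squeeze=False)
    for ax, focus in zip(axes[0], ordered):
        styles = panel_styles(ordered, focus)
        for name in ordered:
            ax.plot(years, series_by_category[name], **styles[name])
        # Marker and data label on the final point of the highlighted line only
        last = series_by_category[focus][-1]
        ax.plot(years[-1], last, marker="o", color=styles[focus]["color"], zorder=4)
        ax.annotate(str(last), (years[-1], last), textcoords="offset points", xytext=(5, 0), va="center")
        ax.set_title(focus)
    return fig
```

Keeping the styling decision in its own small function makes it easy to try other preattentive attributes, such as a different accent color or a thicker line, without touching the plotting loop.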

Combined approach
A third approach could be a combination of the above two:


Personally, this is my favorite for this particular data (I originally tried approach 1 here, but it was really hard to compare any given trend to the others, which wasn't ideal in this case).

In any event, if you find yourself facing a spaghetti graph, don't stop there. Think about what information you most want to convey, what story you want to tell, and what changes to the visual could help you accomplish that effectively. Perhaps the above examples will give you some ideas. If you're interested in the Excel file with the examples above, you can download it here.

Do you have other strategies for avoiding the spaghetti graph? Feedback on the above? Leave a comment with your thoughts!

8/25/14 update: check out this Washington Post article for a nice example of emphasizing one line at a time (via @jschwabish)

10 comments:

  1. Any thoughts on Social Network Analysis graphs? I think they're often referred to as hairballs, but the criss-crossing nature of the lines in the spaghetti graph made me wonder what you thought of SNA visuals. I've always been intrigued by them, and lately they seem increasingly available/accessible via LinkedIn, WolframAlpha-Facebook, NodeXL, etc.

    Replies
    1. Yes. Remove yourself and all the lines from you to remove that hair from the hairball. Aggregate into clumps the social groups where (almost) everyone knows everyone. This will reveal the interesting inter-clump connections. Consider using hive plots. Consider encoding the entities with some other useful value beyond affiliation and number of connections (how long you've known them? how often you contact them?) Consider applying axes of some kind to make the layout meaningful. These tips should go a long way towards making your graphs more interesting and useful.

      I may have said more interesting and relevant things at the end of this talk: http://youtu.be/R-oiKt7bUU8 Please let me know if you end up building such a tool; I've been asking for it for years. :-)

      Best, Noah

      PS: Nice post Cole.

    2. You also might want to take a look at BioFabric (www.BioFabric.org) as a way to look at large SNA graphs without the hairball. BioFabric represents nodes as horizontal lines, so the edges can be shown unambiguously as parallel vertical line segments.

  2. I know at least one publishing style guide that recommends line graphs be limited to just four lines, to avoid the tangle. But sometimes they have to tangle a little so that the intersection points can be seen clearly. In those cases, I've used both color and direct line labeling (another thing you promote) with success in untangling.

  3. For this sample data situation, I would use something like: http://postimage.org/image/fidqm143j/

    Things we can see with this:
    - the context
    - each individual line
    - Cat 2 had a massive jump
    - Cat 4 and 6 are similar
    - there are three tiers: Cat 1-2, Cat 3-6, and Cat 7-10.

    Here are two Arrow Chart versions:
    http://postimage.org/image/9bf2jt3sp/
    http://postimage.org/image/4qsw4vk3d/
    These views highlight the change, and we can see there are only a few decreases in budget.

    Anand, you may be interested in Hive Plots: http://www.hiveplot.net/

  4. There's an implicit solution you used but you don't talk too much about: Censoring out data that doesn't support the message. In this case, data prior to 2010.

    As a rule, I dislike censoring data, and approaches 2 and 3 just seem really inefficient to me, so I prefer Approach 1 (which is pretty close to Tufte's sparklines). It still has all the problems you mentioned, though.

    I would very much favor Approach 3 if you want to contrast one group against many.

    Replies
    1. "Censor" has some harsh connotations, but there are certainly two reasons that people "censor" data: one they shouldn't act on and one they should. If they omit data (or skew the axes or use any of the other tricks) to cover up the story the data tells, that's a no-no. But oftentimes people get so used to reusing old charts, or including five years' worth of data because that's what they always do, that they obscure the story. In those cases they should omit the data that doesn't support the story. A picture (or chart) is worth a thousand words, and people should be purposeful and pick the right type of chart and the right amount of data to tell the thousand words.

  5. What are your thoughts on this?
    http://s12.postimage.org/u0i0ii8vh/Picture1.jpg

    Replies
    1. Victor,
      A number of thoughts:
      - What is the sort order of the lines? There should be a meaningful sort.
      - Why are there two "Cat 1" labels? Seems like this image was manually made.
      - Coloring the lines all different colors does not add to the analysis.
      - It is not clear which vertex each text label belongs to, for example the label for 78%. Having the label on top of the line is not good either, e.g. 61%.
      - Why is there no vertical axis? Adding an axis would enable reading the values of the vertexes not labeled.
      - It would be helpful to add a visual cue to the Year axis to show where the axis restarts, something like a header dividing line. Even breaking the long horizontal line you have at the bottom would be a good visual cue that the axis is restarting.
      - Why not use a four digit year? I have to mentally convert a two digit year to a four digit year to read the axis.
      - There is no chart title or any description of what this data is.

      See http://postimage.org/image/fidqm143j/ for an alternative route.

  6. See the Bissantz site for ideas on tiling line plots and other alternatives to stacked bar charts and spaghetti charts (their term, not mine).

    http://www.bella-consults.com/superimposed-time-series
    http://www.bella-consults.com/page/4

    Thanks,
    Joe
