Tuesday, May 14, 2013

plotting a value within a range

I often refer to my method of creating graphs as "brute force Excel," meaning that you can make almost anything work in Excel, but it sometimes means getting a little creative. The method described in this post can be considered a prime example of brute force Excel.

This example was taken from a recent workshop, where I've anonymized the details. The goal was to understand the cost of various high cost items in a given region relative to some comparison regions. The hypothesis was that the region of interest (Region 1, in the example below) had higher cost items than the other regions. The original visual was a table that looked something like this:


The table in this case might be helpful for the exploratory analysis, but it isn't well-suited for explanatory analysis, because there is simply too much to try to take in to form a conclusion. So I knew going in that I wanted to make the data more visual. The question was: how?

While I was considering this, I sketched out an idea:


Rather than show the individual values for each of the regions, my idea was to plot the range of prices across the various regions, from the minimum value to the maximum value (irrespective of which region has the min or max price, since it didn't seem like that was critical for what we are trying to show here) and then show the specific value for Region 1, the region of interest, within that range.

The next challenge was how to create this visual in Excel. I'll take you through how I did this. The brute force in this case is a couple of stacked bar charts, with some of the series made to be invisible. Here's the step-by-step:

STEP 1: Plot the minimum price and range (the difference between the maximum value and the minimum value across the various regions) as a stacked bar chart on the primary axis.

STEP 2: Make the blue series (Min) invisible by formatting it so there is no fill color, no line, and no shadow (I will never understand why Excel adds a shadow in the first place!).

STEP 3: Plot the Region 1 price and Marker (for this, I just put a value of 100 across all of the items - just big enough for it to show up on the graph - this will be what we'll use to show where Region 1 falls within the range) on the secondary axis. At this step, you must set the secondary axis maximum to $16,000 (overwriting the default of $10,000) so that it lines up with the primary x-axis.

STEP 4: Add data labels to the green series (Region 1) and format them so they are placed at "inside end" of the bar. Then make the green series invisible (no fill, no line, no shadow).

STEP 5: Format, format, format! The base visual looks like what we want. Now it's about playing with the details of the visual so that the information is as straightforward to consume as possible.

In this step, I think about getting rid of clutter, using color strategically to draw my audience's eye to where I want them to pay attention, and adding titles, footnotes, etc. to describe and explain my visual. A couple of notes on changes I made in this step:

  • The item names and horizontal borders are in the cells directly (vs. part of the graph); I find it easier to format the visual this way, you just have to be sure that you've sized the graph so it lines up to the labels. 
  • I reversed the order of the y-axis so the items range from min to max (vs. max to min as shown in the above steps) to leverage how our eyes tend to take in information (from left to right).


The above visual still takes a little processing to figure out what's going on. But I find the data much faster to process here than it was in the original table. Now, we can see relatively quickly that the price in Region 1 is mostly on the lower end of the ranges when compared to other regions (with a couple exceptions).

The final step would be to put the story around this visual: what it shows, why that's interesting, and what action should be taken (hard to do with the generalized example, but there was an interesting story here in the original version).

The Excel workbook with the progression outlined above can be downloaded here.

11 comments:

  1. Nice work. I'd make a few changes. I find that the data labels interfere with the pattern of the data. If the numbers are needed, I'd rather see them at the left or right of the chart lined up so that comparisons are easier to make. The next is very minor but I'd rather a font that takes the same space for all letters so that Item I isn't indented so much.

    ReplyDelete
  2. Hi Cole,

    I like your chart design. Using invisible stacked charts in excel can be very effective.

    Quick question - what do you think about using parallel coordinates for this example instead? Something like this?

    http://flic.kr/p/ejesmz

    It has the advantage of allowing the viewer to see the prices for each item as well as the range, but perhaps at the cost of not being as immediately accessible.

    ReplyDelete
  3. Naomi, thanks for your feedback.

    John, it's an interesting idea. I don't love lines for non-continuous data, however, as I think it implies a connection between the categories that doesn't necessarily make sense.

    Thanks both for your comments!

    ReplyDelete
  4. Hi Cole,

    This is a great explanation! I recently did a similar exercise of plotting price distribution and came up with a similar chart. However, the execs wanted more, as they normally do. They wanted to see more detail on the distribution of the range.

    My first attempt was to create a quartile plot, which I saw in Tufte's book. It's another brute force creation in Excel, and I ended up spending more time explaining what the quartile plot represented. Although simple in design, I think they are bit too complicated for the average reader to easily comprehend.

    I then used an XY Scatter chart to basically create the same visualization you have, but with all the data points plotted in the range. This allows you see how Region 1 compares to the distribution of all the other regions. Since there could be a few outliers that skew the distribution.

    I created a sample using your data. I included Naomi's suggestion of putting the data labels on the left side. You can see a screenshot here.

    http://goo.gl/UfLaJ

    And download the file here.

    http://goo.gl/i9M8P

    All the data is self contained on one chart, making it easy to copy to a powerpoint. Let me know if you have questions.

    ReplyDelete
  5. Thanks for posting Cole, I like the clever use of stacked bars and secondary axes. I was also going to comment on the lack of insight into distribution, and enjoyed Jon Acampora's modification.

    If the primary goal is to show comparisons, might some directly comparative stats be helpful too? It's still not immediately clear how this region stacks up, so I was thinking ranks (or % ranks) could help boil down the relationship to a single number per item, or even a single number across all items. For example, their % rank of the average item cost is 25%, average % rank across all items is 38%, and are in the bottom half (<50% rank) in 59% of items. Calculations are in column H of http://goo.gl/FnPqv

    ReplyDelete
    Replies
    1. Hi Max,

      I like the idea of displaying overall rank. I used your idea to create a similar approach by displaying the rank number 1 to 5, and added this to the chart. Also added a small table that displays a list of regions by average rank. This clearly shows that Region 1 has lower prices than the others. I'm sure the visualization of it could use a few iterations... :)

      screenshot - http://goo.gl/LuYmx
      file - http://goo.gl/SPDE0

      I'm a new reader to this blog but it's cool to see such an active group of readers. I'm definitely learning a lot. Thanks!

      Delete
  6. I really liked this one when you presented it. Thanks for explaining the details in creating it!

    ReplyDelete
  7. 2 things I like in this post is : 1. Sketching and designing a good graphic to convey the message for this situation. 2. Effectively using the capabilities of excel to create a graphic that was designed.

    Normally I find people miss out on any one of these or both. Beautiful post

    ReplyDelete
  8. Wow really happy to have found your blog, I am a heavy excel user but have been also been using Tableau and wonder how you could do something similar, and throw in the ability to select the region shown. I am am going to have a crack and let you know what I get.

    ReplyDelete
  9. Hi Cole,

    Really enjoyed your presentation tonight! I was just browsing your blog and loved this post in particular. However, I'm having some trouble replicating your steps. Can you explain exactly how we should go about adding the Region 1 price and Marker in step 3? I'm having trouble adding that series in excel, but this process is something I look forward to using the future!

    Thanks!
    Sarah

    ReplyDelete
    Replies
    1. Hi Sarah,

      I'm glad you enjoyed the talk! In Step 3, I created a data series for the "Marker" (and gave it value of 100 across the board - just enough so it will show up on the graph given the size of the other numbers) and plotted it on the secondary axis.

      It's probably clearest if you download the excel file and then scroll to the step of interest and click on the "Marker" series so you can see what data it's pointed at and poke around in the chart to see what's going on.

      I hope this helps!

      Delete