Wednesday, December 12, 2012

the functional art

It's a little beat up and has scraps of paper sticking out in various directions marking pages I want to refer back to: it's my copy of the functional art by Alberto Cairo, and it's been everywhere with me over the past month - from LA to DC to Milwaukee, and several places in between (yes, that probably means I'm a slow reader). I finished it on my latest plane ride home. In a word: it was awesome.

As the subtitle declares, the focus of the book is information graphics and visualization. Alberto has a conversational, super accessible writing style. His research is augmented with his extensive experience in data journalism and the book is filled with illustrative stories and examples. the functional art begins with a section on Foundations - what visualization is and does, the importance of building a narrative structure, and a visualization wheel for evaluating competing priorities. It then moves into Cognition - discussion of the eye, the brain, and how people see. The third section focuses on Practice - the infographic creation process and interactive graphics. While I enjoyed the entire book, the final section, Profiles, was my personal favorite - it recaps Alberto's interviews with various practitioners working with infographics. I also enjoyed examining the various examples throughout the book and seeing the progression from sketches to final product.

The book was inspiring, as is Alberto's work in general. I had the pleasure of speaking with him a couple of months ago, when he was getting ready to launch the first massive online course on an Introduction to Infographics and Data Visualization through the Knight Center. This 6-week course filled up quickly, so a second was recently announced that begins in January (details here). Alberto also teaches at the University of Miami and blogs at www.thefunctionalart.com. His passion in this space is clear and, from what I've seen, permeates all that he does.

For those interested in infographics or data visualization in general, I highly recommend the functional art; you can purchase a copy of it here.

Thursday, December 6, 2012

I like [candy] bars better than donuts

I've recently hosted a couple data visualization challenges (here and here), but it's been a while since I've made a contribution to one. That is about to change. Recently, Naomi Robbins announced a makeover challenge in her Forbes blog (details here - open until 12/9 if you want to participate). The challenge included two visuals. My discussion and makeover of each follows.

Makeover #1: I like [candy] bars better than donuts...
The first visual included two donut graphs (the pie chart's even less effective cousin; see here for a related post on donut charts):


It took staring at this for a while for me to figure out what was going on. Based on the title, it seems we're meant to compare the segments of the donuts across the two charts. This is not an easy comparison - in addition to having to measure angles and areas (something our eyes aren't very good at), the slices are also in different places due to the difference in magnitudes within the donuts, making the comparison even more challenging.

I also question the takeaway called out - that LinkedIn referral traffic is 16x higher for B2B companies. The proportion it makes up of total is 16x higher, but that's not necessarily the same thing, since total referral volume for B2C vs. B2B could be very different. Unfortunately, we aren't given the context of total referral volume here. I also hesitate to call out an increase like this, since the initial comparison point is 1% - a small number, making the 16x increase not necessarily as impressive as it at first sounds.
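To make the share-versus-volume point concrete, here is a minimal Python sketch. The 1% share and the 16x multiple come from the discussion above; the total referral volumes are numbers I made up purely for illustration, since the original graphic doesn't provide them.

```python
# Share of total referral traffic attributed to LinkedIn.
# 1% is the B2C comparison point; 16x that gives the B2B share.
b2c_share = 0.01
b2b_share = 16 * b2c_share

# Hypothetical total referral volumes (NOT from the original graphic).
b2c_total = 1_000_000
b2b_total = 50_000

# Absolute LinkedIn referrals under those assumed totals.
b2c_linkedin = b2c_share * b2c_total
b2b_linkedin = b2b_share * b2b_total

share_ratio = b2b_share / b2c_share
volume_ratio = b2b_linkedin / b2c_linkedin

print(f"Share ratio (B2B vs. B2C):  {share_ratio:.1f}x")
print(f"Volume ratio (B2B vs. B2C): {volume_ratio:.1f}x")
```

With these invented totals, the share is 16x higher for B2B while the actual number of LinkedIn referrals is lower for B2B than for B2C. Different totals would give a different answer, which is exactly why the missing context matters.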

To Gavin's dismay, I'm going to go with a bar chart here (but bear with me, no bar charts for the next makeover, I promise!). Yes, I use bar charts frequently. It's because they are so easy to read! Here's where I landed:


In this case, we can easily compare the relative breakdown of referrals by source for B2C vs. B2B. For me, Facebook's dominance across both comes across more clearly here than it did with the pies. (I'm still questioning the use of %s here vs. absolute numbers, but given that's the data we had to work with, I'll let that concern go for now).

Makeover #2: Cole's first slopegraph
The second visual set forth in the contest is a table, accompanied by the challenge to "suggest a graphical representation" of the information displayed. The table shows expenditures for homeowners vs. renters in 1986 vs. 2010 across a number of categories:

In a recent workshop I facilitated, one of the participants asked about using line graphs when you only have two time periods and whether this is advisable. I think if I'd been asked a couple of weeks ago, I would have probably responded no. But I recently finished reading Alberto Cairo's The Functional Art (my book review post is forthcoming), where he features a couple examples of this done really well. I drew an example on the board in the workshop to illustrate when this can work. Here's my first go at creating one for real:

For me, the interesting story here is around which categories increased (depicted in blue above) and which decreased (grey). I think the slopegraph depicts this well. To reduce the busyness of the visual, I chose not to show all of the categories depicted in the original table. I omitted Entertainment and Other (since they both remained relatively flat) and included a note on this in the footnote. I also chose not to show the subcomponent pieces of the categories included in the table, but rather included some comments on what I found to be the interesting observations from those on the visual directly.
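The selection logic behind a slopegraph like this can be sketched in a few lines of Python. This is only an illustration: the values, the category names other than Entertainment and Other, and the cutoff for "relatively flat" are all assumptions on my part, not the figures from the contest table.

```python
# Hypothetical shares of spending (%) in two periods; values are invented
# for illustration and are NOT the figures from the contest table.
spending = {
    "Housing":        (30.0, 34.0),
    "Food":           (15.0, 12.5),
    "Entertainment":  (5.0, 5.1),
    "Other":          (4.0, 3.9),
}

FLAT_THRESHOLD = 0.5  # assumed cutoff, in percentage points, for "relatively flat"


def classify(start, end, threshold=FLAT_THRESHOLD):
    """Label a category by the direction of its change between two periods."""
    change = end - start
    if abs(change) < threshold:
        return "flat"
    return "increased" if change > 0 else "decreased"


# Colors mirror the makeover: blue for increases, grey for decreases.
COLORS = {"increased": "blue", "decreased": "grey"}

lines_to_draw = {}
omitted = []
for category, (start, end) in spending.items():
    label = classify(start, end)
    if label == "flat":
        omitted.append(category)  # mention these in a footnote instead
    else:
        lines_to_draw[category] = COLORS[label]

print(lines_to_draw)
print("Omitted (relatively flat):", ", ".join(omitted))
```

The point of the sketch is the decision rule: color by direction of change, and move anything that barely moves into a footnote so it doesn't compete for attention.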

I'd love your feedback on the above visuals. I think in both cases, the story comes across much more clearly in the madeover visuals compared to the originals. If you have other ideas on how to visualize this data, leave a comment with your thoughts (or even better, submit an entry to the challenge!).

Saturday, December 1, 2012

and the winner is...

Thanks to those who submitted visuals in response to the recent data viz challenge (and thanks for your patience in waiting for this recap!). The goal was to take a selection of the stats included in a recent Pew Research Center article on how teens research and turn them into visual form. There were four entries, which I'll recap here, in addition to announcing the highly anticipated winner!


Submission 1: Joe Mako
Joe took the item "Research tools teachers say their students are most likely to use..." and created a Tableau visual, commenting "I like diverging stacked bar charts for plotting the results of Likert and other rating scales. They enable you to simply see the overall direction and the detail at the same time."

I really like this visual: it's well labeled and organized, with thoughtful attention to design. For example, very light grey shading helps your eye read across horizontally from the label to the data it describes without being obtrusive, sorting in increasing order of unfavorable/decreasing order of favorable provides a nice construct within which to interpret the data, and your attention is drawn clearly due to strength of color to the tails - not at all likely and very likely - the most interesting pieces of data. Subtly labeling the data bars allows for easy comparison across the different research tools (which would otherwise be difficult for the tails due to the lack of consistent baseline).
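For anyone curious how a diverging stacked bar is positioned, here is a rough Python sketch of the arithmetic. It is not Joe's Tableau workbook: the four-level scale, the tool names, and the percentages are hypothetical, and I'm assuming the first two levels count as unfavorable.

```python
# Hypothetical survey results (% of respondents); invented for illustration.
# Order of levels: most unfavorable -> most favorable.
LEVELS = ["not at all likely", "not too likely", "somewhat likely", "very likely"]
N_UNFAVORABLE = 2  # assumption: the first two levels are plotted left of zero

responses = {
    "Tool A": [2, 4, 24, 70],
    "Tool B": [30, 40, 20, 10],
    "Tool C": [10, 25, 40, 25],
}


def diverging_segments(percentages, n_unfavorable=N_UNFAVORABLE):
    """Return (level, left_edge, width) for each segment of one bar.

    Unfavorable levels extend left from zero and favorable levels extend
    right, so the bar starts at minus the total unfavorable share.
    """
    left = -sum(percentages[:n_unfavorable])
    segments = []
    for level, pct in zip(LEVELS, percentages):
        segments.append((level, left, pct))
        left += pct
    return segments


# Sort items by increasing unfavorable share (most favorable first).
ordered = sorted(responses, key=lambda item: sum(responses[item][:N_UNFAVORABLE]))

for item in ordered:
    print(item)
    for level, left, width in diverging_segments(responses[item]):
        print(f"  {level:<18} from {left:>4} to {left + width:>4}")
```

Each bar is shifted so that the boundary between unfavorable and favorable sits at zero, which is what lets you see the overall direction and the detail at once. The left edges and widths could then be handed to whatever charting tool you prefer.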

In terms of feedback, there are three (relatively minor) things I would potentially change in this visual:
  1. Swap the formatting on the title and the takeaway to emphasize the main takeaway, that the internet is much more popular than resources at the library.
  2. Move the legend to the top of the graph, so the audience encounters what the bars mean before they get to the bars themselves.
  3. Correct typo: I don't think I would have caught this, but reader Rupert Stechmann did - there's a typo in the takeaway at the top where 'is' appears twice. Attention to detail is critical, and perceived lack of it can call the entire analytical process used into question.
The direct link to Joe's image can be found here.


Submission 2: Sam Feldman
Sam created his data viz on how students use their cell phones in class:

I give Sam high points for creativity - using color within the cell phone image to demonstrate the percentages. But it took me a little time looking at the visual to realize that's what's happening. In this case, I'd recommend a few modifications to make this visualization higher impact:
  1. Reverse the sort ordering so that you start with the biggest segment at the top, and work downward to the smaller segments. Then your takeaway could become: the majority of students use cell phones in class to look up information or take a photo/video for an assignment.
  2. Make the data in the cell phone stand out more: take away the sort of marbled shading within the different colors (it's distracting and doesn't add anything) and play with making the lines on the cell phone more subtle (try making them white or grey) - basically, I want the colored segments to stand out much more than the cell phone itself, but still preserve the ability to recognize it as a cell phone.
  3. Right justify the labels so each label is directly next to the portion of the cell phone it describes. Omit the boxes that tie the label to the given portion of the bar; instead make the label itself the same color as the bar it describes. There's plenty of space, so I'd consider making the labels larger as well.

Submission 3: Hrvoje Smolic
Hrvoje picked a pie chart visualization from the article and turned it into a slopegraph:


Hrvoje's blog post on this makeover can be found here. I agree that the relative increases and decreases are much clearer here than in the original pies. What I crave in this case are more words to make what we're looking at clear: let's state the main takeaway (what's interesting or noteworthy), add a graph title, a more descriptive y-axis label, and the data source. Now that we can see the data in a more straightforward fashion, let's think about what story we want to tell and draw our audience's attention to the relevant parts.


Submission 4: Jane Pong
Jane visualized how different groups of teachers perceive the impact that school policies have on their teaching. She converted a table included in the article into a more visual form (full size available here):

Jane's approach is similar to Joe's, plotting the Likert scale in horizontal bars anchored at zero. I think this is a good approach, but a little more labeling would help the audience to more quickly interpret what they're looking at here. It took me a bit of time to understand that the item text is what's shown at the left, and the breakdown at the right is income (mostly below poverty level vs. mostly middle/upper class) and size of area (large metro vs. small town). I'm not sure what the different colors represent, so we should make that clear.

This visual shows a lot of different comparisons - this is one of those cases where clearly identifying a single story or two that we want to focus on could be helpful for determining what data to show (perhaps we don't need all of this) and making it easier for our audience to consume. 


...and the winner is...
While each entry has its merits, the winner of this data viz challenge in my opinion is Joe's diverging stacked bar chart. It clearly tells a story, both through words and through the accompanying visual, which is utopia for me when it comes to storytelling with data. Joe, I'll be reaching out with the promised offer to pen a guest blog post.

A great big thanks to everyone who submitted entries. I really appreciate the time you took and the work you shared!

Sunday, November 25, 2012

celebrating (almost) 100 posts with 10 tips

As I was looking at the underbelly of my blog the other day (the side only I see that has info on posts, pageviews, etc.), a number caught my eye:

99

This number described the number of posts I have published on this blog. Which meant that my next post (the one you're reading currently) would be my 100th. This seems like a significant number.* 
*Upon closer evaluation, 99 actually describes total posts... published + drafts. Removing my drafts makes this the 91st published post. Since 100 was a somewhat arbitrary number anyway, I decided to go ahead with this post now instead of wait for the actual 100th post. I guess we can consider this a celebration of 91 posts published!

It's amazing to me how the time since I began writing this blog has gone by...I've been sharing my thoughts on the same topic for a time quite suddenly better measured in years than months, with my interest in learning and teaching and writing about communicating effectively with data continuing to grow.

I thought I'd use this (almost) 100th post as an excuse to look back at storytellingwithdata posts over the past two years and handpick my top 10 tips for telling a visual story with data. Here they are, in rough order of my general approach to the visualization process (click the link for the full relevant post):

cole's 10 tips for effective storytelling with data
  1. Set aside time for the visualization process.
  2. Start with a blank piece of paper.
  3. Keep your audience top of mind.
  4. Generally avoid pie charts.
  5. Always label your axes.
  6. Leverage preattentive attributes.
  7. Declutter your visuals.
  8. Consider cutting gridlines.
  9. Employ visual editing.
  10. Use words to make your visual accessible.

Thank you very much for reading and I hope you'll join me for the next 100+ posts!

Sunday, November 11, 2012

data viz challenge... how teens research

I subscribe to updates from the Pew Research Center. It's a great way to ensure a consistent inflow of data, which is useful as I gather and examine examples for workshops and my blog. Often, the incoming email gets quickly scanned and archived. But last week one of the titles piqued my interest, so I clicked to learn more about How Teens Research in the Digital Age.

When I think back to my own research projects in high school, I have images of trekking to the county library and using large computer terminals to locate old news articles on microfiche. Or making the even longer journey into the city to utilize the massive book collection at the university library. (I grew up in the sticks and when I was younger, these truly did seem like Iliad-like voyages...adventures to the city even involved a ferry boat!).

As I suspected, the means for researching for teens today are very different. They don't even have to leave their house (or their room!) if they don't want to, since the internet, and thus the world's information, is at their fingertips.

Alright, that's sufficient prelude. You're probably wondering about this data challenge that I mentioned in the title. On to that. I was surprised reading this article how many stats were included and yet how few visuals. Only one graph, in fact:


My challenge to you is this: read the article, determine what data you find most interesting, then visualize it. You can remake the above graph, or focus on bringing life to numbers included in the report by making them visual. Submit your entries via the following instructions by Sunday, November 25th (those in the states need something to do over Thanksgiving break, right?).

When complete, you can leave a comment with the story you would focus on and a link to your visual, or email it to me directly (cole.nussbaumer@gmail.com) along with any comments you'd like me to post with it, and I'll put it into Dropbox and create a comment for you with the link (if you don't already have a Dropbox account, this is a good reason to get one!).

I'll invite the creator of my favorite to write a guest blog post. Happy data visualizing!

Friday, November 2, 2012

to stack or not to stack

My husband came across the graphical focus of this post in his Google+ stream last week. The original source is a Wall Street Journal blog post summarizing a recent Forrester report, where the main story can be summed up by one of the Forrester quotes within the post, "The future is one where no single OS or vendor is dominant - Microsoft is extremely late to the market expansion into mobile and has lost its dominant position."

Here is the graph included to illustrate this point:


It's true that you can get the evidence to support the claim made from this graph: once you identify the light blue portion of the bar as Microsoft, we see clearly that it decreases over time as the orange portion (Google) and the yellow portion (Apple) become increasingly prominent. But I'm of course not satisfied with it. The color palette is strange. Color in general could be used more strategically here. We can eliminate the work of going back and forth between the legend and the data it describes. I'm also not sure how I feel about the stacked bar chart.

Let's look at a couple variations on this data viz. First, here's what it could look like if we preserve the stacked bar and use color a little differently (note: I didn't have the raw data, so the remakes below are based on me eyeballing the figures and likely aren't entirely accurate):

Other minor changes I made above:
  • Added an action title so it's clear what to look for in the graph (this was included as tiny text below the graph in the original post).
  • Oriented the graph title and legend text at upper left - so the reader encounters how to read the data before they get to it.
  • Added a title to the y-axis. Always include this!
  • Added data labels to the Microsoft series. This acts to both draw more attention to Microsoft, as well as to give a quick numerical view of the decrease over time.
  • Narrowed the bars. In the original, they are bordering on too thick so that our eye starts to try to compare the area rather than the height.
When it came to color, I took a look at Microsoft's logo. I nearly always use blue to highlight the areas to which I want to draw attention. In this case, I actually tried venturing out using the red color from the logo and then the green color from the logo. But both just looked a little off (the burnt red looked overly negative to me, the green a little pukey). So I went with the blue from the logo (matching by eye - it's not perfect but close). I chose a grey palette for the remaining series.

I still don't love this, for a couple of reasons.

First, I think I just don't like stacked bar charts. This is actually probably a good use case for this graph type, since this lets us emphasize the percent of total and how that's changing over time. But I still don't love it. Because the bars aren't oriented on a consistent baseline, our eyes are forced to compare differing heights starting from different points. That's fine to get a general view, which is probably all we need here. Perhaps I'm just being overly finicky.

Second, it's still a bit of work to go back and forth between the legend and the data. If you don't recognize that the legend (left to right) is in the same order as the right-hand bar (top to bottom), it could prove difficult to see quickly which series is which.

I thought I would like a line graph of this data better - it would allow me to organize the series on a consistent baseline as well as label each directly. But then I graphed it and reconsidered:


Personally, my issues with this line graph version are greater than my issues with the stacked bar. We lose clear visibility/confidence that the lines sum to the total market, 100%. I also worry that in this case, the lines make it at first glance appear that we have more data than we do. Perhaps three points per series is too few for a line graph? If not in general, then I think that is at least true in this case. The overlapping nature of the lines creates a sort of spaghetti graph (as if I had a handful of uncooked spaghetti noodles and threw them on the ground). I tried to make this better by emphasizing the main three series (Google, Microsoft, and Apple) and de-emphasizing the others. But it still isn't great.
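The trade-off between the two chart types shows up in the numbers too. Here's a small Python sketch, using made-up market shares (I didn't have the raw data, so these are not the Forrester figures), that computes where each segment of a stacked bar starts.

```python
# Hypothetical market shares (%) for three periods; invented for illustration.
# The shares in each period sum to 100.
shares = {
    "Microsoft": [70, 45, 30],
    "Google":    [10, 25, 35],
    "Apple":     [10, 20, 25],
    "Other":     [10, 10, 10],
}


def stacked_baselines(series):
    """Return the starting height of every segment, plus each bar's total."""
    running = [0] * len(next(iter(series.values())))
    baselines = {}
    for name, values in series.items():
        baselines[name] = list(running)  # where this segment begins
        running = [r + v for r, v in zip(running, values)]
    return baselines, running


baselines, totals = stacked_baselines(shares)

for name, starts in baselines.items():
    print(f"{name:<10} starts at {starts}")
print("Bar totals:", totals)
```

Only the bottom series starts at zero in every bar; each series above it begins at a different height from one period to the next, which is why comparing those segments takes more work. On the other hand, every bar visibly tops out at 100, a cue the line graph gives up.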

Given this, I'm back to the stacked bar chart. I think the information comes across most quickly with that structure.

What do you think - is a stacked bar the best choice here? Or are there additional options worth considering?

In case you're interested, here is the Excel file with the makeovers.

Wednesday, October 24, 2012

start with a blank piece of paper

I think this might be one of my best pieces of advice when it comes to communicating effectively with data: start with a blank piece of paper.

I've never encountered a child who didn't enjoy a blank piece of paper and what they can do to it with the simplest of materials: a crayon, a pencil, a pen. For some reason, something seems to happen to most of us over time where we stop drawing. Even if you don't consider yourself to be particularly talented in this space, I want to implore you to channel your inner child. Don't start with your graphing application or presentation software; rather, start with a blank piece of paper and a pencil (or a blank whiteboard can also work well).

Here are a couple of thoughts on why:

  • Iterating is easier: There's something about creating a graph in your graphing application or a slide in your presentation software that causes you to develop attachment towards it. Sometimes, even if we know something isn't quite right, if we've already built the graph or slide, we feel hesitancy towards changing it. This doesn't happen with paper. When I've drawn something, if it isn't quite right, I can quickly recycle the paper and start with a fresh one. Or erase my whiteboard and start fresh. 
  • Clutter doesn't enter the picture: This struck me as I was preparing for a workshop last week where we did an exercise starting with blank paper: when you draw a graph, you don't introduce the sort of visual clutter that many graphing applications do. Starting with a blank page lets you focus on the data, what you want to show, and what you want to emphasize. Get your visual looking right on paper, then use that as your guide when you create the real visual.

In what situations do you find yourself trading in your computer for pencil and paper? What benefits does that approach have beyond what I've outlined here?

Sunday, October 14, 2012

my penchant for horizontal bar graphs

I have a penchant for horizontal bar graphs.

First and foremost, this fondness comes because they are so easy to read. When taking in a bar chart, your eyes compare the endpoints. The placement of the bars on a common baseline (whether horizontal or vertical) makes it easy to see quickly which category is the largest, which is the smallest, as well as the incremental differences between categories.

One use case for horizontal bar charts, in particular, is when your category names are long: the orientation of the chart allows the text to be placed from left to right - as most people read - thus improving the legibility of the information you are providing (vs. being forced to make your category names diagonal or vertical - both of which I recommend avoiding due to difficulty reading).

Also of note is that the orientation of the bars can change the way the information is taken in simply by virtue of how it is organized. In some cases, this may make a horizontal bar chart preferable to the standard vertical bar chart.

Let's consider an example. Look at the following chart, taking note of the process you use to take the information in:
When I process it, my eyes first read the overall title, then move leftward to the top of the y-axis (which would be the perfect place for an axis title, so I know what I'm looking at). Next, I scan across the tops of the bars, following a sort of Z formation. Only at the end do my eyes encounter the category names, which I have to read and then put back into the context of the data above them to attempt to draw insight from the information that's being presented.

This isn't the end of the world. But it takes time. Time that I think we can make better use of.

Let's look at how we take in this information when it's presented in a slightly different manner. In the remake below, I've made a number of changes:

  • Replaced the descriptive title with an active one, using that precious real estate (the first thing our audience encounters!) to outline a takeaway and set the stage for what will come next.
  • Added insights with text (using the preattentive attributes of bold & color to draw your attention to the important part), placing this at the top for additional context on what to look for in the data before you get to the actual data.
  • Flipped the chart on its side, making a horizontal bar chart and orienting the category labels from left to right. Easier to read, right?
  • Ordered the data from greatest to least. This creates a visual construct, making the data easier to consume. (Note: if there is an intrinsic order to your categories that's important, you would leverage that, but in absence of that, sort either ascending or descending by value).
  • Labeled the x-axis (what was the y-axis in the original version) to make it unquestionable what you're looking at. I got rid of the actual axis, instead labeling the bars with the numerical values they represent directly.
  • Eliminated unnecessary clutter: background shading and gridlines be gone (there was also clutter in the previous y-axis labels with the trailing zeroes, but that had already been taken care of by eliminating the axis).
  • Reduced the width (or height, now that we've flipped it on its side) of the bars. My recommendation is to generally have the bars take up more space than the white space in between them, but you don't want them so wide that our eyes start to try to evaluate area (in error) vs. height. The first chart was bordering on too wide per this last point.
  • Used color strategically within the graph to tie the insight at the top to the evidence in the chart that supports it.

Here's what we end up with after these changes have been incorporated. Again, take note of the order in which you process the information: 

For me, as I scan, my eye pauses on the title, the blue text, then the graph title, and finally to the category titles at the left and corresponding data at the right. In this case, by the time I get to the data, I already know what I'm looking at (due to labels) and what I'm looking for (due to explanatory text above the graph).

Because we know how to read bar charts, the first instance above probably isn't something you would consider difficult to read. But note how a few changes make the information so much more accessible.

This isn't to say that horizontal bar graphs will always be preferable to their vertical counterparts, but rather to highlight some things to think about as you are choosing between the two. When in doubt, plot your data both ways and compare side by side to judge which will be the easiest for your audience to consume.
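A few of the remake steps (sorting from greatest to least, reading category names left to right, and labeling bars directly instead of showing an axis) can be sketched even without a graphing application. Below is a plain-text Python illustration with hypothetical category names and values; it assumes all values are positive.

```python
# Hypothetical category values; invented for illustration.
data = {
    "Short name": 12,
    "A much longer category name": 30,
    "Medium-length name": 21,
}


def horizontal_bars(values, max_width=30):
    """Render a sorted, directly labeled horizontal bar chart as text lines."""
    # Sort from greatest to least so the order itself carries information.
    ordered = sorted(values.items(), key=lambda pair: pair[1], reverse=True)
    label_width = max(len(name) for name in values)
    largest = ordered[0][1]  # assumes at least one positive value
    lines = []
    for name, value in ordered:
        # Every bar starts at the same zero baseline, right after the label.
        bar = "#" * round(value / largest * max_width)
        # Right-align labels so they sit flush against the bars, and print
        # the value at the end of each bar instead of drawing an axis.
        lines.append(f"{name:>{label_width}} | {bar} {value}")
    return lines


for line in horizontal_bars(data):
    print(line)
```

Long category names stay perfectly readable because they run left to right, and the eye can go from label to bar to value in a single pass.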

Friday, October 5, 2012

ready, set, critique!

I just came across the following daily chart from The Economist on my Google+ stream:

See full article.
My questions to you: What story does it tell? What story could it tell? How would you change the way the information is presented to do that in an effective way?

Ready...set...critique! Leave a comment with your thoughts.

Sunday, September 30, 2012

your input please: font

We have a small debate underway in my day job regarding font. Specifically, which should be our default or standard font for analyses, presentations, etc. This led me to the question: when it comes to font choices, where do best practices end and personal preferences begin?

I'm aware of some relevant research conducted by psychologists Song and Schwarz in 2008 at the University of Michigan at Ann Arbor, where they showed college students recipes for sushi and asked them to estimate 1) how long the recipe would take them to make and 2) how inclined they were to do so. The only thing that varied between the recipes was the font in which it was written. What they found in a nutshell was that the fussier the font, the more difficult the students judged the recipe and the less likely they were to want to attempt making it. For me, the translation for data visualization broadly is that the more complicated it looks, the less likely your audience is to take time with it.

But back to my specific question: if both fonts are straightforward to read (no legibility issues), how do you choose?

To try to answer this question, I initially planned on doing some research; I quickly grew impatient with this. My brief attempt in Google searches taught me that there is no shortage of font fodder on the internet. There are conflicting lists of the "best" fonts (example). Others have done much more research in this area than I care to (example). I was struck that there don't even seem to be consistent opinions on questions I thought would be easy (e.g. serif vs. sans serif... sans, obviously, right? not according to Wikipedia).

So rather than continue down this slightly frustrating path, I thought I'd pose the problem to you to see if any consensus in the form of the wisdom of crowds emerges. Here are the fonts we're considering:


The quick brown fox jumps over the lazy dog
1234567890 (Calibri)

The quick brown fox jumps over the lazy dog
1234567890 (Open Sans)

The quick brown fox jumps over the lazy dog
1234567890 (Arial)


Specifically, when it comes to the open debate at work: my colleague and I are in agreement that Calibri should not be our default font. I think our reasoning when you boil it down is probably simply because we don't like it vs. anything scientific. Where we differ is on the question of Open Sans vs. Arial. I won't bias you by revealing which I prefer (though my sans serif comment and the text on this blog serve as a pretty big hint).

My questions to you are: If you were weighing in on this decision, what factors would you consider? Which font do you prefer? Why? Leave a comment with your thoughts!

Thursday, September 27, 2012

quick tip: left uppermost align title text

I've commented in the past about the important role that text plays in data visualization: in short, it helps to make the information you provide more accessible to your audience. But where should you place your text for it to best play its role? When it comes to chart and axis titles and legend, my recommendation is to left uppermost justify.

I frequently see chart and axis titles center-aligned and the legend placed to the right of the data it describes. Many standard tools default to this. I favor left uppermost justifying over center-title-alignment and righthand-legend-placement for two reasons:
  1. Center alignment looks messy: center alignment doesn't create a clean line on either the left or the right, so text is left visually hanging.
  2. Your eye hits the left uppermost space first: in Western cultures, most people read left to right, top to bottom*. This means if you left uppermost justify your graph title, legend, and axis titles, your audience's eye hits how to interpret what they're looking at before they get to the data. 
*I'm frequently asked how this changes in cultures reading in other directions: the small sample I've posed this question to have said that when it comes to business, the Western style prevails since so much international business is conducted in English. Please leave a comment with your thoughts if you have insight on this!

What I mean when I say "left uppermost align" when it comes to graph titles and legend is:
  • Graph title (+subtitle, if applicable) are positioned above graph and left-aligned.
  • Legend is placed above graph (below title/subtitle) and left-aligned.
  • y-axis title is aligned with topmost y-axis label.
  • x-axis title is aligned with leftmost x-axis label.

Here's a quick look at what a typical graph looks like with default text alignment settings compared to when we follow this tip:


Personally, I steer clear of center alignment almost always in favor of left- or right-alignment. Outside of titles and legends, whether to left- or right-align your text comes down to the layout of the visual: sometimes right-alignment makes sense, for example in a horizontal bar chart you should right-align your y-axis labels so that funny spacing isn't created between the labels and the data. When in doubt, try aligning a couple of different ways and see what looks best: trust your eye or solicit input from a colleague.
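
For anyone who wants to see the horizontal bar chart case in miniature, here is a small text-only sketch. It is an illustration under assumptions: the function name `horizontal_bars` and the sample categories and values are made up for the example, not taken from any chart in this post. The labels are right-aligned so each one ends right next to the start of its bar.

```python
def horizontal_bars(data, width=30):
    """Render a text bar chart with right-aligned category labels."""
    # Pad every label to the length of the longest one so the labels
    # end flush against the axis and no ragged gap appears.
    label_width = max(len(label) for label in data)
    top = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / top * width)
        lines.append(f"{label.rjust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
sample = {"North": 40, "Southwest": 25, "East": 10}
print(horizontal_bars(sample))
```

Swapping `rjust` for `ljust` shows the "funny spacing" problem: short labels drift away from their bars.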

Note: the Excel template to create the left uppermost chart above can be downloaded here.

Thursday, September 20, 2012

bar charts must have a zero baseline

This is one rule of data visualization that I see broken too often: when it comes to bar charts, the y-axis must begin at zero.

When our eyes interpret bar charts, we are comparing the relative heights of the bars. When we cut the height off at something greater than zero, it skews this visual comparison, over-emphasizing the difference between the bars in a way that simply isn't honest. Most recently, I saw this in a visual that was forwarded by a friend of a colleague. The offender: Fox News.


There are a number of things that bother me about this visual. Beyond the unnecessary visual clutter of tiny gridlines and strange chart borders, the y-axis isn't labeled (I think it's Top Tax Rate, as noted by the subtitle, but this would be a lot clearer if the axis itself were labeled) and it is placed on the right-hand side of the visual, so it's the last thing I see as my eyes scan across from left to right. That makes it even less likely that I notice the biggest issue with the graphic: the y-axis starts at 34%. This makes the difference between Now (35%) and Jan 1, 2013 (39.6%) appear to be way bigger than it actually is.

How big of an issue is this? Let's do some math to find out. The way it's graphed, the heights of the bars are 1 (35-34) and 5.6 (39.6-34). This represents a visual increase of 460% ((5.6-1)/1). If we graph the bars with a zero baseline so that the heights are accurately represented (35 and 39.6), we get a visual (and actual) increase of 13% ((39.6-35)/35). Perhaps that is still significant and that is the point that Fox News was attempting to make. That's fine, but I wish they had done it without this visual misrepresentation of the truth.
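
The same arithmetic can be checked in a few lines of Python. This is just a sketch of the calculation above; the function name `visual_increase` is my own label for it, and the only inputs are the two tax rates and the axis starting point from the graphic.

```python
def visual_increase(old, new, baseline=0.0):
    """Percent increase in drawn bar height when the axis starts at `baseline`."""
    old_height = old - baseline
    new_height = new - baseline
    return (new_height - old_height) / old_height * 100


# Axis starting at 34%, as in the original graphic.
truncated = visual_increase(35, 39.6, baseline=34)
# Axis starting at zero, so bar height equals the actual value.
honest = visual_increase(35, 39.6)

print(f"34% baseline: bar looks {truncated:.0f}% taller")
print(f"zero baseline: bar looks {honest:.0f}% taller")
```

Running it prints 460% for the truncated axis and 13% for the zero baseline, matching the numbers above. Trying other baselines shows how quickly the exaggeration grows as the axis start approaches the smaller value.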

A couple related things to consider (and I have my own opinion on each of these that I'll of course make clear):
  • I've heard the argument that if you're graphing something that has a sort of "natural" baseline of something greater than zero, then it might be appropriate to start with that. For example, if we consider the baseline unemployment rate to be 5%, then the argument goes that you could use this 5% as the baseline. I don't like it. For me, it isn't a valid visual comparison, so if that were the case, I'd use a different way to show it (perhaps plot the entirety of the bars but then also highlight a 5% horizontal line and label it in a way that makes it clear how to use it for comparison).
  • When it comes to line graphs, the zero baseline rule does not hold. In other words, you can get away with a non-zero baseline in a line graph. With line graphs, we compare the lines to each other more than their height from the x-axis. Still, you need to be careful. I would advise making it clear to your audience that you're using a non-zero baseline so they interpret the information correctly (one approach: label the y-axis and highlight the minimum value in bold so attention is drawn to the fact that it's something other than zero). And you need to be careful about zooming in too much and making a change that is minor look big - this gets you back into the visual misrepresentation place that we want to avoid.
My advice to Fox News (and to those communicating with data in general) would be to first determine the story you want to tell. Then determine what data will best support this story. Don't compel your audience with visual misrepresentations; rather, convince them with accurately displayed data that backs up the point you are trying to make.

Related note: there are a number of posts by others on this and related topics. In case you're interested in reading more, here are a few I'm aware of (not an exhaustive list):

Thursday, September 13, 2012

some finer points of data visualization

Last month, I conducted the first storytelling with data Data Viz Challenge. In addition to eternal notoriety, I promised the winner the invitation to write a guest blog post (in case you're interested, a full rundown of the entries and my comments about each can be viewed here). Winner Jeff Shaffer came through with the following post, which I'm excited to share with you here.
_________________________________

I have enjoyed reading Cole's blog at storytellingwithdata.com, so when she invited me to write a guest post I was thrilled with the opportunity. The challenge became focusing in on the exact topic for my post. Cole has done some terrific redesigns over the years, turning some not-so-good charts into good data visualizations. It would have been easy to find another bad chart and post a redesign because let's face it, there are more bad examples out there than good ones. So for this post I decided to cover some of the finer points of design in data visualization.

Before I do a critique of a chart, I wanted to share my view on creating a good data visualization. I teach data visualization at the University of Cincinnati and as part of the course I cover what I call "The Shaffer 4 C's of Data Visualization". They simply serve as a guideline to follow when creating or critiquing a data visualization.

The Shaffer 4 C's of Data Visualization:

  1. Clear - easily seen; sharply defined. Who's the audience? What's the message? Clarity is more important than aesthetics. Ex. good chart title, critical labels, units of measure, avoiding rotated text, good color choice, etc.
  2. Clean - thorough; complete; unadulterated. Ex. not overlabeling axes and data points, avoiding gridlines that are too numerous or too dark, proper formatting, using the right chart type, avoiding poor color choices, etc.
  3. Concise - brief but comprehensive. Not minimalist but not verbose.
  4. Captivating - to attract and hold by beauty or excellence. Does it capture attention? Is it interesting? Does it tell the story?

It's important to understand that certain elements can affect more than one area. For example, if there is a poor chart type or a 3D graphic used, it could violate both the Clear and Clean principles, and if the chart is loaded with data labels at every opportunity, then it could easily violate both Clean and Concise. On the other hand, it's quite possible to create a very Clean chart following all of the appropriate data visualization rules, but the message is lost (not Clear) or it may not be a story worth telling (not Captivating).

Color is another example that could affect multiple things. For example, using red/green would not be Clear to someone who is colorblind, and using a categorical color scheme instead of a sequential color scheme for a certain data type might be very confusing. Alerting colors might confuse the message by drawing attention to something they shouldn't. However, overuse of color, gradient or shadow could also affect Clean. Even if the message is Clear, it might still be a sloppy looking chart with poor color choices. For example, bright pink mixed with red might cause a visceral reaction to the clash.

One final comment on the 4 C's of Data Visualization. I specifically used Concise to contrast what I believe to be a minimalist approach to data visualization by Edward Tufte and some others in the field. It isn't necessary in my view to save ink as if my printer cartridge were running dry. I also believe it's ok at times to have extra emphasis, even if it's redundant and I think the use of color can be used to help with Captivating so that the visualization isn't boring. What would the world be like if every chart were black and white, shades of gray, or blue and orange? Don't get me wrong, I have nothing against any of these and the blue/orange colorblind-friendly palette is one of my favorites, but we can't use it for everything.

On the flip side, there is a fine line between adding color for this purpose and that color becoming distracting, alerting or overpowering the reader. Jeffrey Heer, Associate Professor at the University of Washington and formerly with the Stanford Visualization Group, co-authored a paper with Wesley Willett and Maneesh Agrawala discussing Scented Widgets. "Visual Scent" was used to describe navigation cues embedded in visualizations. It's a great paper and I think the term visual scent will be used more, but I will add to the lexicon by coining my own term, "Visual Order". It's far too easy to create a chart in Excel that looks like Pac Man eating a skittles rainbow (yes, this is a real chart that someone produced with the simple addition of the eye added for effect). I won't critique this chart today.
Below is a chart to examine:
I ran across this chart on the University of Cincinnati Health website and the reason I picked this chart is because it's actually a pretty good chart.
  • It's the right chart type for the data. The bar chart allows for easy comparisons visually between the institutions. Bar charts are always a good choice for categorical comparisons.
  • It's ranked in order providing a quick and easy understanding of the verification rankings.
  • Reasonable abbreviations were used to shorten the names that would otherwise be very long.
  • The message is fairly Clear: UCNI is #1, receiving 13 verifications in 14 specialty areas of neurological care (note the benefit of a good title).
  • The chart has good use of color, emphasizing UCNI compared to the other institutions. Blue and red aren't exactly complementary, but red is the University of Cincinnati color so that's a natural choice in this case. This color combination also avoids red/green, which allows someone who is colorblind to make the color distinction and get the same visual message. You can test your own images at http://www.vischeck.com or download the free Adobe Photoshop plug-in.
  • The chart has good detail in the note section which gives the reader more information on how the designations are done and the fact that UCNI is working on the 14th specialty area.
  • From a design standpoint it is always best to use a dark font on white or light color and a light font on dark color. In this case the creator wisely chose a white font color on the color bars and black font on the white and light grey.
  • The gridlines are muted so that they are not distracting or creating a moire effect.
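
One way to put a number on the "dark font on light, light font on dark" point in the list above is a contrast ratio between text color and background color. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x accessibility guidelines. That is a technique brought in for illustration, not something the chart's creator is known to have used, and the `DARK_RED` value is a hypothetical bar color rather than one sampled from the chart.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio, from 1 (identical colors) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
DARK_RED = (200, 16, 46)  # hypothetical bar color, not taken from the chart

print("white text on bar:", round(contrast_ratio(WHITE, DARK_RED), 1))
print("black text on bar:", round(contrast_ratio(BLACK, DARK_RED), 1))
```

For a dark bar like this one, the white label scores higher than the black label, which agrees with the choice described in the list.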

Compared to many charts out there, including some of the examples Cole has critiqued in her redesigns, this would be considered a pretty good chart. However, this chart can be improved when examining some of the finer design changes that can be made.

  • It's best to avoid rotated text whenever possible (Clear and Clean). In this case the text was only rotated by 45 degrees, so it's not as hard to read as it would have been if it were rotated a complete 90 degrees (which is commonly done on long labels). I try to avoid rotated text as much as possible, even small angle rotation. The text label "Barrow Neuro. Institute" is actually below 4 bars and requires the eye to follow that text to the end to determine the bar it represents. Try to quickly compare Barrow Neuro. Institute to UC San Francisco. The eye has a hard time keeping a place holder for the comparison. The best solution is to rotate the chart instead of the text. This allows the reader to read the text normally while still using the bars for the visual comparison. It also puts UCNI at the top of the list, which is where they are in the ranking.
  • There is no need for the y-axis label (Clean and Concise). The purpose of axis labels is to give an approximate value for the bars. In this case we have every bar labeled with the value. In cases where there are lots of categories (and this could be one of those cases) then it might be better to remove the individual data points and simply use the axis labels. If using that method then I might highlight UCNI with a single data point for emphasis (still keeping with Concise and Clean).
  • The gridlines are interfering with the paragraph of text (Clean). This is partially due to the increment of the gridlines being set as 1, but it's also the white gridline contrasting with the dark text. There are a number of ways to solve this, for example adding a slightly filled background box to the text or deleting the gridlines completely.

Below is an example redesign:


  • I used a free tool called ColorPic to get the exact colors that were used in the original chart. ColorPic is a utility that will extract the exact color hue, saturation, value and RGB color code from any point hovered over with the mouse.
  • In this case I copied the original color scheme exactly and did not make any adjustments for the gradient of the bars. I recommend avoiding gradient, but the use in this case is so minimal that I simply left it alone for now to preserve the original color scheme. However, notice that even with the tiniest of gradient effects there is still a visual impact on the bars. The left sides of the bars (and the bottom part of the original) are darker and seem to have more weight to them.
  • Axis labels for the values were removed since the bars have data labels.
  • The gridlines are now in increments of 2 instead of 1, but still muted.
  • The paragraph of text is now in the bottom right hand corner of the chart. Notice that I used a gradient effect on the gridlines, muting them to nothing on the bottom right of the chart. They serve no purpose in this region since the bars do not extend to this area. This allows us to keep the gridlines in the area where the bars are without interfering with the text.
  • I changed the font color of the institution names to blue to match the bar.
  • I placed a text box on top of the UCNI text label since Excel doesn't have an option to change the color of a single axis label like it does for a single data label. Now UCNI matches the red bar. 
  • I added the UC Health logo to add to the presentation.
  • Finally, I would usually add the author name and data source as a note at the bottom, but since I don't have the information from the original chart I am unable to do that.

Taking some liberties with the original color scheme and avoiding the gradient effect yields an even better version that isn't as dark and heavy. Note that in this version I also removed the background fill, which leaves the bars hanging in the air. I agree with Stephen Few, who advocates using an axis bar in this case. Although it might be considered "more unnecessary ink" by some, I prefer it to dangling bars because it visually sets them on a baseline.


As I stepped through this same exercise this past week in my data visualization class, one of the students remarked on one additional improvement that I had mentioned in class as a best practice, but had neglected in this chart. They pointed out that having the data labels set at the inside base instead of at the ends of the bars is visually better. It puts the data point immediately next to the text labels and creates a data table that is easy to read vertically. This allows for quick, easy comparisons and doesn't force the eye to jump back and forth from right to left. While I don't think there is anything wrong with the chart above, I do agree with that best practice because it makes it a bit more Clear.


I hope this example showcases some of the finer points of design for data visualization. We often cover the topics of redesign where the charts are so bad that almost anything would be an improvement. In this particular case it is the careful attention to a few details and applying the 4 C's that help make this chart a better presentation of the data.

I would like to thank Cole again for this opportunity to write a guest post on her wonderful blog. Keep up the great work, Cole!

Jeffrey A. Shaffer

Jeffrey A. Shaffer is the Vice President of Information Technology and Analytics at Unifund. Mr. Shaffer joined Unifund in 1996 and has been instrumental in the creation and development of the complex systems, analytics and business intelligence platform at Unifund. Mr. Shaffer holds a BM and MM degree from the University of Cincinnati and an MBA from Xavier University where he was the winner of the 2006 Graduate Student Scholarly Project in Research. Mr. Shaffer has attended the Harvard School's Executive Education Program, is a Certified Manager of Quality and Organizational Excellence through the American Society for Quality, a Certified Project Management Professional through the Project Management Institute and has completed Six Sigma Green Belt and Black Belt training with the Xavier Consulting Group. Mr. Shaffer is also Adjunct Assistant Professor at the University of Cincinnati in the Carl H. Lindner College of Business teaching Data Visualization in the Graduate Course series for Data Analytics. He is also a regular speaker at business intelligence conferences and symposiums on the topic of data visualization, writes for the data visualization blog at MakingDataMeaningful.com for Lucrum, Inc. and was a finalist in the 2011 Tableau Interactive Visualization Competition.


Thursday, September 6, 2012

color me bad(ly)

Recently, a contact shared the following image with me, along with his thoughts. I found both amusing, so thought I'd share with you here, along with some of my own thoughts and a makeover:

From: http://www.consultingmag-digital.com/consultingmag/201207?pg=6&pm=2&fs=1#pg26

Commentary accompanying the visual:
This seems like some seriously simple data to present, but SOOO poorly executed. Looking at it hurts my head and leaves me with nothing but questions:
  • How much time does it take others to figure out the color pattern(s)?
  • Is there really even a pattern?
  • Why are the two legends/color-schemes different? Don't make me work so hard!
  • Why use donuts/pies instead of some simple paired bars/columns, or even just a pair of lines (i.e., a simple histogram)?

No matter what your content, this is the sort of reaction we should work to avoid in our data visualizations. In this case, it seems the color and donut form is meant to make the data more visually interesting, but it hinders our ability to understand the data.

There are a number of lessons we can employ here to make this data easier to comprehend:
  • If there is an intrinsic order in your categories, leverage it. In this case, the 2011 data has categories in order of increasing days away from home (starting at the lower middle left of graph with the light green segment and working clockwise around), but somehow neither this ordering nor the colors of the categories carried over to the 2012 graph; rather, this graph appears to be sorted numerically by category. This makes comparing the segments of the pies even more difficult than it would otherwise be. Speaking of which...
  • Don't make people compare segments of two different pies (or donuts, in this case - substitute your fave dessert dataviz). Our eyes have a hard time measuring angles and areas: this difficulty is amplified when we're meant to do it across different pies/donuts, where the pieces are in slightly different places and there is no consistent baseline.
  • Put the things you want to compare close together. Physical distance between the things we're meant to compare makes comparing those things more difficult. In this case, a bar graph would allow us to put 2011 and 2012 right next to each other so we can get an easy visual comparison.
  • Use color strategically. Don't use color to make something colorful; rather, use color sparingly and strategically to draw your audience's attention to where you want it.
  • Tell a story with your data! Don't assume your audience will want to look at the data and make up their own story. If you look at the full article, the point they are trying to make is that consultants are traveling less in 2012 than prior years. I'm not actually sure this data shows that (it could be that the other groups surveyed are traveling less but the consultants are traveling just as much - we don't have that breakdown of the data to see). At any rate, I'd suggest making the point more clearly with the data and actually calling out the takeaway within the data visualization to help your audience know where to look for the evidence of what you're telling them.
Here's an alternative view of the same data, employing the lessons I've outlined above:


Thanks, Andy, for passing this less-than-stellar viz along and for your thoughts!

For those interested, you can download my Excel file with the above visual here.

Tuesday, September 4, 2012

a few words go a long way

Part of my day job is internal consulting to our analytics team. One of our interns is getting ready to present findings from his summer project and asked for help visualizing results. This is a part of my job that I really enjoy - helping make the "so what", the "why this is important or interesting" part of an analysis we've undertaken visually clear.

As with many of my work-related examples, I have to keep the details confidential and generalize the situation a bit. In this case, we conducted a study where there was a baseline group receiving no treatment, and then several possible categories of treatment received by other groups. We were looking to understand the difference in impact these various treatments would have on a given outcome.

Here was the original data viz (slightly generalized from the original form):

My initial feedback looked something like the following:
  • Nice use of preattentive attribute (color) to draw your audience's attention to where you want them to focus.
  • The graph needs a title. The legend should be closer to the data it's describing.
  • If baseline is what the audience is meant to compare to, put that first and make that clear - think of adding a summary stat on the right side of the bars that is "increase vs. baseline" or similar.
  • I'm not sure the grey bars are adding value? If they represent 100% minus Outcome observed, stack them on the green bars to add to 100% and make that clear.

After discussing live, I spent a little time with the visual and ended up here:


In addition to incorporating the feedback outlined above, I also separated the Baseline visually and added a subtitle to the treatment groups to try to make it clear that each treatment is meant to be compared against the baseline (reinforcing this via the summary stat on the right).

Note that we aren't done at this point - the story still needs to be put around this data. In this case, the story could be something like "Treatment A results in highest increase over baseline" and a recommendation for rolling this treatment out more broadly. But note how some relatively minor formatting changes and the addition of a few words make the information easier to consume.

The Excel file for the latter version is downloadable here.

Saturday, September 1, 2012

words in print!

Courtesy www.alliancemagazing.org

After speaking at the European Foundation Centre's annual conference earlier this year, Alliance magazine (whose audience is primarily those in the European philanthropic sector) reached out with interest for a short article on best practices for telling a story with data.

Said article was recently published in their latest edition. You can view the article here. Enjoy!

Sunday, August 26, 2012

and the winner is...

A big thank you to everyone who participated in the data viz challenge earlier this month (and thanks for your patience in awaiting this recap). As you may recall, the challenge was to help a philanthropic organization communicate a bunch of data about their various affiliates. If you're interested in a refresher on the details, you can find the challenge post with the full description here.

In this post, in addition to announcing the winner, I'll show a quick recap and my reactions to each of the submissions.

Submission 1: Peter Osbourne
You can view Peter's full description of his thought process in the comments of the post linked above. His main point was that, depending on the story one wishes to tell, a summary metric like averages may do the trick. Below is a snapshot of his workbook (he added the columns after the yellow one; full workbook can be downloaded here). In his comments, he makes a great point about figuring out what the story is first and then determining what data you have that best supports it (vs. putting together data and then trying to form the story).


Submission 2: Jon Schwabish
Jon decided on an interactive Excel graphic (download available here), which allows you to toggle across the various affiliates to get relevant detail on each. I really like the simplicity of the visual design used here. Great use of preattentive attributes in the line graph to make the blue line stand out from the others.


Submission 3: Lubos Pribula
Lubos continued the interactive Excel dashboard trend (downloadable here). I like the use of color to visually tie the line graph to the tabular data below (though we should be careful about the red-green color combination, which can be difficult for those who are colorblind). I also like the embedded bar charts within the tables at the bottom, which allow you to quickly visually compare aggregate measures across the various affiliates.


Submission 4: Gautham
Gautham created a dashboard in Tableau (if you don't have Tableau, you can download Tableau Reader here; Gautham's dashboard can be downloaded here). This dashboard allows you to view a single affiliate at a time and see a visual of their total assets in bars and number of gifts and grants via lines. This is useful if you want to compare the number of gifts and grants, or get a sense of the over time trends for a specific affiliate.


Submission 5: Rupert Stechman
Rupert took an unconventional approach to his data viz and went old school with pen and paper (which I love!) and created a sort of heatmap showing net change in assets over time by affiliate. Here's what he came up with (his blog post is here):


AND THE WINNER IS... Submission 6: Jeff Shaffer
Jeff created both a Tableau dashboard (downloadable here) and an Excel dashboard (pictured below; downloadable here). He doesn't win because he submitted dashboards in multiple forms, but rather because his visual is the one the foundation said they could see themselves using.

Here's what the philanthropic organization said: "Thank you so much for trying to help us get a visual for our data. Your readers are much more skilled than I, and did some really interesting things with the data. I think Jeff Shaffer came closest to getting us something like what we need. His dashboard approach would be really useful in some instances."


Personally, I would have had a hard time choosing a winner (one reason I'm happy the philanthropic group made the decision for me!) - there are components I like from each of the visuals and I think each could work well, depending on what story you want to tell and who the audience is. This is a great reminder how important those pieces are - it's really difficult to create the perfect visualization without a good understanding of what story we want to tell and who we want to tell it to. We should absolutely spend time up front establishing that (and coaching our colleagues and clients to do so) before we create the supporting visual.

9/4 UPDATE: Jeff graciously put together a "how to" for creating the dashboard above, which you can download here.

Cole's non-competing submission
And I of course couldn't help but build my own visualization of this data as well. I did not go the interactive dashboard route, because the description made it sound like it was important to understand the trends for a given affiliate while also being able to compare those to other affiliates (hard to do in a dashboard that focuses on one affiliate at a time, though a couple of the above submissions address this in different ways). Here's a snapshot of what I came up with (I just show 4 here, but this approach continues for each of the affiliates; the Excel file is downloadable here):


Thanks, all, for playing (and Jeff, my offer stands to have you write a guest blog post if you're interested!). Let me know if you think I should pose challenges like this again in the future!

Wednesday, August 22, 2012

how long it takes to get pregnant

I love when data viz and life intersect. This happened for me recently, when I came across the following visualization - it's from a post a couple of months ago on flowing data.

How Long it Takes to Get Pregnant
Slightly modified from this post
The graph shows the odds of getting pregnant (y-axis) by the number of months one (or two as would typically be the case here) tries to get pregnant. The different colored markers denote the age (I assume of the female) trying to conceive. This shows that 25 year olds will nearly always get pregnant within a year of trying to conceive, and that this probability decreases the older you are.

How does this intersect life, you may ask? I had one empirical data point to add to the graph, denoted by the * at the (x, y) coordinate (5 months, 100%). Colored correctly, it would be somewhere between yellow and green.

For anyone who is still scratching their head to figure out what I'm talking about... 
I'm due in February!

Friday, August 10, 2012

evaluating word clouds

Word clouds created a bit of buzz when they first became popular a couple of years ago (or at least that's when I encountered them for the first time). Like the infographic, they have a bit of sex appeal that draws you in. As in the case of infographics, however, I often find that upon further evaluation they tend to be a letdown - full of fluff without so much informative value.

While facilitating a workshop recently, I heard a horror story about someone who had tried to create a word cloud by hand (perhaps the scariest part of the story involved scaling text boxes one at a time). Lesson: in data viz (and in life), if you find yourself doing something tedious and repetitive like that, stop to reevaluate. At minimum, do a Google search. Even better if you can find a blog post or related article on the topic from someone who has encountered the same challenge before and identified an elegant solution.

In the case of word clouds, there are a number of applications you can use to generate them. Wordle is a popular free product, created by Jonathan Feinberg of IBM, that allows for quite a bit of customization of color, size, font, etc. (Note that if you upload your Wordle to the gallery, the data goes with it, though you can also opt for local-only word cloud generation.) Google Docs has a word cloud gadget within spreadsheets. There are a number of others, easily located via a Google search.

But before you start thinking about generating word clouds, let's continue our discussion on their efficacy. Their sexiness can draw you in. But is there value beyond that? I think it comes down to the use case. I've got one example for the negative and one for the affirmative.

Poor use of word clouds
First, let's take a look at an example from a Community Health Center. My understanding is that they employed a consultant to analyze some survey data from their clients. The consultant put together a report filled with pretty word clouds like this one:


Good service is... minutes? Part of the challenge in this case is that the connotation has been completely stripped away from the nouns, removing the sentiment behind the comments. Which is kind of the important part of the comments, in my opinion. But in reading the report, buried near the end of it, I found the following:

The consultants took the time to content-code the comments. These categories and their descriptions are much more useful for understanding what people value than the word cloud. With this info, we can direct action: we get an understanding of what's going well that we want to maintain, as well as potential areas for improvement. We could take this a step further by making the data visual, like this:


In this case, I think the simple bar chart is much more useful (in terms of both understanding the information and determining how to act on it) than the word cloud. Now let's look at a better use of word clouds.

Thoughtful use of word clouds
Caveat: this example came to me by way of the telephone game (I heard it from someone who heard it from someone), which means it's guaranteed that I don't have the details totally right. But I think this still serves well as an example of a good use of word clouds. The story goes: Apple stores obviously really value customer service. They use surveys to collect info about each store. Each day, they create a word cloud for each store based on customer comments. What they are looking for are 5 (I'm making that number up, I don't know what the real number is) specific words - things that are considered must-haves when it comes to customer service in their stores. It's when these [5] words don't show up prominently on the word cloud for a given store that a red flag is raised and some sort of action is taken.

This is what I would consider a thoughtful and actionable use of word clouds. If a required word doesn't appear prominently, some sort of intervention happens.
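For the curious, here's a rough sketch in Python of what that kind of check might look like behind the scenes. To be clear, this is my own illustration and not how Apple (or anyone else) actually does it: the function names, the must-have words, and the cutoff for what counts as "prominent" (the top n most frequent words) are all made up for the example.

```python
import re
from collections import Counter


def top_words(comments, n=20, stopwords=frozenset()):
    """Return the n most frequent words across all comments (lowercased)."""
    words = []
    for comment in comments:
        # Pull out runs of letters/apostrophes; punctuation and digits are dropped.
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in stopwords:
                words.append(word)
    return [word for word, _ in Counter(words).most_common(n)]


def missing_must_haves(comments, must_haves, n=20, stopwords=frozenset()):
    """Return the must-have words that are NOT among the top n words.

    'Prominent' is assumed here to mean 'in the top n by frequency',
    which roughly mirrors which words would appear largest in a word cloud.
    """
    prominent = set(top_words(comments, n, stopwords))
    return sorted(word for word in must_haves if word not in prominent)


# Hypothetical comments and must-have words, for illustration only.
store_comments = [
    "Friendly staff, very helpful",
    "Helpful and friendly",
    "Long wait but friendly",
]
flags = missing_must_haves(store_comments, {"friendly", "helpful", "fast"}, n=2)
print(flags)
```

If the list that comes back is empty, all of the must-have words are showing up prominently; anything in the list is a candidate red flag for that store. The point is that the word cloud here is really just a picture of a word count, and the action is driven by what is missing from it.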

We can generalize this to the following: when you're considering using a word cloud, think about what you want your audience to know and what you want your audience to do. Then ask yourself if a word cloud will enable them to know and do those things.

And for goodness' sake, if you do use a word cloud - leverage some of the tools that exist - don't try to create it by hand!