Monday, April 28, 2014

why I disdain most infographics

Too many offenses to sensible data visualization to list. It's unfortunate, too, because there are some compelling stats lost in the cartoony graphics.

Gates Foundation Inventions

Wednesday, April 23, 2014

focusing with color

In my previous post, I discussed the distinction between exploratory and explanatory analysis and showed how you can sometimes leverage the same visual when moving from the former stage to the latter, with some minor tweaks. Today, I'd like to consider another example of this and also illustrate how you can use iterations of the same visual to focus your audience with color.

We'll continue with the imagined scenario where you work for a car manufacturer. Today, you're interested in understanding and sharing insight around top design concerns for a particular make and model. Your initial visual might look something like the following:

The above visual could be one of those you create during the exploratory phase: when you're looking at the data to understand what might be interesting or noteworthy to communicate to someone else. The above shows us that there are 10 design concerns that have 8+ concerns greater than 1,000 (the rest of the tail has been chopped off, which would probably be worth a footnote with a little detail on how long the tail is, perhaps how many design concerns there are in total, etc. if you're using this to communicate to others).

You can leverage the same visual, together with thoughtful use of color and text, to further focus the story:

Continuing to peel back the onion, we can go a level further than this, again using the same visual with modified focus and text to lead our audience from the macro to the micro parts of the story:

Repeated iterations of the same visual, with different pieces emphasized to tell different stories or different aspects of the same story (as above) can be particularly useful in live presentations, because you can orient your audience with your data and visual once and then continue to leverage it in the manner illustrated above.
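
As a rough sketch of the idea (not how the Excel file does it), you can think of the iterations as one fixed list of bars with a different color mapping at each step. The concern names, the focus sets, and the hex colors below are all hypothetical placeholders:

```python
# Hypothetical design concerns, already in the order they appear on the chart.
CONCERNS = ["Engine power", "Tires noisy", "Engine noisy", "Seat material", "Hard to shift"]

EMPHASIS = "#d95f02"    # assumed highlight color
BACKGROUND = "#bfbfbf"  # assumed muted grey for the context bars


def bar_colors(categories, focus):
    """Return one color per bar: emphasis for focused bars, grey for the rest."""
    focus = set(focus)
    return [EMPHASIS if c in focus else BACKGROUND for c in categories]


# Each step reuses the same bars; only the focus changes.
steps = [
    [],                               # step 1: everything muted, orient the audience
    ["Tires noisy", "Engine noisy"],  # step 2: the macro part of the story
    ["Engine noisy"],                 # step 3: the micro part of the story
]
palettes = [bar_colors(CONCERNS, focus) for focus in steps]
```

Each list in `palettes` could then be handed to whatever charting tool you use as the per-bar color, so the audience sees the same bars in the same order every time and only the emphasis moves.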

If you're interested, you can download the Excel file with the above visuals here.

Monday, April 14, 2014

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visual you could use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:

In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.
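
Here is a minimal sketch of the calculation behind a visual like this, assuming survey records that are simple (issue, rating) pairs. The issue names and responses are made up; the point is the grouping, the descending sort, and carrying the N count along with each percent:

```python
from collections import defaultdict

# Hypothetical responses: (reported issue, overall satisfaction rating).
responses = [
    ("Brakes", "Satisfied"), ("Brakes", "Very Satisfied"),
    ("Brakes", "Neutral"), ("Brakes", "Dissatisfied"),
    ("Stereo", "Very Satisfied"), ("Stereo", "Very Satisfied"),
    ("Stereo", "Satisfied"), ("Stereo", "Very Satisfied"),
    ("Transmission", "Very Dissatisfied"), ("Transmission", "Neutral"),
]


def less_than_very_satisfied(rows):
    """Per issue: (issue, percent not 'Very Satisfied', N), sorted descending by percent."""
    totals = defaultdict(int)
    not_top = defaultdict(int)
    for issue, rating in rows:
        totals[issue] += 1
        if rating != "Very Satisfied":
            not_top[issue] += 1
    summary = [(issue, 100 * not_top[issue] / n, n) for issue, n in totals.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


summary = less_than_very_satisfied(responses)
for issue, pct, n in summary:
    print(f"{issue:<14}{pct:5.0f}%  (N={n})")
```

In this made-up data the top row rests on only two responses, which is exactly why the N count belongs next to the percent.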

Once you've identified something or some things to focus on, in some cases it will make sense to create a different visual for the purpose of focusing on that thing or those things. Alternatively, the same visual can be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:

We can use the same visual and approach for highlighting another potential point of interest:

Or another:

Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.

Thursday, April 10, 2014

just because you have numbers doesn't mean you need a graph

I subscribe to updates from the Pew Research Center. They arrive in my inbox with subject lines like "Future of Internet, News Engagement, God and Morality" (yes, this was an actual title from their March 13th update - quite a span of topics!) and probably 90% of the time get moved to my trash without a second thought. But in a fraction of cases, something in that subject line catches my eye and I open the email to read more. Sometimes, this even prompts me to click further to the full article.

The snippet that caught my attention this time was "Stay-at-Home Mothers on the Rise." The link I clicked on within my email brings you here.

A quick scan through and I found that I was hardly able to focus on the article because of the issues plaguing the visuals that accompany it. There are many. But I'll focus on just a single one today and keep this rant very short and sweet:

Just because you have numbers doesn't mean you need a graph!

The following graph prompted this adage:

That's a whole lot of text and space for a grand total of two numbers. The graph does nothing to aid in the interpretation of numbers here! Even the fact that 20 is less than half of 41 doesn't really come across clearly here visually (perhaps because of the way the numbers are placed above the bars?).

Rather, the above can be conveyed in a single sentence:
20% of children had a "traditional" stay-at-home mom in 2012 (compared to 41% in 1970). 

Just because you have numbers doesn't mean you need a graph!

For a less ranting delivery of a similar lesson, check out my post the power of simple text.

Monday, April 7, 2014

a storytelling with data ad

One of my favorite indulgences on a weekend morning is to sit in the sun on our terrace and read the latest copy of Dwell magazine. A number of things in the universe have to align to make this possible: namely, the sun must be shining and the child must be sleeping. The universe aligned in just this way this past Saturday (bliss!).

I find that the design of products and the design of spaces can sometimes influence my thinking, spark an idea, or act as inspiration when it comes to the visual design work on which I focus much of my attention. On this particular read through Dwell, it was the following advertisement that caught my eye:

This ad gave me pause for a few reasons:
  1. The leading stat - 1 in 5 children go to school hungry - is powerful. When it comes to communicating a number or two, tables and graphs don't usually have a place, as the numbers themselves carry a lot more attention-grabbing power.
  2. The use of preattentive attributes to make certain elements of the visual distinct: the numbers at the top are in bold, all caps and underlining draw attention to the second line, and the sort of sea-green in the logo and text at the bottom emphasizes the un and is (when it comes to this last point, I might have chosen different portions of text to draw attention to, but I think that's one of those things that can be up for debate and probably there was a good reason the designer chose these particular pieces - perhaps the dichotomy between un and is?).
  3. The story. It's short and sweet, but still a robust example of storytelling with data, which, with the personal anecdote and picture, is made much more human than a simple stat on its own would be.
  4. The picture. Speaking of pictures, one frequent question in my workshops is about the use of pictures when it comes to visual communication. I don't use pictures a lot personally, but as mentioned above, I do definitely think there are ways to use pictures that appeal on a different level than numbers do. Here, I think the pairing of the two is effective.
What do you think of this ad as an example of storytelling with data? Is it effective? Why or why not? Leave a comment with your thoughts!

Wednesday, April 2, 2014

US prison population revisualized

The following graph caught my eye recently in my Twitter feed:

I've been debating whether to post about it (and finally decided that I couldn't resist).

I don't want to rip it apart.

Well, that's not entirely true.

I do want to rip it apart, but it's not in an effort to be mean. The above visual breaks pretty much every best practice out there when it comes to effective graph design. It's simple data. Probably not so much is being lost in terms of being able to interpret the data through this less-than-stellar data viz. But the specifics of the design choices (or lack thereof) drive me batty. To the extent that I can't help but comment and resolve to show what it has the potential to be.

First, let me list the main components that get under my skin (and I should note that it's possible some of these are constraints in the tool through which the above visual was published, which I have not used directly):
  • No meaningful ordering to the data (rather, the categories are shown in reverse alphabetical order... not so helpful);
  • Lack of axis labels (sure, we can infer, but why should we have to?);
  • Diagonal text on x-axis (avoid, avoid, avoid!); and
  • Grey background, white vertical gridlines, and black bar outlines add unnecessary clutter (eliminate!).

I think the only positives I have to say about the original visual are: 1) a horizontal bar chart is a good choice here because we're dealing with categorical data with long category names, 2) good descriptive graph title, and 3) it makes me happy to see the data source listed (both as general good graph hygiene, as well as because it allows me to get to the source data to remake the visual).

Speaking of remaking the visual, here's what it could look like when we tackle the above issues:

If we just want to show the data, we could proceed with the above. Taking a cue from the original visual - a single point is labeled with its corresponding value: Drug Offenses - perhaps there is a story here worth highlighting. If that's the case, our visual might look something like the following:

Meta-lesson: if you're going to go through the effort of visualizing data, take the time to be thoughtful about your design choices!
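
To make the ordering and highlighting steps concrete, here is a small sketch that prints a plain-text horizontal bar chart. Only the Drug Offenses label comes from the graph discussed above; the other category names and all of the values are placeholders, not the source data, and the `#` character simply stands in for an accent color:

```python
# Placeholder values: only the "Drug Offenses" label comes from the post.
data = {"Category A": 12, "Drug Offenses": 48, "Category C": 7, "Category B": 20}


def text_bar_chart(values, highlight, width=30):
    """Render bars sorted largest-first; only the highlighted bar gets a data label."""
    longest = max(len(name) for name in values)
    top = max(values.values())
    lines = []
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar_len = round(width * value / top)
        mark = "#" if name == highlight else "-"  # "#" stands in for the accent color
        label = f" {value}" if name == highlight else ""
        lines.append(f"{name:>{longest}} | {mark * bar_len}{label}")
    return lines


chart = text_bar_chart(data, highlight="Drug Offenses")
print("\n".join(chart))
```

Sorting by value replaces the alphabetical ordering, the category names sit horizontally next to their bars, and there are no gridlines, outlines, or background to compete with the one bar you want people to notice.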

If you're interested in the Excel version of the above makeovers, you can download it here.