Friday, October 31, 2014

annotated line graph from Uber

With the email that hit my inbox earlier this afternoon, Uber has impressed me twice in the past week. The first time was in response to a simple comment that accompanied my '3' numerical rating (the lowest I've ever given): "With the world series game today, should have avoided stadium area." I had an email in my inbox from Uber's customer service within the hour agreeing that was a silly route given the Giants' game and reducing the price to what it would have been without the crazy traffic. Amazing.

And now they've done it again, this time via effective data viz. The annotated line graph below shows expected Uber demand over the course of the evening and into the wee hours of morning. This is one of those rare cases where they can get away without showing the y-axis values at all, since the relative peaks and valleys are more interesting (and meaningful) than the absolute numbers.


Nice job Uber. Though I must say this makes me happy to report that kiddie Halloween in my neighborhood is on foot, so no need to even think about surge-pricing here!

Speaking of which, I find it impossible to publish a post on Halloween without a couple pics of my superhero family.


Happy Halloween!

Tuesday, October 7, 2014

SF housing cycles visualized

If you've shopped for real estate in San Francisco recently, you've likely experienced the crazy world of multiple offers, waived contingencies, and all-cash deals well above asking price. We've been house shopping here for nearly two years, without much to show except a jaded view of the market and an ever-increasing pile of home-for-sale flyers. My husband and I joke that our toddler will grow up thinking that's what you do on the weekends: go look at other people's houses.

If you've been in this situation, or a similar one, you've perhaps also wondered (like us) whether prices will continue to increase at the rate they have been, or if there is an elusive bubble that is about to pop. To that end, we came across the visual below, which depicts a simplified view of San Francisco housing market cycles over the past few decades.


If you've followed this blog for long, you might expect that I will next proceed to rip the above visual apart. But I am not going to. 

I actually really like it. 

Sure, there are some minor things that could be changed. But let's focus instead on the good: it's well-labeled, both in terms of titles and text annotation on the graph itself. There is a clear narrative that calls out some interesting things in the data. For example, over the past 30+ years, the period between a recovery beginning and a bubble popping has been about 6 years.

According to the graph, the last recovery began in 2012, which would put the next bubble pop at approximately 2018.

Which means there's still time to buy before we hit the peak... 

Thursday, September 4, 2014

show the full picture!

There is still space available in my upcoming San Francisco public workshop; details and registration can be found here. Click here to suggest a location for a 2015 workshop.

I've posted a number of times about Pew Research articles. Well, not the articles exactly, but rather the visuals they contain. To be honest, it's rare that I read the actual article. I scan the headlines as they hit my inbox and if something piques my interest, I follow the link and scroll through the article, not reading, but taking a discerning look at the graphs.

This is what I was doing when the following visual caught my eye within an article titled, "Perceptions about women bosses improves, but gap remains." The full article can be found here.


They are nice looking graphs, as is the norm. But upon examination, for me both of these visuals leave out an important part of the picture. Let's examine them one at a time.

I'll start with the top chart titled "Boss Gender Preference." The first thing I want to do with a graph like this is add the percentages to understand what proportion of the overall population we're talking about. Everybody? No. In this case, a little math tells us that a lot is missing, especially in the more recent years. My "data-spidey sense" (as a former boss of mine used to call it) gives me suspicious pause. I go back and read the title, etc. to try to put what is missing (the piece or pieces that would enable the lines to add to 100%) into context. In this case, the lines represent the percent of those responding to the survey who said they prefer male bosses and the percent who prefer female bosses. So, I would assume that the remainder are those who have no stated preference (and only now as I write this do I see the footnote below the second graph confirming this; for sake of this discussion and my forthcoming makeover, I'll assume "no opinion" is the same as "no preference"). In recent years, this accounts for nearly half of the total. That seems important. And yet isn't shown explicitly.

Let's look at a different view of this data:


In this remake, I replaced the line graphs with stacked bars so that we can see the full 100%. Those preferring male bosses are along the bottom in blue, female along the top in orange so that we have a consistent baseline at both bottom and top to be able to compare what has happened over time for those groups. I only graphed the points that were labeled in the original graph, omitting the 2000 data point altogether so that I'd have a (roughly) consistent gap between the dates along the horizontal x-axis. That data point was strange to me in the original graph, anyway, since preference for both male and female bosses increased (while % indifferent went down), but then bounced back to the previous trend. It looks like something strange may have been going on there (was the question worded differently? had something recently happened in current events that influenced this? I'd want to better understand this, however since it doesn't seem critical to the overall story I'll simply omit).

With this view, you can visually lump together the grey and orange bars, or even focus on just the grey to see progress on a different level than was shown in the original graph - from an overwhelming preference for male bosses to lessened sensitivity when it comes to the gender of one's boss overall.
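For anyone who wants to reproduce the "little math" behind the stacked bars, here is a minimal sketch in Python. The function name and the sample values are my own placeholders (they are not the survey figures); the only idea taken from the remake is that the unshown remainder is 100% minus the two stated preferences, and that the grey and orange segments can be added together.

```python
def add_no_preference(rows):
    """Extend each (label, male_pct, female_pct) row with the leftover share."""
    completed = []
    for label, male, female in rows:
        none = 100 - male - female  # the piece the original line graph leaves out
        if none < 0:
            raise ValueError(f"{label}: stated shares exceed 100%")
        completed.append(
            {"label": label, "male": male, "female": female, "none": none}
        )
    return completed


# Placeholder values for illustration only; swap in the labeled data points.
survey = [("early", 60, 10), ("recent", 35, 20)]

for row in add_no_preference(survey):
    # Lumping grey (no preference) and orange (female) together
    not_male = row["none"] + row["female"]
    print(row["label"], "no preference:", row["none"], "not male:", not_male)
```

Because every row now adds to 100, the result can be fed straight into a stacked bar chart in whatever tool you prefer.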

After plotting the data this way, I considered that perhaps a line graph would work better, but with an additional line for the indifferent portion. I drew it, but after seeing it, it reinforced for me that the stacked bars work better for being able to add different data points together, which I think is important here. In case you're curious, here's what the line version looks like:


Next, let's turn our attention to the second visual: Female CEOs. The graph as drawn by Pew Research looks like great success: up and to the right. In this case, frame of reference is critical. Yes, 4.8% is huge relative to zero. But it's really small compared to the potential (100%). Here's how I'd graph this one:


With this view, there has been progress - yes - but there is potential for much more.

As a side note, it also bothered me that the dates were totally different between the two graphs (it seems like you should be able to compare the data points across them, but if you follow 1975 in the original top graph down, this lines up with something like 2002 in the bottom graph). I attempted to address this through the titling of the graphs ("the past 60 years," "the past 20 years") but this isn't a perfect solution. Part of me wants to have a bunch of empty space on the left of the second graph and make it so the 20 years of data we have there lines up to the top graph along the same date scale, but I think this would squeeze the data we do have too much to be legible. So I settled with attempting to address through the titles.

Here is the side-by-side of the two visuals:


Note how in both cases above, the remake causes us to see a different story and perhaps even draw different conclusions than we might have with the original visuals. For me, the makeovers present a fuller picture.

It is possible that this fuller picture is all explained in the article. I really can't say, as I still haven't read it. I am probably not the only one who "reads" this way. Which is another lesson: design your visuals so they still work when your audience doesn't read the accompanying text!

In case you're interested, my other recent posts on Pew Research visuals can be found here and here. The Excel file with the makeovers from this post can be downloaded here.

Tuesday, August 26, 2014

design with audience in mind

Recently, my husband shared a USA Today graphic with me that summarizes diversity stats across a number of Bay Area tech companies. Surely, this would be a good blog topic, he told me. He knows me well. Here is a screenshot of the visual:

Online version can be found here.

First, let me mention how cool I think it is that companies like Google have started sharing their diversity stats. I expect that with this transparency, we'll see movement towards more diverse workforces over time.

Next, let me discuss what an annoying user experience it is to try to look at the diversity data with USA Today's visual. It shows the breakdown for the given company (Apple, in the above screenshot) by gender on the left and ethnicity on the right. The various tech companies each have their own tab; you can toggle between companies using the numbered tabs along the left (not sure what the numbers on the tabs mean...if anything).

What is the first thing you want to do with this data?

For me, the stats for a given company, on their own, are not so interesting. It's by comparing them to the other companies that we help build context for what is good (or if not good, then at least better), what is worse, and so on. In other words, the single thing I want to do most is compare the stats across companies. The way this visual is organized makes this a lot harder than necessary. If I want to compare the proportion who are women at Apple (for example) to other companies, first I look to the Apple tab and commit 30% to memory, then I click through the other tabs one by one to try to put that 30% into context. This is annoying, but possible.

It gets more annoying and difficult if you try to do it by ethnicity. Try comparing the proportion Hispanics make up of the various workforces, for example. It's further complicated by the fact that the slices on the pie move and the order in which the companies are listed changes as you toggle between companies.

This is not an ideal user experience. My guess is that there was some desire to make the visual "interactive," which it sort of feigns via the tabs of various companies along the left. But really all this does is allow you to see the various static graphs, one at a time. Why not replace with a single static visual that makes the task your audience is going to want to do easy?

In other words, let's design the visual with our audience - and how they are going to want to interact with the data - in mind. If the goal is to compare across companies, I might do something like the following:


(Note that the title and takeaway at the top were preserved from USA Today's visual; I'm not sure I would have been quite as negative.)

The above version allows me to see things that were very difficult to get to with the original. eBay is doing the best from a gender diversity standpoint, but worse when it comes to racial diversity, where Yahoo is doing better than the others, etc.
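If your data arrives organized the way the tabs are (one bundle of stats per company), the reshaping that makes cross-company comparison easy can be sketched in a few lines of Python. This is only an illustration: the function name is made up, "Company B" and "Company C" and their values are hypothetical, and only Apple's 30% comes from the discussion above.

```python
def compare_across_companies(by_company, metric):
    """Return (company, value) pairs for one metric, sorted high to low."""
    pairs = [
        (name, stats[metric])
        for name, stats in by_company.items()
        if metric in stats
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


by_company = {
    "Apple": {"women": 30},      # the figure read off the Apple tab
    "Company B": {"women": 42},  # hypothetical value
    "Company C": {"women": 17},  # hypothetical value
}

# One metric, all companies, in a single view: no tab-clicking or memorizing
for name, value in compare_across_companies(by_company, "women"):
    print(f"{name:<10} {'#' * (value // 2)} {value}%")
```

The point of the design is simply that the thing being compared (the metric) becomes the organizing unit, rather than the company.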

Bottom line: design with your audience in mind!

Click here to download the Excel file with the above visual.

Monday, August 18, 2014

nice summary by UP

I have been religiously wearing my UP24 band over the past two months, after taking a hiatus from the technology while pregnant. I originally strapped it on to have a record of my sleep. With a newborn, of course sleep looks much different now; there's something strangely gratifying when you can not only know that's the case, but also see it. Over time, I've seen the number of night-wakings generally go down (though last night was an exception, which I feel as I groggily type this post) and sleep consolidate into bigger chunks as my little chunk sleeps for increasingly longer segments. I can start to see a pattern emerge (bed at 11pm, wake for feeding at 2-3am and again at 5-6am, get up around 8am). Visual evidence of slow but measurable progress!

What caught me by surprise when I started wearing the band again is the motivation it inspires when it comes to my activity level. The recommended goal is 10,000 steps per day. When I don't hit it, I feel a bit of shame. When I do hit it, I feel a gratifying sense of accomplishment. That sense of accomplishment goes up as the amount by which I surpass the goal increases. This motivates me to get out and move on a daily basis to ensure I'll hit my goal.

So all of this is a long prelude to the summary from UP that hit my inbox this morning. I tend to post a lot of examples of data viz with which I take issue, so thought I'd mix it up and focus this post on one that I found to be effective. Of course there are things that I would have designed differently, but this summary gets the job done. It's keeping me motivated. Let me step you quickly through what it shows.

It starts with an overall summary of week-over-week changes:


Relative to the prior week, my average sleep per night went down a hair and my movement increased a little. I like the big, clearly articulated takeaway: you held steady.

This is followed by detail on my sleep this past week:


My average nearly hit the nightly sleep goal of 8 hours. I even beat the goal three times (versus prior weeks where I haven't hit it at all!). In fact, this summary looks perhaps deceivingly good, though the number of nights of uninterrupted sleep, at 0, starts to point to the newborn effect. It will be life-changing when that number moves, even by one!

The sleep summary is followed by a movement summary:


I hit my 10,000 step goal each day (my informal goal for myself is to hit it every day in August). You can see the days where a jog or long walk really put me over the top. My most idle time of 8-9am comes as no surprise, as that's when the little one eats breakfast, sequestering me to an armchair for the better part of an hour. The rest of his day remains less predictable.

It's a straightforward and simultaneously (for me, at least) motivating summary.

For more on UP from a numbers-person's perspective (including downloading the data it captures to analyze on your own), check out Nathan Yau's recent review here. For more on the cool insights the team at Jawbone is starting to make based on the crazy amount of data they are amassing, check out their blog (for example, this post). They've shared it with others who have started to analyze and share as well (here's a recent WSJ example, though I still lament the lack of color-key on the heatmaps - ok, turns out it is impossible for me to write a blog post without critique!).

Someday, I'll download all of my data and perhaps do something fun with it. For now, I'll continue to check out the daily and weekly summaries to track my progress and for that feeling of accomplishment when I hit my goals.

Monday, August 11, 2014

the challenge of teaching data visualization

You want to increase your skills when it comes to creating effective, captivating, and informative data visualizations. I want to teach you. But with this come certain challenges. It is these challenges that I hope to discuss in a SXSW 2015 panel (along with three expert colleagues: Jon Schwabish of policyviz.com, Kaiser Fung of Junk Charts, and Ben Shneiderman of University of Maryland).

But we need your help.

Over 3,000 proposals have been submitted for SXSW Interactive. Obviously, only a fraction of these will be chosen for sessions. Public voting accounts for about 30% of the decision making process. That's where you come in.

Please take 20 seconds to vote for our session by clicking the button below (which will prompt you to create an account before voting if you haven't already - it's super fast, I promise).

Vote to see my session at SXSW 2015!

Check out the following slideshare for more detail on the session topic.



Thanks and I hope to see you at SXSW 2015!

Thursday, July 31, 2014

love and hate for NYT graphics

I rarely find myself in front of a computer these days. My time has been overtaken by a tiny little man (related post), who insists on spending hours a day with me, sitting in a rocking chair, at least one arm rendered otherwise useless by cradling and cuddling (not a bad way to spend one's time, I must admit). Only in the past couple of days have I emerged from my lack-of-sleep haze to realize that it only takes one hand and my cell phone to reconnect with what's happening in the world via Twitter and Feedly.

It was during one such cuddling-and-catching-up session that I came across the recently published New York Times article, Gains seen for Medicare, but Social Security holds steady. To be honest, I'm less interested in the findings, but the data visualizations within the article caught my eye.

At first glance, the two visuals look really clean and well-designed. Still, I am initially a skeptic when it comes to looking at any data viz. I started out hating the two data visualizations included in the article, but with a bit of patience, my feelings morphed from hatred to... well... I guess we can call it love and hate. Let's take a look at the two visuals included in the article and do a little analysis of each.

Here is the first:


My initial thought was that, with time on the x-axis, the above should be a line graph. But I was too quick to judge: it's not exactly time that's being plotted, but rather the forecast for expected Medicare solvency at the given point in time. Given this, it makes sense to treat the points as discrete (rather than continuous) in a bar chart, as has been done above.

My next would-be beef was with the gridlines drawn across the bars. Gridlines often add clutter, bringing little informative value with them (and making the visual appear more complicated than necessary - related post). But the increments of 5 on the y-axis and coordinating gridlines allow your eye to do a bit of math without your brain really having to. The gridlines within the bars could perhaps be made a little thinner so your eye would still see them without the cluttering effect, but this is minor.

While it took a little time to like the above components of the graph, other design features were love at first sight: it's well-labeled with a clear title, axis titles, and labels; the words above the graph tell you what you are meant to take away, while attention is drawn to this point in the data - the most recent forecast - via a difference in color.

Now let's turn our attention to the second visual included in the article:


This time, I'll begin with the components I like. Again, the takeaway is clearly articulated via text. Everything within the graph is clearly labeled. But in this case, I'm having a hard time moving to full-on love. The background shading and gridlines - though I can understand the motivations for them - bother me. And the labeling within the graph just doesn't seem as clean to me as it could be from a placement standpoint.

I really wanted to remake this visual, but was unsuccessful in finding the data being graphed and not patient enough to take the time to eyeball it. When I was considering the design choices I would make (get rid of grey background and gridlines, change the forecast portions of the lines to dashed lines, label the series with both title and % change to the right of the 2023 projections), I read the takeaway at the top again and realized that I don't even agree with it. The callout says the forecast is for faster growth for Prescription drugs and Physicians, yet the slope for the Hospital line is steeper (faster growth) than the Prescription drugs line. I assume it's true that the increase over the entire period forecast for Hospital is 25%, as noted, but the forecast is for a brief reduction followed by rapid increase, so I find this description to be misleading. 

Based on the data alone, to me more interesting is the inflection point and subsequent forecasts for Physicians and Hospitals. Historically, Hospitals have accounted for the majority of cost, but this is projected to change, with Physicians expected to make up a bigger (and rapidly increasing) proportion of beneficiary cost going forward. Interesting. I wonder why that is?

Perhaps this is explained in the article. But my call-to-duty by the little man is bound to be soon, so rather than go back and read the article, this is where I'll wrap up today.

My hatred turned to love in the initial visual, but I failed to get there in the second case.

What do you think? What do you like about these graphics? What would you change?

Monday, July 14, 2014

lead with story

July is storytelling month over at the Tableau Public Blog; the following is a guest post I authored.

When asked to write a guest blog post for this month's focus on storytelling, I spent some time reflecting: if I had just a single lesson to share, what's the #1 piece of advice I'd give in this space? I'd boil it down to three simple words: lead with story.

It may sound counterintuitive, but success in data visualization does not start or end with data visualization. To resonate with your audience, you need to do more than simply show data. Attention and time should be paid to the context for the need to communicate: what does your audience need to know? What do they need to do? How can you make the data you want to share meaningful and memorable? Part of the answer is story. Stories resonate and stick with us in ways that data alone cannot. Purposeful story can bridge the gap between showing data and imparting information.

Now, if you're an analyst by training (like me), "leading with story" might strike you as a little off-putting. This can be an uncomfortable space for many. Often, this seems to be driven by the belief that the audience knows better and therefore should choose whether and how to act upon the information presented. In other words, that they should be the ones creating the story. I would argue this is rarely (if ever) the case: if you are the one analyzing and communicating the data, you likely know it best and are a subject matter expert. This puts you in a unique position to interpret the data and lead people to understanding and action. So, while it may feel more comfortable to lead with the data, I recommend you fight this urge when it comes to explanatory analysis and lead with story.

To ensure your story comes across clearly, there are two lessons to keep in mind: 1) don't make your audience wait for it, and 2) don't make your audience work for it. Let's discuss each in a little more detail and then look at an example of these lessons in action.


Don't make your audience wait for story
Don't bury your story: lead with it! Too often, I see situations where the communicator of the information wants to take the audience through the same chronological path they took to reach their conclusion. In most cases, this is unnecessary. Rather, lead with the "so what" and then back up into the path you took to get there only if absolutely necessary. This way, you don't leave your audience wondering when you're going to get to the point and run the risk of losing their attention before you do.

When it comes to crafting the narrative arc, I recommend storyboarding. Storyboarding is perhaps the single most important thing you can do up front to ensure the communication you're crafting is on point: it establishes a structure for your communication. Write each of the main points you want to make on a post-it note. Then you can play with different arrangements to get the right flow that makes sense given your audience and what you want to communicate. Once you get the flow how you want it using this low-tech method, you can leverage Tableau's Story Points feature to create this same narrative arc with your data visualizations. For more on storyboarding, check out this blog post.


Don't make your audience work for story
Spend time making the story you're telling impossible to miss in your data visualization by leveraging visual cues to help direct your audience where to look. Without these visual cues, our audience has to do work to figure out where they are meant to pay attention. When we ask our audience to do work, we run the risk of them deciding they don't want to and moving on to something else, at which point we've lost our opportunity to communicate. Preattentive attributes like size, color, and placement on page/screen can be used strategically to signal to your audience where to look in the visual for evidence of the story you are telling. For more on preattentive attributes, check out this blog post.


Lessons in action
Let's look at a simple example applying these lessons (if you're a regular reader, you may recognize this example, as I've used it before). Imagine you work for a car manufacturer. You're interested in sharing insight around the top design concerns for a particular make and model. Your initial visual might look something like the following:



While the preceding view may work as part of your exploratory analysis (where you're looking at the data to understand what might be interesting or noteworthy), it can be improved when it comes to explanatory analysis (where you want to communicate those interesting or noteworthy observations to someone else) by applying the lessons we've discussed.

First, let's think about what story we want to tell and make that clear with words:



In the above, we've made clear the point we want to make via the statement above the graph. However, our audience has to do some work to see the evidence of those words in the data. Let's reduce that work by employing some visual cues to help direct their attention:



In the above iteration, it's clear where our audience is meant to look through strategic use of color. We can even take this a step further, continuing the narration and use of color to tell a story with the data we are showing:



In this example, annotation and strategic use of color are combined to turn a simple graph into something more. Lead with story: don't make your audience wait for it or work for it.
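If you build your graphs in code rather than in Tableau, the "strategic use of color" step can be sketched as a simple mapping: everything is pushed to a muted grey except the categories your story is about. The following Python is a minimal illustration; the function name, the hex colors, and the sample concern labels are all assumptions of mine, not values from the example above.

```python
FOCUS = "#1f77b4"  # a saturated blue for the bars the story is about
MUTED = "#bfbfbf"  # a light grey for everything else


def highlight_colors(labels, focus_labels):
    """Return one color per label: FOCUS for story items, MUTED otherwise."""
    focus = set(focus_labels)
    return [FOCUS if label in focus else MUTED for label in labels]


# Hypothetical design concerns, purely for illustration
concerns = ["engine noise", "seat comfort", "tire wear", "fuel economy"]
colors = highlight_colors(concerns, ["engine noise", "tire wear"])

for concern, color in zip(concerns, colors):
    print(f"{concern:<14} {color}")
```

The resulting list can be passed as the per-bar color argument in most plotting libraries, so the audience's eye lands on the evidence without any extra work.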

Here is the above sequence published on Tableau Public.

Leverage these lessons and Tableau's Story Points feature to turn your data visualizations into compelling stories!

Monday, July 7, 2014

and then there were four

Three may be a magic number, but my favorite number of the moment is four.

As in, we are now a family of four.


We welcomed Dorian Werner Knaflic into the world on June 23, 2014. You may recall the timeline that I posted after Avery's arrival. In comparison, this birth was pretty much the opposite experience (we had an appointment, walked into the hospital prepared for what was happening, baby came home from the hospital the same day I did). I continue to be amazed at the absolute perfection of this tiny being.

And because it wouldn't be a proper storytelling with data blog post without a data visualization of some sort, I'll share the following, created from some of the stats I've been collecting, both by hand and with my UP24.


A couple things are clear: Dorian is eating plenty, as evidenced by his steady weight gain since hospital discharge on 6/26. The longest sleeping stretch I get is typically the one preceding the first nighttime feeding (though there have been some nice stretches between that and the second night feeding as well). I was (naively) hoping that clear eating patterns would emerge, but we aren't quite there yet. In time. Surely there are other interesting insights to be drawn, however since I'm operating on a somewhat impaired brain from broken sleep, I'm not going to look too hard for those now.

Rather, let's focus on the cuteness of this little one...

Dorian Werner Knaflic
Born June 23, 2014
6 pounds 11 ounces

Wednesday, June 18, 2014

leveraging animation: what you present vs. what you circulate

A common challenge in storytelling with data is the following conundrum. When presenting content live, you want to be able to walk your audience through the story, focusing on just the relevant part of the visual. However, the version that gets circulated to your audience - as pre-read or takeaway, or for those who weren't able to attend the meeting - needs to be able to stand on its own without you, the presenter, there to walk the audience through it. Too often, we use the exact same content and visuals for both purposes. This typically renders the content too detailed for the live presentation (particularly if it's being projected on the big screen) and sometimes not detailed enough for the circulated content.

I often tackle this topic in my workshops and have written about it here a couple of times before (look for any posts including the word "slideument," for example this one). In the following post, we'll look at a strategy for leveraging animation coupled with an annotated line graph to meet both of these needs.

Let's assume that you work for a company that makes online social games. You are interested in telling the story around how active users for a given game, let's call it Moonville, have grown over time.

You could use the following visual to talk about growth since the launch of the game in late 2012.


But in doing so, you run the risk of your audience focusing elsewhere in the data while you're talking. Perhaps you want to tell the story chronologically, but your audience may jump immediately to the sharp increase in 2014 and be thinking about what drove that. When they do so, they aren't paying attention to what you're saying.

Alternatively, you can leverage animation to walk your audience through your visual as you tell the corresponding points of the story. For example, I may start with a blank graph (which forces the audience to look at the axis details with you, vs. jump to the data; it can also help build anticipation that will help you to retain your audience's attention). Then I can subsequently show or highlight only the data that is relevant to the specific point I am making, forcing my audience's attention to be exactly where I want it to be as I am talking.

I might say - and show - the following progression:

Today, I'm going to talk you through a success story: the increase in Moonville users over time. First, let me set up what we're looking at. On the vertical y-axis of this graph, we're going to plot active users. This is defined as the number of unique users in the past 30 days. We'll look at how this has changed over time, from the launch in late 2012 to current, shown along the horizontal x-axis.


We launched Moonville in September 2012. By the end of that first month, we had just over 5,000 active users, denoted by the big blue dot at the bottom left of the graph.


Early feedback on the game was mixed. In spite of this - and our early practically complete lack of marketing - the number of active users nearly doubled in the first four months, to almost 11,000 active users by the end of December.


In early 2013, the number of active users increased along a steeper trajectory. This was primarily the result of the friends and family promotions we ran during this time to increase awareness of the game.


Growth was pretty flat over the rest of 2013 as we halted all marketing efforts and focused on quality improvements to the game.


Uptake this year, on the other hand, has been incredible, surpassing our expectations. The revamped and improved game has gone viral. The partnerships we've forged with social media channels have proven successful for continuing to increase our active user base.


At recent growth rates, we anticipate we'll surpass 100,000 active users in June! 

For the more detailed version that you circulate as a follow-up or for those who missed your (stellar) presentation, you can leverage a version that annotates the salient points of the story on the line graph directly, as shown below.


This is one strategy for creating a visual (or in this case, set of visuals) that meets both the needs of your live presentation and the circulated version. Note that with this approach, it's imperative that you know your story well to be able to narrate without relying on your visuals (something you should always aim for regardless).

If you're leveraging presentation software, you can set up all of the above on a single slide and leverage animation for the live presentation (with the final annotated line graph positioned on top so it's all that shows on the printed version of the slide). If you do this, you can use the exact same deck for the presentation and the communication that you circulate. Alternatively, you can put each graph on a separate slide and flip through them; in this case, you'd only want to circulate the final annotated version.
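If you build your graphs in code rather than presentation software, the same progression can be planned as a list of "frames." The following is a minimal Python sketch of that idea, not how the visuals above were made: the monthly values are hypothetical (only the roughly 5,000 and 11,000 figures echo the story), and `build_frames` and the step boundaries are illustrative names and choices.

```python
# Hypothetical monthly active users. Only the first (~5,000) and fourth
# (~11,000) values echo the story above; the rest are made up.
active_users = [5100, 6900, 8800, 10900, 16000, 24000,
                31000, 33000, 34000, 52000, 78000]

# Each story step reveals data up to (and including) a given index.
# An end index of -1 means "show only the blank graph with its axes".
steps = [
    ("Set up the axes", -1),
    ("Launch month", 0),
    ("First four months", 3),
    ("Friends and family promotions", 6),
    ("Flat period", 8),
    ("Viral growth", 10),
]


def build_frames(values, steps):
    """Return one frame per step: everything shown so far, plus the
    newly revealed stretch that should be emphasized."""
    frames = []
    previous_end = -1
    for title, end in steps:
        shown = values[: end + 1]
        # Start at the last point already on screen so the emphasized
        # segment connects to the rest of the line.
        start = max(previous_end, 0)
        highlight = values[start : end + 1]
        frames.append({"title": title, "shown": shown, "highlight": highlight})
        previous_end = end
    return frames


frames = build_frames(active_users, steps)
for frame in frames:
    print(frame["title"], "-", len(frame["shown"]), "points shown,",
          len(frame["highlight"]), "emphasized")
```

Each frame maps to one animation step or one slide. The final frame shows everything, which is where the annotated version for circulation would go.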

If you're interested, the Excel file with the above visuals can be downloaded here.

Wednesday, June 4, 2014

alternatives to pies

My disdain for pie charts is well documented. While opinions on their usefulness run the gamut, I am certainly not alone in my contempt. In my workshops, I sometimes get the question, "In what situation would you recommend a pie chart?" For me, the answer is never.* There are a number of alternatives, each with their own benefits. It's these alternatives that I'll focus on in this post.

*Full disclosure: There was once a situation at Google where we wanted to share some diversity stats on gender breakdown but didn't want to show the specific values. In this case, the fact that it's tough for people to attribute accurate value to 2-dimensional space worked to our advantage and we leveraged a pie chart absent of any value labels. Though, now that Google is sharing their diversity stats publicly (I'll resist the urge to comment on the ill-chosen donut graphs they are using to do so) it seems even this has become a moot need.

The following is an example that I often use in my workshops (based on a real example, but modified a bit to preserve confidentiality). By way of context: imagine you just completed a pilot summer learning program on science aimed at improving perceptions of the field among 2nd and 3rd grade elementary children. You conducted a survey going into the program and at the end of the program and have visualized the resulting data in the following set of graphs.


I believe the above data demonstrates that, on the basis of improved sentiment towards science, the pilot program was a great success. Going into the program, the biggest segment of students (40%, the green slice in the left pie) felt just "OK" about science - perhaps they hadn't made up their minds one way or the other. Whereas after the program (pie on the right), that 40% in green shrinks down to 14%. Bored (blue) and Not great (red) went up a percentage point each, but the majority of the change was in a positive direction: after the program, nearly 70% of kids (purple + teal segments) expressed some level of interest towards science.

The above visual does this story a great disservice. Yes, you can get there, but you have to first overcome the annoyance of trying to compare slices across two pies. There's no need for this annoyance: choose a different type of visual!

Let's take a look at four alternatives using the above data.

Alternative #1: Show the Number(s) Directly
If the improvement in positive sentiment is the big thing we want to communicate, we can consider making that the only thing we communicate:
Too often, we think we have to include all of the data and overlook the simplicity and power of communicating with just one or two numbers directly, as in the above. That said, if you feel you need to show more, look to one of the following alternatives.
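As a rough illustration of where those one or two numbers come from, here is a small Python sketch. Only the OK values (40% and 14%) and the one-point increases for Bored and Not great are stated in this post; the remaining values and the "Kind of interested" label are assumptions chosen to be consistent with the description.

```python
# Survey shares in percent. Assumed values, chosen to match the
# description in the post (OK: 40 -> 14, "nearly 70%" positive after).
before = {"Bored": 11, "Not great": 5, "OK": 40,
          "Kind of interested": 25, "Excited": 19}
after = {"Bored": 12, "Not great": 6, "OK": 14,
         "Kind of interested": 30, "Excited": 38}

# The two categories treated as "positive sentiment" (an assumption).
POSITIVE = ("Kind of interested", "Excited")


def positive_share(responses):
    """Sum the share of responses in the positive categories."""
    return sum(responses[category] for category in POSITIVE)


before_positive = positive_share(before)
after_positive = positive_share(after)

headline = (
    f"After the pilot program, {after_positive}% of kids expressed "
    f"interest in science, compared to {before_positive}% going in."
)
print(headline)
```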

Alternative #2: Simple Bar Graph
When you want to compare two things, you typically want to put those two things as close together as possible and align them along a common baseline to make this comparison easy. The simple bar graph does this. This is the "after" version that I typically use in my workshops (which is why you see more narrative integrated into the following visual than the other alternatives).

Alternative #3: 100% Stacked Horizontal Bar Graph
When the part-to-whole concept is a must-have (something you don't get with either of the above solutions), the stacked 100% horizontal bar graph achieves this. Note that you get a consistent baseline to use for comparison both at the left and at the right of the graph, which can be useful in cases such as this, allowing the audience to easily compare both the negative segments at the left and the positive segments at the right across the two bars. Because of this, I find this to be a useful way to visualize survey data in general.

In the above version I chose to retain the x-axis labels rather than put data labels on the bars directly. I tend to do it this way when leveraging 100% stacked bars so that you can use the scale at the top to read either from left to right (which in this case allows us to attribute numbers to the change from Before to After on the negative end of the scale) or from right to left (to do the same for the positive end of the scale). In the simple bar graph shown previously, I chose to omit the axis and label the bars directly. This illustrates how different views of your data may lead you to different design choices. Always think about how you want your audience to use the graph and make your design choices accordingly - different choices will make sense in different situations.
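To make the two-baseline idea concrete, the following Python sketch computes where each segment starts and ends on a 0 to 100 scale. The survey values are assumptions consistent with the description earlier in this post (only OK at 40% and 14% is stated outright), and `segment_edges` is an illustrative helper name.

```python
from itertools import accumulate

# Categories in ordinal order, negative to positive.
CATEGORIES = ["Bored", "Not great", "OK", "Kind of interested", "Excited"]
# Assumed shares in percent, consistent with the post's description.
before = [11, 5, 40, 25, 19]
after = [12, 6, 14, 30, 38]


def segment_edges(shares):
    """Return (left, right) positions of each segment on a 0-100 axis."""
    rights = list(accumulate(shares))
    lefts = [0] + rights[:-1]
    return list(zip(lefts, rights))


before_edges = segment_edges(before)
after_edges = segment_edges(after)

# Reading left to right: where the two negative segments end.
negative_before = before_edges[1][1]
negative_after = after_edges[1][1]

# Reading right to left: combined width of the two positive segments.
positive_before = 100 - before_edges[2][1]
positive_after = 100 - after_edges[2][1]

print("Negative:", negative_before, "->", negative_after)
print("Positive:", positive_before, "->", positive_after)
```

Only the middle segment (OK) floats without a baseline on either side, which is why the comparison at both ends is easy to read off the axis.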

Alternative #4: Slopegraph
The final alternative we'll consider today is a slopegraph (I've blogged about slopegraphs previously here, here, and here). As was the case with the simple bar chart, you don't get a clear sense of there being a whole and thus pieces-of-a-whole in this view (in the way that you do with the initial pie, or with the 100% horizontal stacked bar). Also, if it is important to have your categories ordered in a certain way, a slopegraph won't always be ideal since the various categories are placed according to their respective data values. In the following, on the right hand side, you do get the positive end of the scale at the top, but note that Bored and Not great at the bottom are switched relative to how they'd appear in an ordinal scale because of the values that correspond with these points. If you need to dictate the category order, use the simple bar graph or the 100% stacked bar graph, where you can control this.

One thing you do get with the slopegraph is the visual percent change from Before to After for each category via the slope of the respective line. It's easy to see quickly that the category that increased the most was Excited (and the category that decreased markedly was OK). The slopegraph also provides clear visual ordering of categories from greatest to least (via their respective points in space from top to bottom on the left and on the right sides of the graph).
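Both properties of the slopegraph (slope as change, vertical position as rank) can be checked with a few lines of Python. This is a sketch under assumptions: apart from OK (40% to 14%), the survey values below are chosen to be consistent with the description in this post rather than taken from it.

```python
# Assumed survey shares in percent, consistent with the post's description.
before = {"Bored": 11, "Not great": 5, "OK": 40,
          "Kind of interested": 25, "Excited": 19}
after = {"Bored": 12, "Not great": 6, "OK": 14,
         "Kind of interested": 30, "Excited": 38}

# The slope of each line reflects the change in percentage points.
changes = {category: after[category] - before[category] for category in before}
biggest_increase = max(changes, key=changes.get)
biggest_decrease = min(changes, key=changes.get)

# Top-to-bottom placement of the labels on each side of the graph is
# dictated by the data values, not by the ordinal scale.
left_order = sorted(before, key=before.get, reverse=True)
right_order = sorted(after, key=after.get, reverse=True)

print("Changes:", changes)
print("Left side, top to bottom:", left_order)
print("Right side, top to bottom:", right_order)
```

With these values, Bored lands above Not great on the right side, which is the switched ordering described above.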

Any of these alternatives might be the best choice given the specific situation, how you want your audience to interact with the information, and what point(s) of emphasis you want to make. The meta-lesson here is that you have a number of alternatives to pies that can be more effective for getting your point across.

I should note that I had a couple specific sources of inspiration for this post. I recently completed some long overdue reading that included Jon Schwabish's An Economist's Guide to Visualizing Data. In it, Jon discusses a number of data viz best practices through examples of common mistakes and some nice makeovers, including a section focused on alternatives to pies. I highly recommend checking out this paper. Andy Kriebel recently posted a nice makeover of a particularly annoying "data visualization" that tried to combine pie graphs with faces (you have to see it to believe it). There are a few things that are worse than a pie graph: a 3D exploding pie graph, having to compare segments across two pie graphs, and - a recent (and unexpected) addition to the list - the face-pie.

The Excel workbook with the above makeovers can be downloaded here.

Are there other alternatives to pies that should be added to this list? Which one do you favor in this situation? Leave a comment with your thoughts!

Thursday, May 22, 2014

the story you want to tell...and the one your data shows

I was working on a makeover for a recent workshop when it became apparent that the story being told wasn't quite right, or at least wasn't exactly the story I would tell after looking at the data in a couple of different ways. In the following post, I'll walk you through an anonymized version of the makeovers and my corresponding thought process.

The original visual looked something like the following. It was accompanied by the headline, "Price has declined for all products on the market since the launch of Product C in 2010."


Based on the headline, what we're most interested in looking at here is the trend of cost over time for each product. The variance in colors across the bars distracts from this and makes the exercise more difficult than need be. Bear with me here, as we're going to go through probably more iterations of looking at this data than you might typically, but I think the progression is interesting.

For a first look, let's remove the visual obstacle of the variance in color and see what the resulting graph looks like (at the same time taking other steps to make sure things are appropriately labeled and de-clutter by removing unnecessary gridlines, tick marks, etc.):


Going back to the original headline, we're primarily interested in what has happened since Product C was launched in 2010, so let's emphasize the relevant pieces, forcing our attention there, and see what that reveals:


Upon studying this for a moment, we see clear declines in the average retail price for Product A and Product B in the time period of interest, but this doesn't appear to hold true for the products that were launched later. Plus, you've probably been thinking as you've scrolled through these bar chart iterations that we are looking at time, so perhaps a line graph would make more sense. Let's see what that looks like in the same layout as above:


If it wasn't already apparent, the above probably makes it clear that it makes sense to graph all of the lines against the same x-axis so that we can more easily compare them to each other. This also reduces the clutter and redundancy of all of those year labels. The resulting graph might look like this:


With this view, we can much more easily see and comment on what's happening over time. Again, going back to that initial headline, I might modify it to say something like, "After the launch of Product C in 2010, the average retail price of existing products declined."


But this view also allows us to see something perhaps more interesting and noteworthy: "With the launch of a new product in this space, it is typical to see an initial average retail price increase, followed by a decline."


And perhaps we'd also want to note, "As of 2014, retail prices have converged across products, with an average retail price of $223, ranging from a low of $180 (Product C) to a high of $260 (Product A)."


Note how each different view of the data made certain things easier or harder to see. You can use the strategy above to highlight and tell different pieces of a nuanced story. Just make sure that the story you are telling is the same one that your data shows!
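One way to keep yourself honest is to test a headline against the numbers before you publish it. The Python sketch below does this with entirely hypothetical prices: the product count, launch years, and values are made up, with only the 2014 figures arranged to match the $223 average, $180 low (Product C), and $260 high (Product A) quoted above. The function names are illustrative.

```python
# Hypothetical average retail prices by year (a product has no entry
# before its launch). Not the data behind the visuals above.
prices = {
    "Product A": {2008: 380, 2009: 400, 2010: 390, 2011: 350,
                  2012: 310, 2013: 280, 2014: 260},
    "Product B": {2008: 320, 2009: 340, 2010: 330, 2011: 300,
                  2012: 280, 2013: 260, 2014: 250},
    "Product C": {2010: 160, 2011: 200, 2012: 230, 2013: 210, 2014: 180},
    "Product D": {2011: 180, 2012: 240, 2013: 230, 2014: 205},
    "Product E": {2012: 190, 2013: 250, 2014: 220},
}


def declined_since(series, year):
    """True if the latest price is below the price in `year`
    (or at launch, if the product launched after `year`)."""
    start_year = max(year, min(series))
    return series[max(series)] < series[start_year]


def rose_then_fell(series):
    """True if the price peaked after launch and before the latest year."""
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    return years[0] < peak_year < years[-1]


# Original headline: "Price has declined for all products since 2010."
declined = {name: declined_since(s, 2010) for name, s in prices.items()}
headline_holds = all(declined.values())

# Alternative observation: an initial increase followed by a decline.
pattern = {name: rose_then_fell(s) for name, s in prices.items()}

# Latest-year summary: average and range across products.
latest = {name: s[max(s)] for name, s in prices.items()}
average = sum(latest.values()) / len(latest)

print("Declined since 2010:", declined)
print("Original headline holds:", headline_holds)
print("Rose then fell:", pattern)
print(f"2014 average: ${average:.0f}, "
      f"low: {min(latest, key=latest.get)}, high: {max(latest, key=latest.get)}")
```

With this made-up data, the original headline fails for the later launches, while the narrower claim about existing products (A and B) holds.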

If you're interested, you can download the Excel file with the above visuals here.

Thursday, May 15, 2014

the visual displays I use most

As part of a project I'm currently working on, I recently went through all the visuals I've created in the past year - for workshops, this blog, and consulting work - and categorized them. Out of the 200+ visuals that I created, there were only a dozen different types of visuals that I used (and just 7 that, together, account for more than 90% of the total visuals I created).

I thought it might be useful to share the stats with you, along with some related blog posts (some of the posts linked below focus directly on the given type of visual display, while others simply show an example of their use).

the visual displays I use most
with % of total displays created in the past year
  1. Horizontal bar graph - 27%
  2. Line graph - 16%
  3. Horizontal stacked bar graph - 14%
  4. Vertical bar graph - 10%
  5. Simple text - 8%
  6. Vertical stacked bar graph* - 8%
  7. Slopegraph - 8%
  8. Heatmap* - 3%
  9. Area graph - 2%
  10. Waterfall chart - 2%
  11. Scatterplot* - 1%
  12. Table - 1%
*I seem to be lacking posts with examples of these types of visuals; I'll add them over time and link here once the posts are live.
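The "7 types account for more than 90%" figure follows directly from the percentages in the list. A quick Python check of the running total, using the rounded values as listed:

```python
from itertools import accumulate

# Share of total visuals by type, in percent, as listed above.
usage = [
    ("Horizontal bar graph", 27),
    ("Line graph", 16),
    ("Horizontal stacked bar graph", 14),
    ("Vertical bar graph", 10),
    ("Simple text", 8),
    ("Vertical stacked bar graph", 8),
    ("Slopegraph", 8),
    ("Heatmap", 3),
    ("Area graph", 2),
    ("Waterfall chart", 2),
    ("Scatterplot", 1),
    ("Table", 1),
]

# Running total of the shares, from most used to least used.
cumulative = list(accumulate(share for _, share in usage))

# Smallest number of types whose combined share exceeds 90%.
types_needed = next(
    count for count, total in enumerate(cumulative, start=1) if total > 90
)

for (name, share), total in zip(usage, cumulative):
    print(f"{name:30s} {share:3d}%  cumulative {total:3d}%")
print("Types needed to pass 90%:", types_needed)
```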

This is certainly not an exhaustive list of types of visual displays of information. But in my experience, just a handful of different types of visuals will meet the majority of your everyday storytelling with data needs.

Wednesday, May 14, 2014

500,351

That's the number that just caught my eye when I was looking up a past post on my blog a moment ago. If you look over to the left, you'll see it, too. Well, not exactly, as it's likely continued to tick up a little since I began writing this post. It reflects the number of views this blog has had since I launched it in December 2010. Just two words come to mind when I see it: thank you.

Thank you for reading, for your comments and your general interest in storytelling with data.

This seems like a good (albeit random) point to pause for a moment, to link back to some popular posts you may have missed and ask you to weigh in on what you'd like to see here in the future.

popular posts in case you missed them
The 10 most popular posts (based on number of page views) are listed below (plus a bonus 11th that has 10 tips and links to related posts).
  1. no more excuses for bad simple charts: here's a template
  2. how to do it in Excel
  3. the waterfall chart
  4. strategies for avoiding the spaghetti graph
  5. a Google example: preattentive attributes
  6. my penchant for horizontal bar graphs
  7. chart chooser
  8. the power of simple text
  9. logic in order
  10. slopegraph template
  11. celebrating (almost) 100 posts with 10 tips
what would you like to see covered here in the future?
Suggestions on future topics, questions you'd like me to opine on, or data visualization challenges you're facing are welcome. Leave a comment with your thoughts or email me directly at cole.nussbaumer@gmail.com.

where should I go in 2015?
I'm also looking ahead to my 2015 public workshop schedule. If you'd like to recommend a city or location, you can do so here.

Thank you very much for reading!

Thursday, May 8, 2014

calling all Bay Area data viz gurus

Facebook recently announced their 2nd bi-annual Viz Cup, bringing together the finest Bay Area vizzers for a night of competition and fun! I'm excited and honored to be one of the judges at the event, which will take place the evening of May 20th at Facebook headquarters in Palo Alto, CA.

Click here for additional info and to RSVP.

If you're interested in what I'll be looking for when it comes to effective data viz, check out this post, recapping the last event. I hope to see you there!

Monday, April 28, 2014

why I disdain most infographics

Too many offenses to sensible data visualization to list. It's unfortunate, too, because there are some compelling stats lost in the cartoony graphics.

Gates Foundation Inventions
Source: MPHOnline.org

Wednesday, April 23, 2014

focusing with color

In my previous post, I discussed the distinction between exploratory and explanatory analysis and showed how you can sometimes leverage the same visual when moving from the former stage to the latter, with some minor tweaks. Today, I'd like to consider another example of this and also illustrate how you can use iterations of the same visual to focus your audience with color.

We'll continue with the imagined scenario where you work for a car manufacturer. Today, you're interested in understanding and sharing insight around top design concerns for a particular make and model. Your initial visual might look something like the following:


The above visual could be one of those you create during the exploratory phase: when you're looking at the data to understand what might be interesting or noteworthy to communicate to someone else. The above shows us that there are 10 design concerns that have 8+ concerns greater than 1,000 (the rest of the tail has been chopped off, which would probably be worth a footnote with a little detail on how long the tail is, perhaps how many design concerns there are in total, etc. if you're using this to communicate to others).

You can leverage the same visual, together with thoughtful use of color and text to further focus the story:


Continuing to peel back the onion, we can go a level further than this, again using the same visual with modified focus and text to lead our audience from the macro to the micro parts of the story:


Repeated iterations of the same visual, with different pieces emphasized to tell different stories or different aspects of the same story (as above) can be particularly useful in live presentations, because you can orient your audience with your data and visual once and then continue to leverage it in the manner illustrated above.
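If you produce your graphs programmatically, this approach amounts to keeping the data fixed and swapping only the color assignment between iterations. Here is a minimal Python sketch of that idea. The concern labels, values, threshold, and hex colors are all hypothetical, and `focus_colors` is an illustrative helper, not part of any charting library; the resulting lists could be passed to whatever tool draws the bars.

```python
# Hypothetical design concerns and values, sorted from most to least
# frequent. Not the data behind the visuals above.
concerns = [
    ("Concern A", 12.9), ("Concern B", 12.3), ("Concern C", 11.6),
    ("Concern D", 11.0), ("Concern E", 10.4), ("Concern F", 9.8),
    ("Concern G", 9.1), ("Concern H", 8.6), ("Concern I", 8.2),
    ("Concern J", 8.0),
]

GREY = "#A6A6A6"    # de-emphasized bars
ACCENT = "#C00000"  # bars the audience should look at


def focus_colors(items, in_focus):
    """Return one color per item: accent where in_focus is true, grey otherwise."""
    return [ACCENT if in_focus(label, value) else GREY for label, value in items]


# Iteration 1: exploratory view, nothing emphasized.
exploratory = focus_colors(concerns, lambda label, value: False)

# Iteration 2: the macro story, everything at or above a threshold.
macro = focus_colors(concerns, lambda label, value: value >= 10)

# Iteration 3: the micro story, a specific subset called out by name.
micro = focus_colors(concerns, lambda label, value: label in {"Concern B", "Concern C"})

for name, colors in [("exploratory", exploratory), ("macro", macro), ("micro", micro)]:
    print(name, colors.count(ACCENT), "of", len(colors), "bars emphasized")
```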

If you're interested, you can download the Excel file with the above visuals here.