storytelling with data: Visual Makeover


Friday, July 24, 2015

align against a common baseline

Registration for upcoming storytelling with data public workshops in NYC and Los Angeles is currently open here. Stay tuned for details on fall sessions to be scheduled in Seattle and SF.

I've been failing when it comes to keeping up with reading and posting on data visualization-related stuff lately (my focus has been elsewhere). But I found myself with a few spare minutes yesterday afternoon and decided it was time to change that.

The first article in my Feedly was by FiveThirtyEight and the graph that appeared with it caused me to click for details. Here's the graph that caught my attention:

I like FiveThirtyEight's general approach when it comes to data visualization: straightforward and clutter free, with emphasis on the story. My view is that the graph should never be what makes the data interesting, rather it's the story that makes the data interesting. They seem to subscribe to this view as well.

In this case, the story is called out clearly at the top: Being Arrested Is Deadlier For African-Americans.

The accompanying visual is fine. But I think it can be made better by adhering to one recommendation I find myself often voicing to workshop participants: Think about what you want your audience to be able to easily compare. Put those things as physically close together as you can and align them along a common baseline.

With the current view, it's easiest to compare deaths for Whites by cause and, separately, deaths for African-Americans by cause. Yes, we can see (and read) that the yellow bars on the right are bigger than the red bars on the left (the point called out in the title), but note the bouncing back and forth your eyes do when comparing the bars across the two graphs. It's also hard to judge how much longer the yellow bars are vs. the red ones. Sure, we have the numbers there to help, but this means we have to do some mental math to decipher the differences. Why go through this work, when we can restructure the visual to avoid it?

To make it easier to compare deaths by cause for African-Americans vs. Whites, we can align both series along a common baseline. Here's what that looks like:

I made a few additional minor changes in this remake. The bars in the original graphs weren't ordered by decreasing number of deaths for either Whites or African-Americans (not sure why), so I reordered the causes of death by decreasing value for African-Americans (there should always be logic in the way you order your data). Where there was space, I pulled the data labels into the bars to reduce the visual clutter. I moved the subtitle into the x-axis label so that the words are right next to the data they describe. I didn't like the bold colors in the original visual, so I stripped color out of my remake entirely. (If you do want to use color here, I'd suggest different shades of the same color: red and yellow together are both so bright that it's hard to focus on one or the other.)

Another potential alternative with this data would be to use a slopegraph. Or so I thought. But I quickly abandoned this approach: there are too many criss-crossing lines at the lower values to allow space to label the data effectively. The following is what it looked like (note I didn't spend any time on the formatting or labeling once I realized this approach wouldn't work; if you're interested in seeing a completed example of a slopegraph in practice, check this out).

Also, while I love the idea of slopegraphs for group comparisons, in practice I've had mixed responses. Slopegraphs can be a little less intuitive than bars for data like this. It's also important to note that the slopegraph makes it easier to focus on the difference (via the lines connecting the various points), whereas bars make it easier to focus on absolute values. In this case, even if the data values had been such that the slopegraph would have worked, I think I still would prefer the bars when it comes to supporting the story that overall deaths per 100,000 arrests are higher for African-Americans compared to Whites.

Meta-point: align the things you want your audience to compare along a common baseline!

To download the Excel file containing the graphs above, click here.

Tuesday, August 26, 2014

design with audience in mind

Recently, my husband shared a USA Today graphic with me that summarizes diversity stats across a number of Bay Area tech companies. Surely, this would be a good blog topic, he told me. He knows me well. Here is a screenshot of the visual:

Online version can be found here.

First, let me mention how cool I think it is that companies like Google have started sharing their diversity stats. I expect that with this transparency, we'll see movement towards more diverse workforces over time.

Next, let me discuss what an annoying user experience it is to try to look at the diversity data with USA Today's visual. It shows the breakdown for the given company (Apple, in the above screenshot) by gender on the left and ethnicity on the right. The various tech companies each have their own tab; you can toggle between companies using the numbered tabs along the left (not sure what the numbers on the tabs mean...if anything).

What is the first thing you want to do with this data?

For me, the stats for a given company, on their own, are not so interesting. It's by comparing them to the other companies that we help build context for what is good (or if not good, then at least better), what is worse, and so on. In other words, the single thing I want to do most is compare the stats across companies. The way this visual is organized makes this a lot harder than necessary. If I want to compare the proportion who are women at Apple (for example) to other companies, first I look to the Apple tab and commit 30% to memory, then I click through the other tabs one by one to try to put that 30% into context. This is annoying, but possible.

It gets more annoying and difficult if you try to do it by ethnicity. Try comparing the proportion Hispanics make up of the various workforces, for example. It's further complicated by the fact that the slices on the pie move and the order in which the companies are listed changes as you toggle between companies.

This is not an ideal user experience. My guess is that there was some desire to make the visual "interactive," which it sort of feigns via the tabs of various companies along the left. But really all this does is allow you to see the various static graphs, one at a time. Why not replace with a single static visual that makes the task your audience is going to want to do easy?

In other words, let's design the visual with our audience - and how they are going to want to interact with the data - in mind. If the goal is to compare across companies, I might do something like the following:

(Note that the title and takeaway at the top were preserved from USA Today's visual; I'm not sure I would have been quite as negative.)

The above version allows me to see things that were very difficult to get to with the original. eBay is doing the best from a gender diversity standpoint, but worse when it comes to racial diversity, where Yahoo is doing better than the others, etc.

Bottom line: design with your audience in mind!

Click here to download the Excel file with the above visual.

Wednesday, June 4, 2014

alternatives to pies

My disdain for pie charts is well documented. While opinions on their usefulness run the gamut, I am certainly not alone in my contempt. In my workshops, I sometimes get the question, "In what situation would you recommend a pie chart?" For me, the answer is never.* There are a number of alternatives, each with their own benefits. It's these alternatives that I'll focus on in this post.

*Full disclosure: There was once a situation at Google where we wanted to share some diversity stats on gender breakdown but didn't want to show the specific values. In this case, the fact that it's tough for people to attribute accurate value to 2-dimensional space worked to our advantage and we leveraged a pie chart absent of any value labels. Though, now that Google is sharing their diversity stats publicly (I'll resist the urge to comment on the ill-chosen donut graphs they are using to do so) it seems even this has become a moot need.

The following is an example that I often use in my workshops (based on a real example, but modified a bit to preserve confidentiality). By way of context: imagine you just completed a pilot summer learning program on science aimed at improving perceptions of the field among 2nd and 3rd grade elementary children. You conducted a survey going into the program and at the end of the program and have visualized the resulting data in the following set of graphs.

I believe the above data demonstrates that, on the basis of improved sentiment towards science, the pilot program was a great success. Going into the program, the biggest segment of students (40%, the green slice in the left pie) felt just "OK" about science - perhaps they hadn't made up their minds one way or the other. Whereas after the program (pie on the right), that 40% in green shrinks down to 14%. Bored (blue) and Not great (red) went up a percentage point each, but the majority of the change was in a positive direction: after the program, nearly 70% of kids (purple + teal segments) expressed some level of interest towards science.

The above visual does this story a great disservice. Yes, you can get there, but you have to first overcome the annoyance of trying to compare slices across two pies. There's no need for this annoyance: choose a different type of visual!

Let's take a look at four alternatives using the above data.

Alternative #1: Show the Number(s) Directly
If the improvement in positive sentiment is the big thing we want to communicate, we can consider making that the only thing we communicate:

Too often, we think we have to include all of the data and overlook the simplicity and power of communicating with just one or two numbers directly, as in the above. That said, if you feel you need to show more, look to one of the following alternatives.

Alternative #2: Simple Bar Graph
When you want to compare two things, you typically want to put those two things as close together as possible and align them along a common baseline to make this comparison easy. The simple bar graph does this. This is the "after" version that I typically use in my workshops (which is why you see more narrative integrated into the following visual than the other alternatives).

Alternative #3: 100% Stacked Horizontal Bar Graph
When the part-to-whole concept is a must-have (something you don't get with either of the above solutions), the stacked 100% horizontal bar graph achieves this. Note that you get a consistent baseline to use for comparison both at the left and at the right of the graph, which can be useful in cases such as this, allowing the audience to easily compare both the negative segments at the left and the positive segments at the right across the two bars. Because of this, I find this to be a useful way to visualize survey data in general.

In the above version I chose to retain the x-axis labels rather than put data labels on the bars directly. I tend to do it this way when leveraging 100% stacked bars so that you can use the scale at the top to read either from left to right (which in this case allows us to attribute numbers to the change from Before to After on the negative end of the scale) or from right to left (to do the same for the positive end of the scale). In the simple bar graph shown previously, I chose to omit the axis and label the bars directly. This illustrates how different views of your data may lead you to different design choices. Always think about how you want your audience to use the graph and make your design choices accordingly - different choices will make sense in different situations.

Alternative #4: Slopegraph
The final alternative we'll consider today is a slopegraph (I've blogged about slopegraphs previously here, here, and here). As was the case with the simple bar chart, you don't get a clear sense of there being a whole and thus pieces-of-a-whole in this view (in the way that you do with the initial pie, or with the 100% horizontal stacked bar). Also, if it is important to have your categories ordered in a certain way, a slopegraph won't always be ideal, since the various categories are placed according to their respective data values. In the following, on the right-hand side, you do get the positive end of the scale at the top, but note that Bored and Not great at the bottom are switched relative to how they'd appear in an ordinal scale because of the values that correspond with these points. If you need to dictate the category order, use the simple bar graph or the 100% stacked bar graph, where you can control this.

One thing you do get with the slopegraph is the visual percent change from Before to After for each category via the slope of the respective line. It's easy to see quickly that the category that increased the most was Excited (and the category that decreased markedly was OK). The slopegraph also provides clear visual ordering of categories from greatest to least (via their respective points in space from top to bottom on the left and on the right sides of the graph).
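The changes that the slopes encode can also be tabulated directly. The following is a minimal sketch, not the workbook behind the graphs. Only the OK values (40% before, 14% after) are quoted earlier in this post; the remaining percentages and the label "Kind of interested" are illustrative assumptions chosen to be consistent with the description (Bored and Not great each up one point, nearly 70% positive afterwards).

```python
# Share of students giving each response, in percent.
# Only the "OK" values (40 and 14) are quoted in the post; every other
# number, and the label "Kind of interested", is an illustrative assumption.
before = {"Bored": 11, "Not great": 5, "OK": 40,
          "Kind of interested": 25, "Excited": 19}
after = {"Bored": 12, "Not great": 6, "OK": 14,
         "Kind of interested": 30, "Excited": 38}

# Categories counted as positive sentiment (assumed grouping).
POSITIVE = ("Kind of interested", "Excited")


def point_changes(before_shares, after_shares):
    """Change from before to after for each category, in percentage points."""
    return {name: after_shares[name] - before_shares[name]
            for name in before_shares}


def positive_share(shares):
    """Combined share of the positive categories, in percent."""
    return sum(shares[name] for name in POSITIVE)


changes = point_changes(before, after)
for name, delta in changes.items():
    print(f"{name:<20}{delta:+d} pts")

print(f"Positive before: {positive_share(before)}%")
print(f"Positive after:  {positive_share(after)}%")
```

Note that these are changes in percentage points, which is what the steepness of each line in a slopegraph corresponds to.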

Any of these alternatives might be the best choice given the specific situation, how you want your audience to interact with the information, and what point(s) of emphasis you want to make. The meta-lesson here is that you have a number of alternatives to pies that can be more effective for getting your point across.

I should note that I had a couple specific sources of inspiration for this post. I recently completed some long overdue reading that included Jon Schwabish's An Economist's Guide to Visualizing Data. In it, Jon discusses a number of data viz best practices through examples of common mistakes and some nice makeovers, including a section focused on alternatives to pies. I highly recommend checking out this paper. Andy Kriebel recently posted a nice makeover of a particularly annoying "data visualization" that tried to combine pie graphs with faces (you have to see it to believe it). There are a few things that are worse than a pie graph: a 3D exploding pie graph, having to compare segments across two pie graphs, and - a recent (and unexpected) addition to the list - the face-pie.

The Excel workbook with the above makeovers can be downloaded here.

Are there other alternatives to pies that should be added to this list? Which one do you favor in this situation? Leave a comment with your thoughts!

Thursday, May 22, 2014

the story you want to tell...and the one your data shows

I was working on a makeover for a recent workshop when it became apparent that the story being told wasn't quite right, or at least wasn't exactly the story I would tell after looking at the data in a couple of different ways. In the following post, I'll walk you through an anonymized version of the makeovers and my corresponding thought process.

The original visual looked something like the following. It was accompanied by the headline, "Price has declined for all products on the market since the launch of Product C in 2010."

Based on the headline, what we're most interested in looking at here is the trend of price over time for each product. The variance in colors across the bars distracts from this and makes the exercise more difficult than need be. Bear with me here, as we're going to go through probably more iterations of looking at this data than you might typically, but I think the progression is interesting.

For a first look, let's remove the visual obstacle of the variance in color and see what the resulting graph looks like (at the same time taking other steps to make sure things are appropriately labeled and de-clutter by removing unnecessary gridlines, tick marks, etc.):

Going back to the original headline, we're primarily interested in what has happened since Product C was launched in 2010, so let's emphasize the relevant pieces, forcing our attention there, and see what that reveals:

Upon studying this for a moment, we see clear declines in the average retail price for Product A and Product B in the time period of interest, but this doesn't appear to hold true for the products that were launched later. Plus, you've probably been thinking as you've scrolled through these bar chart iterations that we are looking at time, so perhaps a line graph would make more sense. Let's see what that looks like in the same layout as above:

If it wasn't already apparent, the above probably makes it clear that it makes sense to graph all of the lines against the same x-axis so that we can more easily compare them to each other. This also reduces the clutter and redundancy of all of those year labels. The resulting graph might look like this:

With this view, we can much more easily see and comment on what's happening over time. Again, going back to that initial headline, I might modify it to say something like, "After the launch of Product C in 2010, the average retail price of existing products declined."

But this view also allows us to see something perhaps more interesting and noteworthy: "With the launch of a new product in this space, it is typical to see an initial average retail price increase, followed by a decline."

And perhaps we'd also want to note, "As of 2014, retail prices have converged across products, with an average retail price of $223, ranging from a low of $180 (Product C) to a high of $260 (Product A)."

Note how, with each different view of the data, you were able to more or less clearly see certain things. You can use the strategy above to highlight and tell different pieces of a nuanced story. Just make sure that the story you are telling is the same one that your data shows!

If you're interested, you can download the Excel file with the above visuals here.

Monday, April 14, 2014

exploratory vs explanatory analysis

I often draw a distinction between exploratory and explanatory data analysis. Exploratory analysis is what you do to get familiar with the data. You may start out with a hypothesis or question, or you may just really be delving into the data to determine what might be interesting about it. Exploratory analysis is the process of turning over 100 rocks to find perhaps 1 or 2 precious gemstones. Explanatory analysis is what happens when you have something specific you want to show an audience - probably about those 1 or 2 precious gemstones. In my blogging and writing, I tend to focus mostly on this latter piece, explanatory analysis, when you've already gone through the exploratory analysis and from this have determined something specific you want to communicate to a given audience: in other words, when you want to tell a story with data.

Keeping this distinction in mind, I thought it might be interesting to look at a recent makeover and show how the visual you could use for the exploratory and explanatory steps of the analytical process might differ.

For this (generalized & simplified*) example, imagine that you work for a car manufacturer. You're looking at customer feedback, specifically to better understand how failed or less-than-ideal performance across various dimensions for a given make and model impacts customer satisfaction. The primary output variable you're looking at in this case is an overall question in your customer satisfaction survey, where customers are asked to express their overall satisfaction with their car along a 5-point Likert scale (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied). Let's assume you're most interested in anyone responding with anything other than Very Satisfied, and want to understand how this varies by customers who have reported specific issue(s) with their car, by the type of issue.

*Please keep in mind that I'm making up the specific scenario here; the makeover is a generalized example from a past workshop where I don't have all of the details and also don't have other data that would possibly be of benefit in the exploratory and explanatory phases. For example, there are likely other things that drive the overall satisfaction with the car, which we're ignoring here. Also, anytime you show percents like this, I'd recommend also showing the N count - in this case, the number of people reporting the given issue - which will be helpful for the interpretation of the data.

Your initial visual might look something like the following:

In the above, I've grouped all of the "less than very satisfied" responses (in orange), with the data arranged in descending order of this metric. With this visual, you can scan through the various issues and see the relevant "less than very satisfied" metric. This might be useful for part of your exploratory analysis.
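As a rough sketch of the calculation behind a view like this, the snippet below groups everything other than Very Satisfied into one metric, sorts the issues in descending order of that metric, and carries the N count along, as recommended in the note above. The issue names and counts are entirely hypothetical and are not taken from the workshop example.

```python
# Hypothetical survey counts, one row per reported issue:
# (issue, respondents reporting the issue, of whom "Very Satisfied").
responses = [
    ("Engine noise", 120, 54),
    ("Brakes", 80, 20),
    ("Infotainment", 200, 130),
]


def less_than_very_satisfied(rows):
    """Return (issue, percent less than very satisfied, N), sorted descending."""
    summary = []
    for issue, n, very_satisfied in rows:
        pct = 100 * (n - very_satisfied) / n
        summary.append((issue, pct, n))
    return sorted(summary, key=lambda row: row[1], reverse=True)


ranked = less_than_very_satisfied(responses)
for issue, pct, n in ranked:
    print(f"{issue:<14}{pct:5.1f}%  (N={n})")
```

Keeping N next to each percentage makes it easier to judge whether a high value rests on many respondents or only a handful.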

Once you've identified something or some things to focus on, in some cases it will make sense to create a different visual for the purpose of focusing on that thing or those things. Alternatively, the same visual can be modified for explanatory purposes by drawing attention to the points of interest, while preserving the other data for context:

We can use the same visual and approach for highlighting another potential point of interest:

Or another:

Note how, when we focus on one aspect or story, it's actually harder to see the others. That's one of the reasons it's important to do exploratory analysis before you get to the explanatory phase: so you can have confidence that you're focusing your audience on the right thing(s).

In case it's of interest, the Excel workbook with the above graphs can be downloaded here.

Wednesday, April 2, 2014

US prison population revisualized

The following graph caught my eye recently in my Twitter feed:

Source: https://plot.ly/~Dreamshot/361/

I've been debating whether to post about it (and finally decided that I couldn't resist).

I don't want to rip it apart.

Well, that's not entirely true.

I do want to rip it apart, but it's not in an effort to be mean. The above visual breaks pretty much every best practice out there when it comes to effective graph design. It's simple data. Probably not so much is being lost in terms of being able to interpret the data through this less-than-stellar data viz. But the specifics of the design choices (or lack thereof) drive me batty. To the extent that I can't help but comment and resolve to show what it has the potential to be.

First, let me list the main components that get under my skin (and I should note that it's possible some of these are constraints in the Plot.ly tool through which the above visual was published, which I have not used directly):

No meaningful ordering to the data (rather, the categories are shown in reverse alphabetical order... not so helpful);
Lack of axis labels (sure, we can infer, but why should we have to?);
Diagonal text on x-axis (avoid, avoid, avoid!); and
Grey background, white vertical gridlines, and black bar outlines add unnecessary clutter (eliminate!).

I think the only positives I have to say about the original visual are: 1) a horizontal bar chart is a good choice here because we're dealing with categorical data with long category names, 2) good descriptive graph title, and 3) it makes me happy to see the data source listed (both as general good graph hygiene, as well as because it allows me to get to the source data to remake the visual).

Speaking of remaking the visual, here's what it could look like when we tackle the above issues:

If we just want to show the data, we could proceed with the above. Taking a cue from the original visual - a single point is labeled with its corresponding value: Drug Offenses - perhaps there is a story here worth highlighting. If that's the case, our visual might look something like the following:

Meta-lesson: if you're going to go through the effort of visualizing data, take the time to be thoughtful about your design choices!

If you're interested in the Excel version of the above makeovers, you can download it here.

Monday, March 24, 2014

color considerations with a dark background

I was working on some data visualization makeovers a few weeks ago and found myself facing a challenge I hadn't previously encountered: the need to leverage a dark background.

When it comes to slides that communicate data, I don't typically recommend anything other than a white background. Anything else makes me think of Tufte's discussion of the data-ink ratio. His basic idea is that you should work to maximize this figure (more data and less ink, vs. the opposite). In The Visual Display of Quantitative Information, he says, "Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink presents new information." If we think about a colored or dark slide background from the perspective of the data-ink ratio, that's a whole lot of ink for no data at all.

Nancy Duarte more directly discusses dark backgrounds in Slide:ology, listing the following considerations:

Dark background: formal, doesn't influence ambient lighting, doesn't work well for handouts, fewer opportunities for shadows (Cole's input: I don't think this is a bad thing!), for large venues, objects can glow.
White background: informal, has a bright feeling, illuminates the room, works well for handouts, for smaller venues, no opportunity for dramatic lighting or spotlights on the elements (Cole's input: as with the "fewer opportunities for shadows," I think the lack of opportunity for "dramatic lighting," though phrased as a negative, is actually probably a good thing).

Let's take a look at what a simple graph looks like on a white, blue, and black background:

The blue and black backgrounds just feel heavier to me. They make my eyes almost pulsate a bit (that's probably the glow that Duarte referred to). That, plus Tufte's data-ink ratio and Duarte's considerations together seem to indicate that one should generally opt for a white background. That said, sometimes there are considerations outside of the ideal scenario for communicating with data that must be taken into account, such as your company's (or client's) brand and corresponding standard template. Such was the case in my specific situation.

I didn't recognize this immediately. Rather, it was only after I had completed (I thought) my revamp of the original visual, that I realized it just didn't seem to fit with the look and feel of the work products I'd seen from the client group in general. Their template was sort of bold and in your face with a mottled, black background spiked with bright, heavily saturated colors. In comparison, my visual felt sort of...meek. Here's a genericized version of my initial makeover:

To solve for this, I remade my own makeover leveraging the same dark background I'd seen used in some of the other examples shared with me. I had to sort of flip around some of my normal thought process. With a white background, the further a color is from white, the more it will stand out (so grey stands out less, black stands out very much). With a black background, the same is true, but black becomes the baseline (so grey stands out less, and white stands out very much). I also realized some colors that are typically verboten with a white background (for example yellow) are incredibly attention grabbing against black (I didn't use yellow in this particular example, but did in some others).
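One way to put rough numbers on "how much a color stands out" is the contrast ratio defined in the WCAG accessibility guidelines. That formula isn't something I used in the makeover; it's swapped in here purely as an illustration, and the specific grey and yellow RGB values are assumptions. A minimal sketch:

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color: 0 for black, 1 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colors, from 1 (same) to 21 (max)."""
    lum_a, lum_b = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
GREY, YELLOW = (128, 128, 128), (255, 255, 0)  # assumed example colors

for name, color in [("grey", GREY), ("yellow", YELLOW),
                    ("black", BLACK), ("white", WHITE)]:
    on_white = contrast_ratio(color, WHITE)
    on_black = contrast_ratio(color, BLACK)
    print(f"{name:<7} on white: {on_white:5.2f}   on black: {on_black:5.2f}")
```

Under this measure, yellow has almost no contrast against white but very high contrast against black, and grey sits between the background and the extreme in both cases, which lines up with the flipped thought process described above.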

The same goal of identifying and eliminating clutter (elements that aren't adding informative value) still holds. In fact, reducing clutter becomes even more important on a dark background, because you're already dealing with the high ink-to-data ratio that we previously touched upon. So less already looks like more than it would on a white background. But it can be done.

Here's what my "more in line with the client's brand" version of the visual looked like:

What do you think - are black or colored backgrounds out when it comes to communicating with data, or can they work? What other considerations should we make when working with non-white backgrounds? What other scenarios might lead us to want to choose a dark background? Leave a comment with your thoughts.

Wednesday, February 12, 2014

more Americans are tying the knot

The Pew Research Center reports on some fascinating data. But I tend to be underwhelmed with the way they illustrate this data visually. The graphs aren't horrible. They look nice. They are well-labeled and on topic when it comes to the stories and reports in which they are found. But they still get under my skin. Because in many cases, some relatively minor modifications would transform the graphs from "not horrible" to great.

The following graph caught my eye as I was scrolling through my Twitter feed last week:

Take a moment to study this graph. What information does it reveal? What data points do you focus on? What comparisons does it enable you to make?

It's not a horrible graph. But it could be so much better. This prompted me to take a look at the full article in which this graph was contained. I had the same reaction to every visual display of data that was included. In all cases, the data in the graphs can help add visual evidence to the story that is being told (and Pew Research gets high marks from me when it comes to clearly articulating a story), but the graphs aren't structured in a way that facilitates that as well as they could.

By choosing the right type of graph and being more strategic with color, we can transform these graphs from not horrible to great. Let's take a look at each, in context of the stories that they are meant to help tell.

Story & Visual #1: Newly Married Adults

The new data show that 4.32 million adults (ages 18 or older) were newlywed in 2012, a 3% increase over the 4.21 million adults married in 2011.

Here's a quick overview of the changes I made:

Shift from bars to a line graph: Yes, you can show time in a bar chart, but bars don't tend to let the audience see trends as easily as the connected points in a line graph do. Also, years ordered descending downward isn't as intuitive as increasing years from left to right.
Use color more strategically: Don't use color just to use color. Rather, use it to draw your audience's eye to where you want them to look. In this case, if the point we're making is about 2012, let's use color there (and only there) to help reinforce the story that we want to tell. (We could possibly even have made the last line segment, between 2011 and 2012, that same shade of green, since it's that slope that shows the 3% increase referenced in the article; we'll look at another example using this approach momentarily.)
Related thought - decimal places: I originally wanted to reduce the number of decimals to one, but that leaves points that don't appear to be the same labeled the same, which can be confusing (for example, the 2011 point in the line graph appears slightly lower than 2010, but if we reduce to a single decimal place, both data labels would be 4.2). If the values look different, make sure the data labels are set to a format that doesn't appear to contradict this.
Related thought - axis range: The rule is that bar charts must have a zero-baseline because of the way our eyes compare the endpoints (I'm not positive that this was the case in the original). With line graphs, you can get away with the minimum value on your y-axis being something other than zero, but you have to be cautious about over-zooming and making relatively small changes appear more significant than they are. In fact, when I first plotted this data in Excel, the program automatically zoomed way in:

Don't let your graphing application pick your axis range!
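To make the two related thoughts above concrete, here's a minimal sketch in Python (it is not part of the Excel makeover). It uses the 2011 and 2012 figures quoted in the article (4.21 and 4.32 million). The 2010 value of 4.24 and both axis ranges are assumptions made up for illustration, and the function names are hypothetical.

```python
def min_label_decimals(values, max_decimals=4):
    """Return the fewest decimal places at which values that differ
    also get different data labels (capped at max_decimals)."""
    distinct = set(values)
    for decimals in range(max_decimals + 1):
        labels = {f"{v:.{decimals}f}" for v in distinct}
        if len(labels) == len(distinct):
            return decimals
    return max_decimals


def apparent_share(start, end, y_min, y_max):
    """Fraction of the y-axis height that the change from start to end
    occupies, given the axis range."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return abs(end - start) / (y_max - y_min)


if __name__ == "__main__":
    # 2011 and 2012 come from the article; 2010 is an assumed value.
    newlyweds = {2010: 4.24, 2011: 4.21, 2012: 4.32}

    print("decimals needed:", min_label_decimals(newlyweds.values()))

    # Assumed axis ranges: a zero baseline vs. a tight auto-zoom.
    for y_min, y_max in [(0.0, 5.0), (4.1, 4.4)]:
        share = apparent_share(newlyweds[2011], newlyweds[2012], y_min, y_max)
        print(f"axis {y_min} to {y_max}: change spans {share:.1%} of the height")
```

With these assumed numbers, one decimal place would label 2010 and 2011 identically, so two are needed. The same 2011 to 2012 change takes up roughly 2% of the plot height on the zero-based axis and more than a third of it on the zoomed axis, which is the over-zooming risk described above.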

Story & Visual #2: New Marriage by Education

Almost the entire increase in new marriages from 2011 to 2012 is accounted for by the college educated.

This is the graph that originally caught my eye in my Twitter feed. In this case, the comparison we want the reader to make is between the Bachelor's degree or more series and the other series over time. We want to draw special emphasis to the increase over time for this group from 2011 to 2012, to help make the point that this group accounted for nearly all of the overall increase in new marriages.

The original chart isn't constructed in a way that makes this easy. Again, I'd recommend a line graph. In this case, the data works well (lines aren't overlapping, creating a spaghetti graph) and it's easier to compare the relative heights of the lines when they are all oriented against the same yearly x-axis (rather than repeat the years for each category, as was done in the original graph). Since the main point is about the Bachelor's degree or more series, we can call the reader's attention there through use of color. We can emphasize the 2011 to 2012 increase by using a darker shade of the same color. I rounded the figures, as decimal places weren't needed here (and can actually result in a false sense of precision, since I believe these figures are based on a survey sample, so not the entire US population).

Story & Visual #3: New Marriage by Age

The prime age for getting hitched is 25 to 34.

A similar approach can be taken for the third visual in the article, which was designed the same as the second visual, but focused on marriage rate by age. Typically, I would suggest leveraging the natural ordering of the categories (keeping the age groups in order from lowest to highest, as was done in the original). In this case, however, I think we can break that guideline and still have a chart that's easy to read because of the clear labeling of the various series. Again, this design (line chart, aligned by common x-axis, using color to highlight the series of interest) allows the reader to make the comparison we want - between 25-34 year olds and other ages - more easily than the original. Again, I rounded the figures shown in the data labels.

Note that in this case, given the story (the prime age for getting hitched is 25-34 years), we could have potentially reduced the data shown to just the 2012 figures (perhaps using a horizontal bar chart to compare across the various age groups - with that approach, I'd suggest keeping the age groups in numerical order). There are some benefits to retaining the historical context, however. First, it helps to put the 2012 figures into perspective. We also leverage the fact that our audience is familiar with this chart design (and how to read it), since we used the same approach previously. Whether to limit the data to only the pieces that directly support the story or to show additional context is always a question to debate when determining what to show (and the answer will change depending on the situation).

Story & Visual #4: Staying Married

It is one thing to get married, it is another thing to stay married. In spite of the recent uptick in newlyweds since 2011, it is still the case that fewer adults were currently married in 2012 (50.5%) than in 2011 (50.8%). The share of adults presently married peaked around 72% in 1960.

It's probably no surprise that I stuck with the pattern of transforming a bar chart into a line chart here. My biggest issue with the original visual in this case isn't the chart type (though from a clutter/cognitive load standpoint, the single line is much cleaner than the multiple bars), but rather the discrepancy in time over the x-axis. In the bar graph, we start off in decades - 1920, 1930, and so on. Until the year 2000. After that, we jump to 2006. And then the figures are reported annually from 2006 through 2012. But all the bars appear visually the same, width- and spacing-wise. This is a big no-no.

In the remake on the right, I've plotted the decade figures through 2010 and connected them with a line. Then I separately (on another graph that's layered over the first - this is a true example of brute-force-Excel) plotted only the actual dates for which there were values (on a scale that started off 1920, 1921, 1922, etc.), including the annual data points leading up to 2012. I colored only the points of interest - the years leading up to and including 2012, and the peak way back in 1960 - to reinforce that the percentage currently married is at an all-time low.
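The spacing problem can be sketched in a few lines of Python. This is a minimal illustration rather than what I did in Excel: the list of years follows the description of the original bar chart above (decades from 1920 to 2000, then annual figures from 2006 through 2012), and the function names are hypothetical.

```python
def categorical_positions(labels):
    """Evenly spaced positions from 0 to 1, which is how a bar chart
    lays out its categories regardless of what the labels mean."""
    n = len(labels)
    return [i / (n - 1) for i in range(n)]


def time_scaled_positions(years):
    """Positions from 0 to 1 proportional to the actual year values."""
    first, last = min(years), max(years)
    return [(year - first) / (last - first) for year in years]


# Years as described for the original chart: decades, then annual data.
years = list(range(1920, 2001, 10)) + list(range(2006, 2013))

even = categorical_positions(years)
scaled = time_scaled_positions(years)

if __name__ == "__main__":
    for year, e, s in zip(years, even, scaled):
        print(f"{year}: evenly spaced {e:.3f}, true time scale {s:.3f}")
```

On the evenly spaced layout, the step from 1920 to 1930 is exactly as wide as the step from 2011 to 2012. On the true time scale, the decade step is ten times wider than the annual one, which is what the layered remake restores.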

The meta-point here is: if there is a specific story you want to tell, don't simply show relevant data, but rather display it in a way that makes it clear to your audience where to look for the evidence of the story you're telling. Choose a graph type that enables your audience to easily make the comparisons you want them to. Use color strategically to draw their eye to where you want them to focus their attention.
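As a small illustration of using color strategically, here's a hedged sketch in Python that assigns a highlight color only to the points of interest and a muted grey to everything else. The function name and the two hex colors are assumptions for illustration, not the shades used in the makeovers.

```python
MUTED = "#A6A6A6"      # assumed grey for context data
HIGHLIGHT = "#2E8B57"  # assumed green for the point of the story


def emphasis_colors(labels, focus, highlight=HIGHLIGHT, muted=MUTED):
    """Return one color per label: the highlight color for labels in
    `focus`, the muted color for all others."""
    focus = set(focus)
    return [highlight if label in focus else muted for label in labels]


if __name__ == "__main__":
    # The story in the first makeover is about 2012, so color only 2012.
    print(emphasis_colors([2010, 2011, 2012], focus=[2012]))
```

The resulting list can be handed to whatever graphing tool you use as the per-point or per-series colors. The point is that color is decided by the story, not by the tool's defaults.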

For those who are interested, the Excel file containing the above makeovers can be downloaded here.

Monday, January 27, 2014

HelpMeViz

We've all created a graph before and thought: Does this work? My advice when this situation arises is to seek feedback. Find a colleague or friend and show them your visual; have them talk out loud about what they see, where they pay attention, what questions they have. Their comments will help you understand whether the visual you've created is doing what you hoped it would, or in the case where it isn't, provide insight into where to concentrate your iterations.

A screenshot of the landing page at HelpMeViz.

Jon Schwabish has brought this critical feedback loop online, with his recently launched site, HelpMeViz, which was designed to "facilitate discussion, debate, and collaboration from the data visualization community." Anyone can post a visual they'd like feedback on, or an idea to help others who have submitted content (and note that anyone really means anyone: the site is not intended exclusively for data viz experts, but rather anyone who wants to receive or provide feedback). In addition to work-in-progress, the project is also open to published projects, so if you've completed work and want to see a sort of post-hoc on how others might have approached it, HelpMeViz can facilitate that as well.

I love the concept, and it's great to click through the submissions that have been posted so far, as well as read the dialogue they have inspired through reader comments. In some cases, HelpMeViz is also prompting full makeovers, such as the one published on Peltier Tech Blog this morning (link).

Congrats to my friend Jon Schwabish for providing this platform (and thank you for the many hours you've devoted to get it to where it is!). I'm excited to continue to watch it grow and read and participate in the dialogue.

To you, Reader, HelpMeViz is an incredible resource that I hope you will leverage!

Wednesday, January 15, 2014

multifaceted data and story

Registration for the upcoming workshop in Seattle is now open! Details and registration for that, plus upcoming sessions in Boston, DC, and San Francisco can be found here.

Last weekend, I ran workshops for two kdmcBerkeley 1-day sessions on Data Storytelling: Tools and Techniques for professionals working in the public health domain in California. To illustrate the concepts we covered, I used an example based on data from kidsdata.org that showed the percent of 7th graders meeting state fitness standards by race over time.

This is a rich dataset in terms of the number of facets one could focus on and the number of stories one could use it to illustrate. We looked at a number of different potential stories, and how you can change how the audience views the data and what they pay attention to through what you emphasize (and deemphasize). I thought these techniques might be of general interest, so will share them with you here. (The full Excel workbook is downloadable via the link at the end of this post.)

Here is what the data looks like:

As a first step, if we simply plot the above data as a line chart in Excel, we get the following:

I've said this before: the "insert chart" step in your graphing application should be the very first step in your data visualization process (not your last!). We focused on the above in a discussion on clutter: identifying elements that aren't adding informative value and getting rid of them. In this case, we can do things like: eliminate chart border, gridlines, and series markers, drop the trailing zero from the y-axis labels, and reduce the number of x-axis labels so the text will fit horizontally. We also decided the Multiracial line was more distracting than informative, with only 2 data points, and that it wasn't critical to the story we wanted to tell, so we removed it. We reduced the work of going back and forth between the legend at the right and the data it describes by labeling the data series directly. We removed Excel's random color choices (another Cole adage: never let your graphing application choose your colors for you!). After all of that, you end up with something like this:

The next step is to figure out where we want to draw our audience's attention. As I mentioned, there are a lot of different things we could focus on and stories we could tell with this data. Let's look at a few.

We could draw attention to the Pacific Islander group. If we look at 2012 vs. 2002, there hasn't been much change. In the early 2000s, there was some improvement, but then this fell. As of 2012, Pacific Islander 7th graders in California have fitness levels lower than those of every other race:

Or, we could focus on the gap: American Indian, African American, Hispanic/Latino, and Pacific Islander 7th graders in the state of California have markedly lower fitness levels in 2012 than their Asian American, White, and Filipino classmates:

We could draw emphasis to the change over the past decade: from our beginning point in 2002 to the latest data in 2012. We see a general up-to-the-right trend. Which is a good thing. Right?

Except that, if we focus in on the past two years (since 2010), we see a declining fitness trend across every race:

If we step back and think about context: these numbers are all low! In fact, across the board, less than 50% of California 7th graders are meeting fitness standards:

And 50% is not the maximum. If we actually think about (and show) the opportunity of where the numbers could be, we see something like the following.

This isn't to say that any of the stories or points of emphasis above is right or wrong, or better or worse. It depends on context: who are you communicating to and what do you need them to know or do? Use the answers to these questions to determine what data to show and how to show it (without misleading). Note also how, when we emphasize one story, it actually makes it harder to see the others. This is something to be careful of, especially when you're in the exploratory analysis phase - you don't want this to lead you to inadvertently miss something important.

In this particular case, we talked about a (contrived) situation where we were working for a California non-profit on a new marketing campaign aimed at parents to encourage them to promote more physical activity for their children. We also assumed that the 7th grade data broken down by race was the best data that we had available, recognizing that the ideal dataset doesn't always exist, or isn't always accessible, so we tried to work with what we had.

Here's what the final version looked like:

If you're interested, the Excel file containing all of the above visuals (as well as the step-by-step decluttering that I summarized above) can be downloaded here.

Thursday, November 7, 2013

student makeovers

This fall, I had the pleasure of teaching Intro to Information Visualization for MICA's MPS in Information Visualization. It was a 4-week course, where we explored some fundamentals of data visualization and storytelling as it relates to communicating effectively with data.

The course was different from my typical workshops in a number of ways. It was great to get to know the students during our time together. Perhaps the most exciting difference for me was being able to see the lessons we covered put to use in homework assignments.

One of the assignments was a visual makeover, where students were asked to select a less-than-stellar visualization from the media, identify the underlying story and create a new and improved visual using data together with narrative to tell an effective visual story. I had a great time reviewing the before-and-afters. I thought I'd share this fun with you by posting some of them here (with my students' permission; I realize the snippets below are a little small - and my process for getting images onto my blog has started to create a sort of strange grey background, so if you want to see bigger non-grey-background versions, you can download the PDF here). Enjoy!

Makeover 1: bird feeder location by Kevin Ripka | kevinripka.com

Makeover 2: youth programs by Brittney Younger

Makeover 3: NYC refuse by Marianne Siblini

Makeover 4: BB Finale by John Breakey | www.johnbreakey.com

Makeover 5: climate change by Jennifer A. Stark

Makeover 6: prezi growth by Jess Mireau | www.jam-i-am.com

Big thanks to the students above for agreeing to let me post their work, and to the overall class for making my first time teaching at the graduate level an incredibly rewarding experience!
