Thursday, February 20, 2014

a little math on non-zero baselines

I had a friendly exchange with a blog reader over the past week related to my recent post highlighting some Pew Research makeovers. In this post, I made some comments regarding the use of a non-zero baseline: specifically, that it's not ok to have a non-zero baseline with a bar chart (see related blog post), but that you can get away with it in a line graph. The question was regarding how shifting to a non-zero baseline impacts the slopes of the lines in a line graph.

That's a very good question.

One I hadn't given any thought to before.

But now that I was thinking about it, I worried I'd been recommending something incorrect.

This must have really been weighing on me, because the night after I received the initial email asking about the impact on slopes, I had a dream where I was doing the math to show that it's actually ok. Next challenge: to see whether I could replicate my proof in reality. The short answer is yes.

Rescaling from a y-axis that begins at zero to one that does not begin at zero actually doesn't impact the slope of the lines. To demonstrate this, I sketched out an example, leveraging some lessons learned back in 7th grade algebra:

On the left hand side, I've plotted a line that connects the data points 6, 7, and 19 (to which I've given x-coordinates of 1, 2, and 3, respectively). In this initial version, the y-axis ranges from a minimum of zero to a max of twenty. The calculations for the slopes of the two line segments that connect these points are shown below the graph.

On the right hand side, I've plotted the same points on a scale from 5 to 20, reducing the y-axis coordinates by 5 each to reflect this change in scale. Below the graph, we see the math for the slopes of the two lines connecting these points. The slope of each line is the same as it was initially. In other words, using a non-zero baseline does not impact the slope of the lines in a line graph.

Just to be sure I didn't inadvertently pick lucky numbers in this example, I did a second example with points (5, 12), (10, 17), and (20, 42) on a full y-axis scale from 0 to 50, and then one from 5 to 50 (again, reducing the y-coordinates appropriately to reflect this rescaling). I found the same thing: the slopes of the lines remain the same between the graph with the zero baseline and the one that's been rescaled. When I thought about this some more later, it seemed obvious: of course the slope of the lines doesn't change, because I'm not changing the points relative to each other; I'm only changing their location relative to the x-axis.
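For anyone who wants to check the arithmetic without sketching it by hand, here is a minimal Python sketch of both examples. The helper names `segment_slopes` and `shift_baseline` are made up for illustration; the points and the baseline of 5 are the ones described above.

```python
def segment_slopes(points):
    """Return the slope of each segment connecting consecutive (x, y) points."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def shift_baseline(points, baseline):
    """Re-express each y-coordinate relative to a non-zero baseline."""
    return [(x, y - baseline) for x, y in points]


# First example: data points 6, 7, and 19 at x = 1, 2, 3.
example_1 = [(1, 6), (2, 7), (3, 19)]
# Second example from the post.
example_2 = [(5, 12), (10, 17), (20, 42)]

for points in (example_1, example_2):
    original = segment_slopes(points)
    rescaled = segment_slopes(shift_baseline(points, 5))
    # The two lists match: subtracting a constant from every y-value
    # cancels out in each (y2 - y1) difference.
    print(original, rescaled, original == rescaled)
```

The slopes come out to 1 and 12 for the first example and 1 and 2.5 for the second, with or without the shift of 5.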

But the conversation didn't end there. When I shared this with blog reader Roberto, he responded with a couple of graphs to help illustrate his points. The first shows the original line (blue) on the primary y-axis (ranging from 0 to 20) and the line rescaled onto a secondary axis (red; with axis ranging from 5 to 20).

The next graph shows the same initial line on the primary axis (blue) and the line rescaled onto a secondary axis (red) that ranges from 5 to 105.

It's true that the absolute perception of steepness changes with the changing axis range. You see this when comparing either line that's plotted on the secondary axis to the original. But I'm not convinced that the relative slope between the two segments of the line is impacted; rather, the two segments appear to move together as the axis range changes.
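One way to test this intuition is to compute the slope of each segment as it would be drawn on the page, rather than in data units. The sketch below is a rough illustration under stated assumptions: the function name `drawn_slopes` and the 300 by 200 plot size are hypothetical, and the x-axis is assumed to span exactly from the first to the last point. The three axis ranges are the ones from the graphs discussed above.

```python
def drawn_slopes(points, y_min, y_max, width=300.0, height=200.0):
    """Slope of each segment as drawn, in plot units (e.g. pixels).

    width and height are hypothetical plot dimensions; the x-axis is
    assumed to run from the first x-value to the last.
    """
    x_min, x_max = points[0][0], points[-1][0]
    x_scale = width / (x_max - x_min)    # plot units per x data unit
    y_scale = height / (y_max - y_min)   # plot units per y data unit
    return [
        ((y2 - y1) * y_scale) / ((x2 - x1) * x_scale)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


points = [(1, 6), (2, 7), (3, 19)]

for y_min, y_max in [(0, 20), (5, 20), (5, 105)]:
    first, second = drawn_slopes(points, y_min, y_max)
    # The drawn steepness of each segment changes with the axis range,
    # but the ratio between the two segments does not.
    print((y_min, y_max), round(first, 3), round(second, 3), round(second / first, 3))
```

Under these assumptions, each segment looks steeper or flatter as the axis range changes, but the second segment is always 12 times as steep as the first, because the axis range scales both segments by the same factor.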

To make sure I'm not promoting anything inappropriate, I consulted a couple of other sources/experts. Alberto Cairo said to retain the zero baseline when possible, to avoid confusion. He suggested that when this isn't feasible, you can create two line graphs rather than one: the larger graph, where you've zoomed in, plus a small inset in one corner that shows the zero-baseline version without the scale (just the baseline). This is an interesting solution, and one I plan to try out when the next opportunity presents itself.

I also consulted Stephen Few's Show Me the Numbers, where his description of zero-based scales reads as follows:
When you set the bottom of your quantitative scale to a value greater than zero, differences in values will be exaggerated visually in the graph. You should generally avoid starting your graph with a value greater than zero, but when you need to provide a close look at small differences between large values, it is appropriate to do so. Make sure you alert your readers that the graph does not give an accurate visual representation of the values so that your readers can adjust their interpretation of the data accordingly.
He follows this up with an example zoomed-in line graph with the following warning: "Attention: The dollar scale along the vertical axis is narrow to reveal the subtle, yet steady rise in sales since July."

So the bottom line is: you can have a non-zero baseline in line graphs (which can be useful when the numbers you want to show are some distance away from zero), but I (and other experts) advise care when doing so. You want to take context into account and make sure you aren't zooming in a way that visually overemphasizes minor differences. Also, make it clear to your reader that you aren't utilizing the full scale. Agree/disagree? Have other ideas for addressing this challenge? Leave a comment with your thoughts.

Big thanks to Roberto for his thought-provoking comments (please feel free to jump in if I've mischaracterized anything; also, for those who might be interested, Roberto's Excel gallery can be found here). Thanks also to Alberto for taking the time to read my draft post and lend his thoughts.

Wednesday, February 12, 2014

more Americans are tying the knot

The Pew Research Center reports on some fascinating data. But I tend to be underwhelmed with the way they illustrate this data visually. The graphs aren't horrible. They look nice. They are well-labeled and on topic when it comes to the stories and reports in which they are found. But they still get under my skin. Because in many cases, some relatively minor modifications would transform the graphs from "not horrible" to great.

The following graph caught my eye as I was scrolling through my Twitter feed last week:

Take a moment to study this graph. What information does it reveal? What data points do you focus on? What comparisons does it enable you to make?

It's not a horrible graph. But it could be so much better. This prompted me to take a look at the full article in which this graph was contained. I had the same reaction to every visual display of data that was included. In all cases, the data in the graphs can help add visual evidence to the story that is being told (and Pew Research gets high marks from me when it comes to clearly articulating a story), but the graphs aren't structured in a way that facilitates that as well as they could.

By choosing the right type of graph and being more strategic with color, we can transform these graphs from not horrible to great. Let's take a look at each, in context of the stories that they are meant to help tell.

Story & Visual #1: Newly Married Adults
The new data show that 4.32 million adults (ages 18 or older) were newlywed in 2012, a 3% increase over the 4.21 million adults married in 2011.

Here's a quick overview of the changes I made:
  • Shift from bars to a line graph: Yes, you can show time in a bar chart, but bars don't tend to allow the audience to see trends as easily as the connected points in a line graph do. Also, years ordered descending downward isn't as intuitive as increasing years from left to right.
  • Use color more strategically: Don't use color just to use color. Rather, use it to draw your audience's eye to where you want them to look. In this case, if the point we're making is about 2012, let's use color there (and only there) to help reinforce the story that we want to tell. (We could possibly even have made the last line segment between 2011 and 2012 that same shade of green, since it's that slope that shows the 3% increase referenced in the article; we'll look at another example using this approach momentarily.)
  • Related thought - decimal places: I originally wanted to reduce the number of decimals to one, but that leaves points that don't appear to be the same labeled the same, which can be confusing (for example, the 2011 point in the line graph appears slightly lower than 2010, but if we reduce to a single decimal place, both data labels would be 4.2). If the values look different, make sure the data labels are set to a format that doesn't appear to contradict this.
  • Related thought - axis range: The rule is that bar charts must have a zero-baseline because of the way our eyes compare the endpoints (I'm not positive that this was the case in the original). With line graphs, you can get away with the minimum value on your y-axis being something other than zero, but you have to be cautious about over-zooming and making relatively small changes appear more significant than they are. In fact, when I first plotted this data in Excel, the program automatically zoomed way in:
Don't let your graphing application pick your axis range!
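Two of the numeric points above (the 3% increase and the data-label collision at one decimal place) are easy to check in a few lines of Python. This is only a sketch: the 2011 and 2012 figures come from the article, but the 2010 value below is a made-up placeholder, since the actual figure isn't quoted in this post.

```python
newlyweds_2011 = 4.21     # millions, from the article
newlyweds_2012 = 4.32     # millions, from the article
hypothetical_2010 = 4.24  # NOT from the article; placeholder for illustration only

# Year-over-year change, which the article reports rounded to 3%.
pct_change = (newlyweds_2012 - newlyweds_2011) / newlyweds_2011 * 100
print(f"{pct_change:.1f}% increase from 2011 to 2012")

# Compare data labels at one and two decimal places.
for decimals in (1, 2):
    labels = [f"{value:.{decimals}f}" for value in (hypothetical_2010, newlyweds_2011)]
    status = "labels collide" if labels[0] == labels[1] else "labels differ"
    print(decimals, labels, status)
```

With one decimal place, both labels read 4.2 even though the points sit at different heights; with two, the labels match what the eye sees.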

Story & Visual #2: New Marriage by Education
Almost the entire increase in new marriages from 2011 to 2012 is accounted for by the college educated.

This is the graph that originally caught my eye in my Twitter feed. In this case, the comparison we want the reader to make is between the Bachelor's degree or more series and the other series over time. We want to draw special emphasis to the increase over time for this group from 2011 to 2012, to help make the point that this group accounted for nearly all of the overall increase in new marriages.

The original chart isn't constructed in a way that makes this easy. Again, I'd recommend a line graph. In this case, the data works well (the lines don't overlap into a spaghetti graph) and it's easier to compare the relative heights of the lines when they are all oriented against the same yearly x-axis (rather than repeat the years for each category, as was done in the original graph). Since the main point is about the Bachelor's degree or more series, we can call the reader's attention there through use of color. We can emphasize the 2011 to 2012 increase by using a darker shade of the same color. I rounded the figures, as decimal places weren't needed here (and can actually result in a false sense of precision, since I believe these figures are based on a survey sample, so not the entire US population).

Story & Visual #3: New Marriage by Age
The prime age for getting hitched is 25 to 34.

A similar approach can be taken for the third visual in the article, which was designed the same as the second visual, but focused on marriage rate by age. Typically, I would suggest leveraging the natural ordering of the categories (keeping the age groups in order from lowest to highest, as was done in the original). In this case, however, I think we can break that guideline and still have a chart that's easy to read because of the clear labeling of the various series. Again, this design (line chart, aligned by common x-axis, using color to highlight the series of interest) allows the reader to make the comparison we want - between 25-34 year olds and other ages - more easily than the original. Again, I rounded the figures shown in the data labels.

Note in this case, given the story (the prime age for getting hitched is 25-34 years), we could have potentially reduced the data shown to just the 2012 figures (perhaps in this case using a horizontal bar chart to compare across the various age groups - with that approach, I'd suggest keeping the age groups in numerical order). There are some benefits to retaining the historical context, however. First, it helps to put the 2012 figures into perspective. We also leverage the fact that our audience is familiar with this chart design (and how to read it), since we used the same approach previously. Whether to limit the data to only the pieces that directly support the story or to show additional context is always a question to debate when determining what to show (and the answer will change depending on the situation).

Story & Visual #4: Staying Married
It is one thing to get married, it is another thing to stay married. In spite of the recent uptick in newlyweds since 2011, it is still the case that fewer adults were currently married in 2012 (50.5%) than in 2011 (50.8%). The share of adults presently married peaked around 72% in 1960. 

It's probably no surprise that I stuck with the pattern of transforming a bar chart into a line chart here. My biggest issue with the original visual in this case isn't the chart type (though from a clutter/cognitive load standpoint, the single line is much cleaner than the multiple bars), but rather the discrepancy in time over the x-axis. In the bar graph, we start off in decades - 1920, 1930, and so on. Until the year 2000. After that, we jump to 2006. And then the figures are reported annually from 2006 through 2012. But all the bars appear visually the same, width- and spacing-wise. This is a big no-no. 

In the remake on the right, I've plotted the decade figures through 2010 and connected them with a line graph. Then I separately (on another graph that's layered over the first - this is a true example of brute-force-Excel) plotted only the actual dates for which there were values (on a scale that started off 1920, 1921, 1922, etc.), including the annual data points leading up to 2012. I colored only the points of interest: those leading up to and including 2012, to reinforce that the percentage currently married is at an all-time low, and the peak that happened way back in 1960.
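The spacing problem is easy to see numerically. The sketch below compares where each year lands when years are treated as evenly spaced category labels (as in the original bars) versus when they're placed on a true time scale. The function names are made up for illustration, and the list of years is reconstructed from the description above (decades from 1920 to 2000, then annual figures from 2006 through 2012).

```python
# Years as described above: decades through 2000, then annual from 2006 to 2012.
years = list(range(1920, 2001, 10)) + list(range(2006, 2013))


def categorical_positions(years):
    """Evenly spaced positions, as when each year is just a bar label."""
    return list(range(len(years)))


def time_scaled_positions(years):
    """Positions proportional to the time elapsed since the first year."""
    return [year - years[0] for year in years]


def gaps(positions):
    """Distance between each pair of neighboring positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Every gap is 1, so a decade looks as wide as a single year.
print(gaps(categorical_positions(years)))
# Gaps of 10, then 6, then 1: the spacing reflects the actual elapsed time.
print(gaps(time_scaled_positions(years)))
```

Most charting tools will do the second version for you if the x-values are treated as numbers or dates rather than text labels, though the exact setting varies by tool.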

The meta-point here is: if there is a specific story you want to tell, don't simply show relevant data, but rather display it in a way that makes it clear to your audience where to look for the evidence of the story you're telling. Choose a graph type that enables your audience to easily make the comparisons you want them to. Use color strategically to draw their eye to where you want them to focus their attention.

For those who are interested, the Excel file containing the above makeovers can be downloaded here.

Monday, February 10, 2014


In this series of posts, the focus is on concepts you can leverage at the outset of the communication process (when you know what you want to communicate, but before you've actually started crafting the communication itself). Previously, we've covered the 3-minute story and the Big Idea.

Today, we'll focus on storyboarding.

Storyboarding is perhaps the single most important thing you can do up front to ensure the communication you're crafting is on point. The storyboard establishes a structure for your communication. It's basically an outline. It can be subject to change as you work through the details, but establishing a structure at the outset will set you up for success. When you can (and as makes sense), get buy-in from your client or stakeholder at this step. It will help ensure what you're planning is in line with the need and reduce downstream iterations.

My #1 tip for storyboarding is: don't start with your presentation software. It's too easy to go into slide-creating-mode without thinking about how the pieces fit together and end up with a massive deck that says nothing effectively. I highly recommend going low tech here: leverage a whiteboard, post-it notes, or plain old paper. Personally, I like using post-it notes when I storyboard, because you can rearrange (and add and remove) the pieces easily and explore different narrative flows.

In prior posts, I've used the example of the summer learning program on science. If we're storyboarding this communication, it might look something like the following:

Note that in this case, the Big Idea is at the end. Perhaps we'd want to consider leading with that to ensure our audience doesn't miss the main point, and to help set up why we're communicating to them and why they should care in the first place.

In my opinion, the communication process (whether you're communicating with data or otherwise) shouldn't start with the creation of the communication. Rather, it should start with reflection on the context. Who are you communicating to? What do you need them to know or do? Once you've answered those questions, leverage the 3-minute story, Big Idea, and storyboarding to set yourself up for success when crafting your communication and delivering your message.

Thursday, February 6, 2014

what's the Big Idea?

In my last post, I discussed the 3-minute story and the importance of being able to concisely describe what it is you want to communicate (without reliance on your data and/or visuals). Today, we'll cover an even higher level aggregation: the Big Idea.

The Big Idea boils down the "so-what" of your overall communication even further: to a single sentence. This is a concept that Nancy Duarte discusses in her book, Resonate.* She says the Big Idea has three components:
  1. It must articulate your unique point of view;
  2. It must convey what's at stake; and
  3. It must be a complete sentence.
*A free multimedia version of Resonate is available on Duarte's website here.

In my prior post, I shared the example of a summer learning program on science and what the 3-minute story could sound like. If we condense that even further to the Big Idea, it might be:

The pilot summer learning program aimed at improving students' perception of science was successful and, because of the success, we recommend continuing to offer it going forward; please approve our budget for this program.

Bam. It's clear to your audience what they need to know and what you are asking of them. Some people think being verbose helps convince an audience of your knowledge on a subject, but this often has the opposite effect. It's difficult to be concise, but when you master it, it can work as evidence to your audience that you really know what you're talking about, because you know what's not essential and can boil your message down to its core.

In my experience, the entire resulting communication is better when the person delivering it has taken the time to get really clear on the Big Idea and made sure they can articulate it. Note that if your communication medium is slides, each slide should have a clear Big Idea. Then there should also be an overarching Big Idea for the overall communication.

Stay tuned for the next post in this series on storyboarding.

Tuesday, February 4, 2014

the 3-minute story

Just two spots remain in next week's Boston storytelling with data workshop. Details and registration for this and upcoming sessions in Seattle & San Francisco can be found here.

In my workshops, the very first lesson we typically cover is on the importance of context. When you have some information you want to communicate, there are a few things that it's helpful to think about before you begin the data visualization process.

As part of this lesson, we discuss three concepts that I recommend employing for success when it comes to creating a communication - note that these apply equally well whether you're communicating data or communicating in general -
  1. the 3-minute story;
  2. the Big Idea; and
  3. storyboarding.
I'll cover each of these in a little detail in this and upcoming posts. 

the 3-minute story
The 3-minute story is exactly what it sounds like: if you had only three minutes to tell your audience what they need to know, what would that sound like? This is a great way to ensure you are clear on and can articulate the story you want to tell. Being able to do this removes you from dependence on your slides or visuals for a presentation. This can be useful in the situation where your boss unexpectedly asks you what you're working on, or if you find yourself in an elevator with one of your stakeholders and want to give them the quick rundown or get their feedback. Or in the situation when you are watching your time on the agenda wane as others go over their allotted time...from the initial 30 minutes, to 20, to 10, to 5... If you know exactly what it is you want to communicate, you can make it fit the time slot you're given, even if it isn't the one you planned for.

Let's consider an example 3-minute story. Imagine that I am a 4th grade teacher:

A group of us in the science department were brainstorming last year - it seems by the time kids get to their first science class in the 4th grade, they come in with this attitude that it's going to be difficult and they aren't going to like it. It takes a good amount of time at the beginning of the school year to get beyond that. So we thought, what if we try to give kids exposure to science sooner? Can we influence their perception? We piloted a summer learning program last summer that was aimed at doing just that. We invited elementary school students and ended up with a group of about 30 2nd- and 3rd-graders. Our goal was to give them exposure to science in hopes of creating positive perception. To test whether we were successful, we surveyed students before and after the program. We found that, going into the program, the biggest portion of students (40%) felt just "ok" about science, whereas after the program, most of these shifted into positive perceptions, with nearly 70% of total students expressing some level of interest towards science. We feel this demonstrates early success of the program and that we should not only continue to offer it, but also expand our reach with it going forward.

Stay tuned for my next post, where we'll discuss how to boil this down further into the Big Idea.