Monday, January 27, 2014

HelpMeViz

We've all created a graph before and thought: Does this work? My advice when this situation arises is to seek feedback. Find a colleague or friend and show them your visual; have them talk out loud about what they see, where they pay attention, what questions they have. Their comments will help you understand whether the visual you've created is doing what you hoped it would, or in the case where it isn't, provide insight into where to concentrate your iterations.

A screenshot of the landing page at HelpMeViz.
Jon Schwabish has brought this critical feedback loop online with his recently launched site, HelpMeViz, which was designed to "facilitate discussion, debate, and collaboration from the data visualization community." Anyone can post a visual they'd like feedback on, or an idea to help others who have submitted content (and note that anyone really means anyone: the site is not intended exclusively for data viz experts, but rather for anyone who wants to receive or provide feedback). In addition to works-in-progress, the site is also open to published projects, so if you've completed a piece and want a sort of post-hoc look at how others might have approached it, HelpMeViz can facilitate that as well.

I love the concept, and it's great to click through the submissions that have been posted so far, as well as read the dialogue they have inspired through reader comments. In some cases, HelpMeViz is also prompting full makeovers, such as the one published on Peltier Tech Blog this morning (link).

Congrats to my friend Jon Schwabish for providing this platform (and thank you for the many hours you've devoted to get it to where it is!). I'm excited to continue to watch it grow and read and participate in the dialogue.

To you, Reader, HelpMeViz is an incredible resource that I hope you will leverage!

Wednesday, January 15, 2014

multifaceted data and story

Registration for the upcoming workshop in Seattle is now open! Details and registration for that, plus upcoming sessions in Boston, DC, and San Francisco can be found here.

Last weekend, I ran two 1-day kdmcBerkeley workshops, Data Storytelling: Tools and Techniques, for professionals working in the public health domain in California. To illustrate the concepts we covered, I used an example based on data from kidsdata.org showing the percent of 7th graders meeting state fitness standards, by race, over time.

This is a rich dataset in terms of the number of facets one could focus on and the number of stories one could use it to illustrate. We looked at a number of different potential stories, and how you can change how the audience views the data and what they pay attention to through what you emphasize (and deemphasize). I thought these techniques might be of general interest, so will share them with you here. (The full Excel workbook is downloadable via the link at the end of this post.)

Here is what the data looks like:

As a first step, if we simply plot the above data as a line chart in Excel, we get the following:


I've said this before: the "insert chart" step in your graphing application should be the very first step in your data visualization process (not your last!). We used the above chart for a discussion on clutter: identifying elements that aren't adding informative value and getting rid of them. In this case, we can:
  • eliminate the chart border, gridlines, and series markers;
  • drop the trailing zero from the y-axis labels;
  • reduce the number of x-axis labels so the text fits horizontally;
  • remove the Multiracial line, which, with only 2 data points, was more distracting than informative and wasn't critical to the story we wanted to tell;
  • label the data series directly, eliminating the work of going back and forth between the legend at the right and the data it describes; and
  • replace Excel's random color choices (another Cole adage: never let your graphing application choose your colors for you!).
After all of that, you end up with something like this:
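The workshop exercise was done in Excel, but for readers who work in code, the same decluttering steps translate directly to matplotlib. Here's a minimal sketch; the numbers are illustrative placeholders, not the actual kidsdata.org figures:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Illustrative data only -- not the actual kidsdata.org numbers.
years = [2002, 2004, 2006, 2008, 2010, 2012]
series = {
    "Asian American":  [41, 43, 45, 47, 48, 46],
    "White":           [38, 40, 42, 44, 45, 43],
    "Hispanic/Latino": [25, 27, 29, 31, 32, 30],
}

fig, ax = plt.subplots()
for name, values in series.items():
    # No markers, muted color.
    ax.plot(years, values, color="gray")
    # Label each series directly instead of using a legend.
    ax.text(years[-1], values[-1], f" {name}", color="gray",
            va="center", fontsize=9)

# Declutter: remove the chart border and gridlines.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)

# Fewer, horizontal-friendly axis labels.
ax.set_xticks(years)
ax.set_yticks(range(0, 60, 10))
fig.savefig("declutter.png")
```

The same principle applies in any tool: every default element has to earn its place on the page.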


The next step is to figure out where we want to draw our audience's attention. As I mentioned, there are a lot of different things we could focus on and stories we could tell with this data. Let's look at a few.

We could draw attention to the Pacific Islander group. If we compare 2012 to 2002, there hasn't been much change. In the early 2000s, there was some improvement, but it then fell away. As of 2012, Pacific Islander 7th graders in California have fitness levels lower than those of every other race:
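In code, this sort of emphasis is typically done by pushing every other series to light gray and giving the focus line the only saturated color. A matplotlib sketch of the technique, again with made-up numbers:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Illustrative numbers only; the real data comes from kidsdata.org.
years = [2002, 2004, 2006, 2008, 2010, 2012]
series = {
    "Asian American":   [41, 43, 45, 47, 48, 46],
    "White":            [38, 40, 42, 44, 45, 43],
    "Pacific Islander": [24, 26, 28, 27, 26, 23],
}
HIGHLIGHT = "Pacific Islander"

fig, ax = plt.subplots()
for name, values in series.items():
    emphasized = (name == HIGHLIGHT)
    # Push everything else to light gray so one story stands out.
    ax.plot(years, values,
            color="tab:blue" if emphasized else "lightgray",
            linewidth=2.5 if emphasized else 1.0)
    ax.text(years[-1], values[-1], f" {name}",
            color="tab:blue" if emphasized else "gray", va="center")
fig.savefig("emphasis.png")
```

Swapping the `HIGHLIGHT` variable swaps the story, without touching the underlying data.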


Or, we could focus on the gap: American Indian, African American, Hispanic/Latino, and Pacific Islander 7th graders in the state of California have markedly lower fitness levels in 2012 than their Asian American, White, and Filipino classmates:


We could draw emphasis to the change over the past decade: from our beginning point in 2002 to the latest data in 2012. We see a general up-to-the-right trend. Which is a good thing. Right?


Except that, if we focus in on the past two years (since 2010), we see a declining fitness trend across every race:


If we step back and think about context: these numbers are all low! In fact, across the board, less than 50% of California 7th graders are meeting fitness standards:


And 50% is not the maximum. If we actually think about (and show) the opportunity of where the numbers could be, we see something like the following.
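One way to show that opportunity in code is simply to extend the y-axis to 100%, so the empty space above the lines tells the story itself. A matplotlib sketch (illustrative numbers only):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Illustrative overall percentages, not the actual kidsdata.org figures.
years = [2002, 2004, 2006, 2008, 2010, 2012]
overall = [30, 33, 36, 38, 37, 35]

fig, ax = plt.subplots()
ax.plot(years, overall, color="tab:blue")

# Extend the y-axis to 100% so the gap to "where we could be" is visible.
ax.set_ylim(0, 100)
ax.yaxis.set_major_formatter(PercentFormatter())

# Shade the opportunity: the space between current levels and 100%.
ax.axhspan(max(overall), 100, color="lightgray", alpha=0.3)
fig.savefig("opportunity.png")
```

The default auto-scaled axis would zoom in on the data and hide exactly this point.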


This isn't to say any of the above specific emphasis or stories are right or wrong or better or worse. It depends on context: who are you communicating to and what do you need them to know or do? Use the answers to these questions to determine what data to show and how to show it (without misleading). Note also how, when we emphasize one story, it actually makes it harder to see the others. This is something to be careful of, especially when you're in the exploratory analysis phase - you don't want this to lead you to inadvertently miss something important.

In this particular case, we talked about a (contrived) situation in which we were working for a California non-profit on a new marketing campaign aimed at parents, encouraging them to promote more physical activity for their children. We also assumed that the 7th grade data broken down by race was the best data available to us, recognizing that the ideal dataset doesn't always exist (or isn't always accessible), so we worked with what we had.

Here's what the final version looked like:


If you're interested, the Excel file containing all of the above visuals (as well as the step-by-step decluttering that I summarized above) can be downloaded here.

Tuesday, January 14, 2014

things change when you have children

I just watched a video of a short chat between Nancy Duarte and Garr Reynolds about creativity and story. In it, Garr talks briefly about how having children has influenced his view and approach: "Things change when you have children."



Yes, they do.

And will continue to for me, as my husband and I are excitedly expecting baby #2 this summer!

(When you've got one who is this cute, how can you help but do it again?!?)


Thursday, January 9, 2014

failure in design(er)

Yesterday evening, our recently purchased, lovely new couch arrived. Or, rather, the large boxes that contained our recently purchased, lovely new couch arrived. Suddenly, it was very clear what we gave up by not springing for "white glove delivery".

Not to fear, though. It came with instructions. My husband and I can both read and follow instructions.

Right?

Easier said than done, it turns out. These were certainly not the worst assembly instructions that I've ever seen, but they left a lot to be desired. Perhaps a very lucky or clever individual could get it right the first time (we were neither of those, as it turns out). But you'd have to know which details were important to pay attention to.

We had several false starts, turning the diagram round and round: Ah, now I get it! Wait, no, now one frame piece is too long. Oh, now I see the problem. Oops, no, now the holes don't line up. After several such instances, we recognized that the bars in the frame are not equidistant from one another (and it matters which two are closest together). We realized that two of the frame bars had four holes each while the third had two, and that the relative positioning of the bars with respect to one another is important. We learned that FX1 and GX1 are in fact not interchangeable (even though at the top of the diagram they're shown with FX1 clearly on the left and GX1 on the right, below they are less prominently switched).

Now that we've assembled the couch correctly (finally), we could do it again without breaking a sweat. We know exactly which parts of the diagram deserve our attention. But why was it so difficult the first time around?

I'm in the middle of a book I'm enjoying, The Design of Everyday Things. In it, Donald Norman asserts that when you have trouble with things, you shouldn't blame yourself (even though that tends to be people's natural tendency). Rather, it's the fault of the design, and you should blame the designer. While the book focuses mainly on product design, I think many of the insights hold true in the data visualization space as well. In this case, the corollary is clear: if you are struggling to understand a visual representation of data, don't blame yourself; blame the designer. Odds are, they didn't adequately take your needs as the audience into account in their design process. For those designing visual displays of information, this is a reminder to always keep your audience in mind, for, as Donald Norman says, "well-designed objects are easy to interpret and understand."

I unabashedly blame the designer of the instruction diagram for our difficulty assembling something that could have easily been straightforward. If the designer had thought about the intended users and leveraged affordances to make it clear which details were important and should be paid attention to, my husband and I would have had a much less frustrating process assembling our (now truly lovely) couch.

What design issues cause you frustration? What can we learn from this to apply in the world of data viz?

Monday, January 6, 2014

people analytics

Happy new year! Before I get to the meat of this post, bear with me through a quick plug for my upcoming public storytelling with data workshops in Boston, DC, Seattle, Chicago, and San Francisco. Details can be found here. Please help me spread the word!

Now back to our regularly scheduled programming...

Up until relatively recently, my day job was in People Analytics at Google. My career has been (and continues to be) focused on helping people make sense of, understand, and act based on numbers and analytics. Applying these skills in the people space over the past six years was a fascinating adventure.

People Analytics is an analytics team that is embedded in Google's Human Resources organization, where the goal is to help ensure that people decisions made at Google - decisions about employees or future employees - are data driven. Personally, I credit this role and my managers and team for really allowing me to use people analytics to hone my data viz and storytelling with data skills, gain a better understanding of the science behind data visualization, and give me the opportunity and autonomy to build and teach a course on data viz there, which ultimately paved the path to where I am today.

But I digress. Let's get back to the topic of people analytics. Because of the time I spent in this area (and Google's reputation as a thought-leader in this space), I periodically get calls from the press asking for details. Recently, a reporter from the Wall Street Journal reached out to discuss "big data and how it's used in human resources". It turned out that they mostly wanted me to talk about proprietary Google projects, which I declined to comment on (unfortunately, I can't share some of the really interesting, ground-breaking work), but I did sketch out some notes while thinking about the topic that I thought I'd share here for those who may be interested.

Cole's [somewhat random] thoughts on People Analytics
Employees are a precious resource at any organization. Data can help you to make better decisions when it comes to these precious resources. Broadly, I think about People Analytics in terms of the different stages of the employee lifecycle:
  • Hiring: getting the right people in the door.
  • On the job: making people as effective as possible and creating an environment and opportunities that optimize efficiency and impact (performance management, career development, rewards, employee sentiment).
  • Attrition: getting ahead of it so you can retain those you want to keep and push out those you don't (as appropriate).
You can apply data in each of the above areas to make smarter decisions. In the early stages of people analytics, much of the work is descriptive: understanding what things look like currently and identifying gaps between that and where you want to be. As you move up the value chain, you can get into some really interesting predictive spaces to try to understand how things will look in the future and what levers you can pull to impact that.

There are a number of challenges when it comes to leveraging the people analytics space. I'll outline my view on two of the big ones:
  • Marrying what are often many disparate data sources into a single holistic view of the employee that can be aggregated up and looked at through different lenses, so that the info is available to the right person at the right time to take action. This becomes even more challenging when you start to think about external data sources (e.g. Twitter, LinkedIn) that could be integrated for improved insight. In the early stages of people analytics, the first goal is to understand where you're at currently, which often takes the form of reports. Over time, these may be replaced by dashboards that "push" data out to internal stakeholders. Once the current state is known, analysts' time is freed up to focus on more interesting questions and custom analysis that can drive data-informed decisions.
  • Finding the right balance (and organizational appetite) between data-driven and considering the context. Many struggle to make the data make sense - taking the organizational and business context into account when it comes to interpreting and using the data. Most companies have a wealth of qualitative data - things that HR business partners or managers know that will never be adequately captured in hard numbers. There's also a wealth of information in text data that's typically largely untapped (resumes, interview notes, employee surveys, performance reviews, exit interviews). Being able to marry all of this together will provide the most robust view, but is easier said than done (and may sometimes be more than is needed, anyway). It's about figuring out when to lean a little more in one direction vs. the other to create buy-in and build the best solution for a given situation.
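Mechanically, the "marrying disparate sources" challenge is a series of joins on a shared employee key. A hypothetical pandas sketch (all source names, column names, and values here are invented for illustration):

```python
import pandas as pd

# Hypothetical example: three disparate HR sources keyed on employee_id.
hris = pd.DataFrame({"employee_id": [1, 2, 3],
                     "department": ["Sales", "Eng", "Eng"]})
performance = pd.DataFrame({"employee_id": [1, 2, 3],
                            "rating": [3.8, 4.2, 3.5]})
survey = pd.DataFrame({"employee_id": [1, 3],
                       "engagement": [0.72, 0.65]})

# Left-join everything onto the system of record for one holistic view;
# missing values (e.g. no survey response) surface as NaN.
holistic = (hris
            .merge(performance, on="employee_id", how="left")
            .merge(survey, on="employee_id", how="left"))

# Aggregate up and look through a different lens, e.g. by department.
by_dept = holistic.groupby("department")["rating"].mean()
```

The joins are the easy part; agreeing on a single employee key across systems, and deciding who gets to see the merged view, is where most of the real work lives.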
If this space sounds fascinating (it is), you can check out Google's open roles here. Increasingly, other companies are devoting brains to this area as well; search openings by querying People Analytics or HR Analytics.