Sunday, January 30, 2011

visualize your LinkedIn network

LinkedIn launched what I consider to be a very cool addition to their site last week: the ability to visualize your network. They call the tool InMaps, and once you log in and hit go, it cycles through your connections to build a striking visual. Each node represents a connection (mousing over will tell you who each is, or clicking on one will highlight their connections within your network). The bigger the node, the more connected the person is. Contacts are clustered into groups and color coded based on their associations with one another.

Let's see it in action - here is my network:

From Storytelling with Data

My network is divided into four main clusters (two of which are closely related). In blue are my Google colleagues. Pink represents my connections from business school (go Huskies!). The green and orange intersecting clusters at the bottom represent my contacts from Washington Mutual. I worked in a number of different roles when I was there, and by looking at the names it appears LinkedIn has grouped two of the main ones: the first (green) are connections I made while managing home equity fraud, and orange represents those made from my various earlier roles in credit risk management. It makes sense that this network is the largest - I worked there longer than I have anywhere else and was working there when LinkedIn was created and first started to become popular.

Want to see your network? Follow this link. See what insights you can draw. And if we aren't already connected, send me an invite!

Thursday, January 20, 2011

simple vs. sexy

When it comes to shoes (women's shoes, at least), there are use cases for each. Is the same true for visual displays of information?

My bias (in visuals, if not in shoes) is for simplicity: simple graphics are typically easier to understand and draw insight from. Glitz in infographics is a turnoff for me; it often adds clutter without informative value, making it harder to understand what to pay attention to.

That being said - is there a use case for the sexy infographic?

To explore this question, I decided to test out Circos, an application for "circular visualization". The software was originally created to visualize genomic data, but is now being advertised more generally and has been featured a number of times in the New York Times and other media. When I initially looked at it several months ago, I wrote it off, not immediately seeing scenarios where it would be useful.

But then a couple weeks ago, I was thinking about visualizing a matrix: movement into, out of, and between certain groups. I sketched a few diagrams as I was brainstorming ways to look at it visually. I had a concept drawn out that I thought would work, but was a little unsure how to bring it to fruition with the programs I typically use (most often, Excel). So I started looking through my list of tool resources. When I came to the Circos site, I knew this would be the perfect situation to try it out.

Here's the scenario:

At the company where I work, lateral job transfers are frequent. Transferring to another team or department is a way for an individual to learn new skills and broaden their experience. One interesting thing to understand is how individuals move throughout the organization: in what relative volumes, what areas are net importers or exporters of talent, etc.

One way we can see and understand this movement is in a matrix. Below is an example of what this could look like. (Note: the data shown here are artificial.)

You can read across a row to see transfer activity out of a given group, or down a column to see transfer activity into a given group. Note the operative word: read. People read tables. You have to spend quite a bit of time looking at the numbers to start to draw insights. They don't jump out at you visually, the way they do in pictures (graphs).
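To make the row and column reading concrete, here is a minimal sketch of how outflow, inflow, and net movement could be tallied from such a matrix. The group names and counts are made up for illustration (they are not the numbers from my table), and `net_flows` is just a hypothetical helper name.

```python
# Hypothetical transfer counts, invented for illustration.
# Rows are the group people left; columns are the group they joined.
groups = ["A", "B", "C", "D"]
transfers = [
    [0, 30, 12, 8],
    [25, 0, 10, 9],
    [6, 7, 0, 3],
    [4, 5, 2, 0],
]

def net_flows(labels, matrix):
    """Return {group: (out_total, in_total, net)} where net = in - out."""
    flows = {}
    for i, label in enumerate(labels):
        out_total = sum(matrix[i])                # read across the row
        in_total = sum(row[i] for row in matrix)  # read down the column
        flows[label] = (out_total, in_total, in_total - out_total)
    return flows

for group, (out_total, in_total, net) in net_flows(groups, transfers).items():
    role = "net importer" if net > 0 else "net exporter" if net < 0 else "balanced"
    print(f"{group}: out={out_total} in={in_total} net={net:+d} ({role})")
```

A summary like this answers the importer/exporter question, but it still has to be read; it does not show where people are moving to and from.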

Granted, there are some things I could do to add visual cues to this table to make those insights easier to draw. I could turn it into a sort of heatmap through use of color, drawing more attention to where the large volumes are coming and going. Or I could embed summary graphs of aggregate movement in and out to give a visual of relative magnitudes.

But rather than make minor changes to this simple visual, I want to explore the other end of the spectrum: sexy.

Here is what the same data looks like, plotted with the Circos Table Viewer:

It is a little overwhelming at first. But after taking a bit of time to orient myself to what I was looking at, I found the visual insights started coming quite quickly. By far the highest transfer volume is coming from and going to groups A and B. It's also quickly clear that groups A and B are both net exporters of talent (more people transfer out than transfer in), and you can follow the lines to see where people are coming from and going to. The visual allows you to look for anomalies and the unexpected. Maybe I'm a big dork, but the visual exploration is sort of fun - certainly more fun than it was in the initial matrix.

Earlier today, I found myself observing an unplanned test with this visual. I was at a 2-day offsite, and the topic of internal mobility came up. I fetched the version of this based on real data and showed it to a couple people during one of our breaks. I subsequently put the piece of paper with the visual on it in front of me on the table. Over the next hour, everyone within reach grabbed it at one point and spent several minutes focused on it. You could visually see on each person's face the cross-over from curiosity to scrutiny to understanding and the discovery of a visual insight.

I'm not sure the same would have happened if the paper had shown the initial matrix on it.

Rather, my guess is the table of numbers would have remained untouched. (I'm totally tempted to try this to see whether it holds true, and plan to do so at the first opportunity.) I think the reason people picked up the paper was that the visual was different. It is confusing, but visually appealing in a way that grabs your attention and ignites a desire to understand what, exactly, this complicated-looking thing is trying to convey.

In summary, if this test is any indication, I believe I've landed on two solid use cases for the sexy graphical display:
  1. To reveal new insight: The clearest use case to me is if your visual allows for a better understanding of what's going on than a simpler graphic will.
  2. To grab attention: Sexy graphics can garner attention in a way that a simple or more standard visual may not. Note that grabbing attention alone isn't enough - the visual also needs to hold the audience's attention long enough for them to figure out how to read it, or it won't be successful at revealing new insight. Walk your audience through what they are looking at to make the visual more accessible.
Are there other situations where sexy graphics win over simple?

Thursday, January 13, 2011

FlowingData challenge

I am a regular reader of Nathan Yau's FlowingData blog. Today, he posed a challenge: redesign the following graphic, which summarizes responses to Pew Research Center's polls on how people get their news. I was planning to wait until the weekend to tackle it, but once my work wound down today, I couldn't help but spend some more time in front of the computer to take on the challenge.

The graph isn't horrible. But there are a few things that bother me:
  1. There's simply too much going on. There's no need to label every point. Also the data series names look messy to me intermixed with the graphic.
  2. The color scheme is not colorblind-friendly. Around 8% of men and 0.5% of women are colorblind, which most often means difficulty distinguishing between shades of red and shades of green. This means the Newspaper, Internet, and Radio trends (and corresponding data labels and series names) will all look grey to a portion of the audience. I imagine Internet was made red on purpose to stand out, but this will be lost on those who are colorblind.
  3. The graph is compressed horizontally. Yes, the proportion citing Internet as their primary news source has increased. But the compression seems to overemphasize this.
I played around with converting to % growth. Doing so shows the increase in Internet very clearly, but you miss an indication of the beginning baseline to really make sense of the numbers. Rather than do that, I left the chart mostly as it was but made some relatively minor changes to fix the things bothering me that are outlined above.
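Here is a small sketch of why the % growth view loses the baseline. The shares below are invented for illustration (they are not the Pew figures), and `pct_growth` is a hypothetical helper.

```python
def pct_growth(series):
    """Percent change of each value relative to the first value in the series."""
    base = series[0]
    return [round((value - base) / base * 100, 1) for value in series]

# Made-up shares (% citing each source as primary), not the Pew data.
internet = [10, 20, 40]
television = [80, 76, 72]

print(pct_growth(internet))    # [0.0, 100.0, 300.0]
print(pct_growth(television))  # [0.0, -5.0, -10.0]
```

In this made-up case Internet grows 300% while Television falls 10%, yet Internet's final share (40) is still well below Television's (72). The growth view alone hides that.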

I was a little reluctant to post this at first, as it is very similar to the makeover I posted last week. But one thing I often say in my Data Visualization course is that if you find something that works, use it again and again. From my perspective, this works.

Let me know what you think!

Wednesday, January 12, 2011

from points to poignant

I love a good makeover challenge. I received the following graph in response to my call for visuals ahead of my visit to a Midwest retailer last month to run a session on data visualization. It should be noted that the creator was not submitting this as a graphical standard, but rather was looking for advice on how to better show the data in a way that would make sense. (Note: I've changed the data and generalized the series names so as not to give away any sensitive information.)

Behind this data is a scorecard. The data in this scorecard is used to calculate the weighted performance index by category across the given business (let's call it Our Business) and 5 competitors. I would assume the goal of the visual, then, is to show how Our Business stacks up against the competition to highlight relative strengths and areas for concern.

The graphic above does that, but not in an easy-to-read way. To make sense of the data, the reader has to basically read each column of data points. It's very difficult to see the relative performance of Our Business across the different categories, especially since sometimes our series marker is totally obscured behind other data. There must be a better way.

The following shows my attempt at a revamp.


One thing that was really challenging about the original dataset is that the weighted performance index was on a range from -1.5 to +1.5, which means when you plot it, you have data going in two directions. The first thing I did was to change the scale such that it would range from 0 to +3. In this case, the actual number is less important than the relative ranking, so I got rid of the numbers altogether and show the relative difference between Our Business and the various competitors visually.

In the above, you get two visual comparisons easily: 1) you can focus on just the blue bars to see how Our Business is doing across the various categories on a relative basis and 2) within a given category, it's easy to see how Our Business is doing relative to the competition. I decided to show this both through the horizontal bars, as well as by labeling the rank of Our Business relative to the competition in each category for a quick summary stat.

Labeling the rank directly meant that I could leave the competition in the same order in the subsequent bars, so I wouldn't have to complicate the deciphering of the visual by making each a different color (which would both complicate the interpretation of the graphic and make Our Business stand out less) and could just label the relative order once in the legend on the left.
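The rescaling and rank labeling described above can be sketched in a few lines. The index values and competitor names here are hypothetical, and I'm assuming a higher index means better performance; `rescale` and `rank_of` are illustrative helpers, not part of any tool I used.

```python
OFFSET = 1.5  # shifts the original -1.5..+1.5 index onto a 0..3 scale

def rescale(index):
    """Shift an index value so every bar starts at zero and points one way."""
    return index + OFFSET

def rank_of(name, scores):
    """1-based rank of `name`, assuming the highest score ranks first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1

# Hypothetical index values for a single category.
category = {
    "Our Business": 0.4,
    "Competitor A": 1.1,
    "Competitor B": -0.7,
    "Competitor C": -1.5,
}

shifted = {name: rescale(value) for name, value in category.items()}
print(shifted)
print("Our Business rank:", rank_of("Our Business", shifted))
```

Because the shift is the same for every value, the ranking is unchanged; only the direction of the bars is simplified.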

Leave a comment to let me know what you think!

Monday, January 10, 2011

the joy of stats

The Joy of Stats has been on my to-watch list for a couple of weeks. It's a 60-minute program with Hans Rosling, an animated and passionate Swedish global health professor, founder of Gapminder, and one of my favorite data storytellers.

In the segment, Rosling shares his obsession with stats. It includes clips from some of his lectures and TED talks. The stats overview was fairly basic (averages, variance, distributions, correlations), but the hour was entertaining and fact-filled. Here are a few of the highlights:
  • Crimespotting by Stamen Design for showing the relationship between topography and crime: it's making citizens more powerful by arming them with information so they can hold officials more accountable.
  • Florence Nightingale's polar area graph for showing the magnitude of deaths from preventable diseases contracted in the hospital, which led to a revolution in hospital hygiene.
  • David McCandless' Billion Pound-O-Gram for understanding the relative sizes of the really big numbers reported by the news and government.
  • Jonathan Harris & Sep Kamvar's We Feel Fine project (an "exploration of human emotion on a global scale") for uncovering trends in how people are feeling.
Though I found it a little overly glitzy at points, Hans Rosling was entertaining as always and I found myself glued to my computer throughout to see what he would do or talk about next.

If you haven't seen Rosling in action before, or want to get a better sense of what to expect, check out the following short clip from the program:

Why are these sessions so good? In addition to the creative use of data visuals to support his claims, Rosling's enthusiasm as he tells stories with data is both entertaining and contagious. Makes you wish you had visuals in real space, right? (Ok, that part might have been a little over the top.)

Side note: my personal opinion is that motion graphics (like the bubble charts used in many of the talks, which can be created via a gadget in Google Spreadsheets known as Motion Charts) need to have somebody there to talk through them and help the audience understand what to pay attention to for them to be effective. Hans Rosling is masterful at this.

"If the story in the numbers is told by a beautiful and clever image, then everyone understands." - Hans Rosling, The Joy of Stats

Tuesday, January 4, 2011

a new year's resolution: declutter your graphics

Happy 2011! One popular new year's resolution is to declutter: declutter your closet, declutter your life. I'd like you to consider decluttering of a different variety: declutter your data visuals.

Last month, I visited a Midwest retailer to run a workshop on communicating effectively with data. Leading up to the session, I asked participants to send me data visualizations on which they would like feedback. I decided I'd take it a step further from just providing feedback and do visual makeovers on a few of the submissions.

As I was revamping the graphs, I relished the fun of teasing a story out of data. It never ceases to amaze me how relatively minor changes can take a visual from a mess of data to a clear message that pops. In this post, I'll walk you through one such transformation. (Note: I've changed the data so as not to give away any sensitive information.)


The intent of the graphic (discerned from the slide title, which is omitted above) is that, while spending less on clothing and using coupons more both have net increases over the time period considered, shopping sales more often is relatively flat at end of period compared to beginning of period. This is highlighted directly on the graphic via the red circles at the beginning and end points.

It takes a bit of time and patience to tease this story out with the above graphic. That's due to a couple of reasons:
  1. The sheer distance between the series and the legend means you have to refer back and forth a couple of times to understand what you're looking at and which series is which (the different colored shapes on the black lines aren't immediately visually very different from each other, which doesn't help).
  2. There are a lot of distractions that you have to sift through in order to figure out what's important to pay attention to. The background shading, gridlines, and borders don't add any informative value. But visually, they are as strong as the data, which means we have to take everything in to establish an order of priority in which to consider the information, because this isn't done visually for us.
Below is my remake, graphing the same data.


First, I changed the series colors to be strikingly visually different from one another. I eliminated the markers for the individual data points, which added unnecessary clutter. I left markers for only the beginning and end points (and labeled those directly, removing the y-axis altogether), since the story the author seemed to be trying to tell focused on those.

I eliminated the background shading, gridlines, and borders. Not only did these not add informative value in the original visual, but they actually make the data stand out less. If we want our data to tell the story, we need to make it stand out the most. In the resulting visual, there are few things competing for our attention, so we can relatively quickly and with little effort start to see what can be learned from the information that is being presented.

Note: One thing you don't get from my made-over version is a clear indication of the magnitude of the points in the middle of the graph (though you can get some sense of this from the relative positioning of the labels on either end). If this were important (I don't believe that it is here), you could add very thin, light horizontal gridlines at multiples of 10 to help make that more explicit.

Often, it is minor changes that will help you take a visual from good to great. Decluttering is a great place to start improving the effectiveness of your graphics. Ask yourself: does this need to be here? Is it adding any informative value? If the answer is no, then take it out!

Leave a comment to let me know what you think. Stay tuned for more makeovers!