Saturday, August 10, 2013

recommended reading: Data Points

I've been a long-time fan of Nathan Yau and his work. He writes the popular (and prolific) blog Flowing Data and published his first book, Visualize This, in 2011 (you can read my review of that book here).

Nathan's second book, Data Points, came out earlier this year. When a client of mine said the following -
"I've only read chapter 1 so far of Data Points, but it's the best chapter 1 on data vis I've read. He talks about the data being an abstraction for reality, and he uses his own wedding pictures to explain. Great idea, and great example."
I promptly ordered my copy and, upon arrival, consumed its contents. I must agree: this is a fantastic book. Moving beyond the aesthetics (it's the right size and has that sort of matte cover that just feels nice - one of the reasons an e-book will never satisfy me), the lessons are clearly articulated and well demonstrated through creative and varied examples.

In Data Points, Yau focuses on data visualization as a medium, rather than a tool, and discusses the importance of context when figuring out what the right medium will be (which he argues can run the gamut, from the informative to the entertaining). Rather than focusing narrowly on the visualization step, Nathan spends a good portion of the book discussing the data itself and strategies for exploring and analyzing it. One effective example early on is based on car crash data: he shows how you can aggregate in different ways to highlight different aspects of the data - the difference between showing hourly vs. daily vs. monthly vs. annual data - with discussion of the specific context that might lead you to choose one view over another.
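
For the curious, here is a minimal Python sketch of that aggregation idea. It is my own illustration, not code or data from the book: the timestamps are invented, and the `count_by` helper is a hypothetical name. It counts the same handful of events by hour, day, month, and year so you can see how each grouping surfaces a different pattern.

```python
from collections import Counter
from datetime import datetime

# Hypothetical crash timestamps, invented purely for illustration.
crashes = [
    datetime(2012, 1, 6, 17, 30),
    datetime(2012, 1, 6, 18, 5),
    datetime(2012, 1, 7, 2, 15),
    datetime(2012, 7, 4, 17, 45),
    datetime(2013, 7, 5, 23, 10),
]


def count_by(timestamps, key):
    """Count events after mapping each timestamp to a grouping key."""
    return Counter(key(ts) for ts in timestamps)


# The same events, aggregated at four different levels.
by_hour = count_by(crashes, lambda ts: ts.hour)    # time-of-day pattern
by_day = count_by(crashes, lambda ts: ts.date())   # individual calendar days
by_month = count_by(crashes, lambda ts: ts.month)  # seasonal pattern
by_year = count_by(crashes, lambda ts: ts.year)    # long-term trend

for label, counts in [("hour", by_hour), ("day", by_day),
                      ("month", by_month), ("year", by_year)]:
    print(label, dict(sorted(counts.items())))
```

Grouping by hour pools all years together, which is useful if the question is about rush hour; grouping by year hides the time of day entirely, which is the better view if the question is whether things are improving over time.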

When it comes to visualizing the data, Nathan breaks it down into four "working parts" -
  1. Visual cues - how you encode the data in things like shapes, sizes, and color; 
  2. Coordinate system - Cartesian vs. polar vs. geographic;
  3. Scale - linear, logarithmic, categorical, etc.; and
  4. Context - clarifying what values represent, explaining how people should read your visualization.
There is a good deal of explanation on each of these to make the concepts clear. Yau refers to the above as the ingredients and describes putting them together to get a complete visualization worth looking at. He says,
"To make the jump from data to visualization, you must know your ingredients. A skilled chef doesn't just blindly throw ingredients into a pot, turn the stove on high, and hope for the best. Instead, the chef gets to know how each ingredient works together, which ones don't get along, and how long and at what temperature to cook these ingredients."
If I had to find fault in this book, it would be that Yau's discussion of context is mostly limited to the context of the data and helping your audience understand your visual, rather than the broader situational context and helping your audience understand how the data fits into something bigger. For me, this is the storytelling piece, the step that helps make the data visualization relevant.

Still, it was an excellent read that I would recommend without hesitation. It's a thorough and accessible overview of data exploration, analysis, and visualization. For those already familiar with this space, it serves as a good reminder of some of the things we should make sure to pause and think about as we are working with and visualizing data.

I found the book to be such a great overview, in fact, that I will be using it as one of my required texts in Introduction to Information Visualization, which I'll be teaching in MICA's MPS in Information Visualization this fall.

You can purchase Data Points in the storytelling with data bookstore.