Principles Behind Good, Data-Driven Graphics

March 28, 2012

For a while now, I’ve considered myself a good writer. I keep a daily journal and this blog, and I used to write short stories and scripts for fun. Many of my favorite classes in undergrad were paper-based. In the field of engineering, my love for writing is a rarity. I adopted the technical writing style taught in my class, and thought it was good. Well, having read Edward Tufte’s “The Visual Display of Quantitative Information”, I now know that class was only a start.

Who’s Tufte?

Edward Tufte is professor emeritus of statistics, political science, and computer science at Yale University. He’s been very active and outspoken about his ideas for communicating statistics and other technical data since 1975. I mentioned Tufte before when I discussed his hatred for PowerPoint. There his major gripe was the low resolution, so you can rightly expect him to prize high-resolution graphics.

The Book in a Nutshell

The book is split into two parts. Part 1, Graphical Practice, gives examples of great graphics that communicate an extraordinary amount of information concisely and not-so-great graphics that lie to the reader or render the data unintelligible. This sets up the second part of the book, Theory of Data Graphics. Here, Tufte goes to the fundamentals. He disparages chartjunk, praises data-ink maximization, and explains data density. (You might not know these terms now, but once you learn them you’ll find yourself using them. I know I have started to think in terms of them.)

Graphical Practice

The above image shows Napoleon’s advance into and then retreat from Russia. It shows six variables: army size, location (x and y), direction of troop movement, and temperature on certain dates. Despite showing six different variables, it is easily intelligible. Looking at this, you can vividly see the toll of the frigid weather and the treacherous river crossings. This is a great graphic because it allows you to draw comparisons, shows lots of data in a small area, and tells a story.

To talk about graphical integrity, Tufte introduces the concept of the lie-factor. The lie-factor is equal to the size of the effect shown in the graphic divided by the size of the effect in the data. So in the image above, it is the change in the size of the line divided by the change in the number the line should represent. In this example, the decision to use perspective corrupts the display and results in a lie-factor of 14.8. The lines simply grow far too fast.

While this makes for a more dramatic graphic, it misleads the reader. As a reader, you simply look at how the lines have grown and conclude that the fuel economy is not only improving but improving at an accelerating rate, since each line is so much bigger than the last. Tufte replotted the data more truthfully below.

It is much easier to see the trend now, and you realize that the fuel economy improvement is slowing down. Clearly, the first graphic failed in that it misrepresented the data, made drawing comparisons difficult, and used a large space to show a small amount of data.
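To make the lie-factor arithmetic concrete, here is a minimal Python sketch. It assumes that "size of effect" means the relative change from the first value to the last. The inputs (line lengths of 0.6 and 5.3 inches, fuel economy standards of 18 and 27.5 mpg) are the figures I recall from Tufte's worked example, not something I measured myself, and the function names are my own. Treat all of them as assumptions.

```python
def effect_size(first, second):
    """Relative change going from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Assumed inputs: line lengths in inches, then fuel economy standards in mpg.
print(round(lie_factor(0.6, 5.3, 18.0, 27.5), 1))  # prints 14.8
```

A lie-factor of 1 means the graphic grows exactly as fast as the data. The further the value is from 1, the more the picture distorts the numbers.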

Theory of Data Graphics

Having given an abundance of examples, Tufte moves on to the logical question of what makes a good graphic. I’ll quote his principles and then explain them.

“Above all else show the data. Maximize the data-ink ratio. Erase non-data ink. Erase redundant data-ink. Revise and edit.” The first and last sentences are obvious. To understand the middle three, you need to know that data-ink is literally ink used to display data as opposed to such things as the axes, the labels, the title, and the border. For example, data-ink would be the dots and lines connecting them in the redone fuel economy graphic above. Looking more closely at it, you see that there is no border on the plot. In fact, there is no y-axis. This minimalist design is what Tufte means by erasing non-data ink (the border) and redundant data-ink (the y-axis).

“Forgo chartjunk, including moiré vibration, the grid, and the duck.” Chartjunk is any decoration that does not add to the graphic. It includes the obnoxious patterns in bar charts, the obfuscating grid, and excessive use of color. Particularly troublesome are the patterns, as they often give rise to moiré vibrations. You can simply use different shades of gray instead of the hatching. If you use too much chartjunk and let decoration overwhelm the data, you end up with a duck. A duck is Tufte’s odd term for a graphic that tells you nothing because the decoration has so corrupted the data.

“Maximize data density and the size of the data matrix, within reason. Graphics can be shrunk way down. Use small multiples.” The first principle arises from the earlier idea of maximizing data-ink and erasing non-data ink. Data density is defined as the number of entries in the data matrix divided by the area of the data graphic. This feeds right into the second principle of shrinking graphics down. Using these two principles, you should communicate a large amount of data in a very small space. The last principle seems like the odd man out. Small multiples are like the pages of a flip book. By laying out these pages next to each other, you can see how a process evolves over time. Since these are small, the data density is high. Furthermore, they invite comparison and are inherently multivariate. Thus, they should be used whenever possible.
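The data density definition above is simple enough to sketch in a few lines of Python. The function name and the example figures (120 data points, a 6 by 4 inch plot) are made up for illustration and do not come from the book.

```python
def data_density(n_entries, width, height):
    """Entries in the data matrix divided by the area of the data graphic."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return n_entries / (width * height)


# Hypothetical graphic: 120 data points drawn in a 6 x 4 inch plot.
full_size = data_density(120, 6.0, 4.0)
# The same data after shrinking the plot to half size in each direction.
shrunk = data_density(120, 3.0, 2.0)

print(full_size, shrunk)  # prints 5.0 20.0
```

Halving both dimensions quadruples the density without touching the data, which is why shrinking a graphic follows so naturally from the first principle.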

“If the nature of the data suggests the shape of the graphic, follow that suggestion. Otherwise, move toward horizontal graphics about 50 percent wider than tall.” This is pure aesthetics and pulls from the whole Golden Ratio idea. This is easily the weakest section of the whole book, but even here Tufte measured many graphics and found that almost all are wider than they are tall. As for the 50%, that is a very rough rule of thumb.

Taken together, these principles will make your graphics tell a dramatic story concisely.

Why Are There So Many Bad Graphics?

As I read through the book, I was surprised to see Tufte find fault with The New York Times, Time, the Journal of the American Statistical Association, and other authoritative publications. These are respectable names, and you would expect them to be getting it right. Tufte thinks that graphic designers lack quantitative reasoning skills. Thus, they cannot do justice to the data. They don’t know what decoration is acceptable and what distorts the data. That is exactly what The New York Times did in the above fuel economy road graphic.

I think Tufte is really onto something here. At first, I thought that maybe the text was outdated here, but then I saw a TED talk by the current Data Artist in Residence at the New York Times, Jer Thorp. He creates graphics that are ducks, that fail at being data-dense, and probably have a high lie-factor.

The point is, with an abundance of graphics that misrepresent the data, whether intentionally or unintentionally, you should always be on your guard. Don’t just judge things on how they look. Read the numbers and, if necessary, replot the data.

I’ll have another post up in the coming weeks about how we engineers can make use of these ideas as well as more presentation advice from Tufte.


5 Responses to “Principles Behind Good, Data-Driven Graphics”

  1. […] been thoroughly impressed by Edward Tufte’s first book, The Visual Display of Quantitative Information, I decided to pick up another, Visual Explanations. Tufte envisions his books fitting together like […]

  2. jeremywsherman said

    What of animated presentations of information? Consider

    “Chartjunk”. I need to use that term.

    • logisticalmiasma said

      I haven’t considered animations as much, but I’m starting to think about it now. I think one big issue would be pacing. The animation can’t be so fast that you cannot follow it nor so slow that it bores you. However, the speed is to some degree determined by the viewer. Think about reading subtitles with a friend who is unaccustomed to them. You think the subtitles hang on the screen far too long, but they might be racing to finish them every time.

      I think about chartjunk all the time now. See this example from a report commissioned by the government of Australia:

  3. Enjoyed the write-up. I teach Statistics at the high school level and have the joy of teaching students about misrepresentation of data, only to turn around and see it done by teachers and administrators on a repeated basis. One of education’s favorite tricks is the plotting of averages in scatter-plots to overemphasize the strength of an association.

    • logisticalmiasma said

      Glad you liked it! I’m currently reading another of his books and am going to write about it too when I finish it.
