For a while now, I’ve considered myself a good writer. I keep a daily journal and this blog, and I used to write short stories and scripts for fun. Many of my favorite classes in undergrad were paper-based. In the field of engineering, my love for writing is a rarity. I adopted the technical writing style taught in my class, and thought it was good. Well, having read Edward Tufte’s “The Visual Display of Quantitative Information”, I now know that class was only a start.

Who’s Tufte?

Edward Tufte is professor emeritus of statistics, political science, and computer science at Yale University. He’s been very active and outspoken about his ideas for communicating statistics and other technical data since 1975. I mentioned Tufte before when I discussed his hatred for PowerPoint. There, his major gripe was the low resolution, so you can rightly expect him to prize high-resolution graphics.

The Book in a Nutshell

The book is split into two parts. Part 1, Graphical Practice, gives examples of great graphics that communicate an extraordinary amount of information concisely and not-so-great graphics that lie to the reader or render the data unintelligible. This sets up the second part of the book, Theory of Data Graphics. Here, Tufte goes to the fundamentals. He disparages chartjunk, praises data-ink maximization, and explains data density. (You might not know these terms now, but once you learn them you’ll find yourself using them. I know I have started to think in terms of them.)

Graphical Practice

The above image shows Napoleon’s advance into and then retreat from Russia. It shows six variables: army size, location (x and y), direction of troop movement, and temperature on certain dates. Despite showing six different variables, it is easily intelligible. Looking at this, you can vividly see the toll of the frigid weather and the treacherous river crossings. This is a great graphic because it allows you to draw comparisons, shows lots of data in a small area, and tells a story.

To talk about graphical integrity, Tufte introduces the concept of the lie-factor. The lie-factor is equal to the size of the effect shown in the graphic divided by the size of the effect in the data. So in the image above, it is the change in the size of the line divided by the change in the number the line should represent. In this example, the decision to use perspective corrupts the display and results in a lie-factor of 14.8. The lines simply grow far too fast.

While this makes for a more dramatic graphic, it misleads the reader. As a reader, you simply look at how the lines have grown and conclude that the fuel economy is not only improving but improving more and more rapidly as the lines get bigger and bigger faster. Tufte replotted the data more truthfully below.

It is much easier to see the trend now, and you realize that the fuel economy improvement is slowing down. Clearly, the first graphic failed in that it misrepresented the data, made drawing comparisons difficult, and used a large space to show a small amount of data.
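To make the lie-factor concrete, here is a minimal Python sketch of the calculation. The function name is my own, and the numbers are assumptions: they are the line lengths (0.6 to 5.3 inches) and fuel economy values (18 to 27.5 miles per gallon) as I recall them from Tufte’s discussion of this graphic, not figures given in this post.

```python
def lie_factor(shown_start, shown_end, data_start, data_end):
    """Relative change drawn in the graphic divided by relative change in the data."""
    effect_in_graphic = (shown_end - shown_start) / shown_start
    effect_in_data = (data_end - data_start) / data_start
    return effect_in_graphic / effect_in_data


# Assumed values recalled from Tufte's example (not stated in this post):
# the drawn line grows from 0.6 to 5.3 inches while the
# standard it represents rises from 18 to 27.5 miles per gallon.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))  # 14.8
```

A truthful graphic has a lie-factor near 1: the drawn effect grows exactly as fast as the numbers do.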

Theory of Data Graphics

Having given an abundance of examples, Tufte moves on to the logical question of what makes a good graphic. I’ll quote his principles and then explain them.

“Above all else show the data. Maximize the data-ink ratio. Erase non-data ink. Erase redundant data-ink. Revise and edit.” The first and last sentences are obvious. To understand the middle three, you need to know that data-ink is literally ink used to display data as opposed to such things as the axes, the labels, the title, and the border. For example, data-ink would be the dots and lines connecting them in the redone fuel economy graphic above. Looking more closely at it, you see that there is no border on the plot. In fact, there is no y-axis. This minimalist design is what Tufte means by erasing non-data ink (the border) and redundant data-ink (the y-axis).

“Forgo chartjunk, including moiré vibration, the grid, and the duck.” Chartjunk is any decoration that does not add to the graphic. It includes the obnoxious patterns in bar charts, the obfuscating grid, and excessive use of color. Particularly troublesome are the patterns, as they often give rise to moiré vibrations. You can simply use different shades of gray instead of the hatching. If you use too much chartjunk and let decoration overwhelm the data, you end up with a duck. A duck is Tufte’s odd term for a graphic that tells you nothing because the decoration has so corrupted the data.

“Maximize data density and the size of the data matrix, within reason. Graphics can be shrunk way down. Use small multiples.” The first principle arises from the earlier idea of maximizing data-ink and erasing non-data ink. Data density is defined as the number of entries in the data matrix divided by the area of the data graphic. This feeds right into the second principle of shrinking graphics down. Using these two principles, you should communicate a large amount of data in a very small space. The last principle seems like the odd man out. Small multiples are like the pages of a flip book. By laying out these pages next to each other, you can see how a process evolves over time. Since these are small, the data density is high. Furthermore, they invite comparison and are inherently multivariate. Thus, they should be used whenever possible.
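The data density definition above is simple enough to sketch in a few lines of Python. The function name and the example graphic (12 months of 4 variables) are hypothetical, chosen only to show how shrinking a graphic raises its density.

```python
def data_density(rows, columns, width, height):
    """Entries in the data matrix divided by the area of the graphic."""
    entries = rows * columns
    area = width * height
    return entries / area


# Hypothetical graphic: 12 months x 4 variables = 48 numbers.
full_size = data_density(12, 4, width=6, height=4)  # 6 x 4 inch plot
shrunk = data_density(12, 4, width=3, height=2)     # same data at half scale

print(full_size)  # 2.0 numbers per square inch
print(shrunk)     # 8.0 numbers per square inch
```

Halving both dimensions quarters the area, so the same data matrix is shown at four times the density. That is why shrinking graphics and using small multiples go hand in hand.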

“If the nature of the data suggests the shape of the graphic, follow that suggestion. Otherwise, move toward horizontal graphics about 50 percent wider than tall.” This is pure aesthetics and pulls from the whole Golden Ratio idea. This is easily the weakest section of the whole book, but even here Tufte measured many graphics and found that almost all are wider than they are tall. As for the 50%, that is a very rough rule of thumb.

Taken together, these principles will make your graphics tell a dramatic story concisely.

Why Are There So Many Bad Graphics?

As I read through the book, I was surprised to see Tufte find fault with The New York Times, Time, the Journal of the American Statistical Association, and other authoritative publications. These are respectable names, and you would expect them to be getting it right. Tufte thinks that graphic designers lack quantitative reasoning skills. Thus, they cannot do justice to the data. They don’t know what decoration is acceptable and what distorts the data. That is exactly what the New York Times did in the above fuel economy road graphic.

I think Tufte is really onto something here. At first, I thought that maybe the text was outdated, but then I saw a TED talk by the current Data Artist in Residence at the New York Times, Jer Thorp. He creates graphics that are ducks, that fail at being data-dense, and that probably have a high lie-factor.

The point is, with an abundance of graphics that misrepresent the data whether intentionally or unintentionally, you should always be on your guard. Don’t just judge things on how they look, but read the numbers and if necessary replot the data.

I’ll have another post up in the coming weeks about how we engineers can make use of these ideas as well as more presentation advice from Tufte.


Personal Heroes

March 11, 2012

It seems today that the only heroes are soldiers and superheroes. I’m going to say that a hero is anyone who can serve as a role model. People often talk about role models, but I never really understood what a role model was, nor have I ever had one. I don’t know of any friends who have a role model. My recent readings have taught me that I should have a role model or personal hero. I’d like to run my reasoning by you and see if you agree.

The only times in my life that I have been asked about my heroes or role models have been on college applications and as class assignments. Way back in seventh grade for my computer science class, I chose the Red Baron as my superhero because my family would buy Red Baron brand pizza. What a stupid reason! I remember giving his name and then furiously hoping that he wasn’t a Nazi. That’s how little I knew “my hero”. While the assignment did teach me about computers, it did not impress upon me the importance of having a hero.

What has bridged that communication gap has been looking at what people with heroes have accomplished. In “Being Elmo: A Puppeteer’s Journey”, Kevin Clash clearly idolizes Jim Henson and the other puppeteers. He really does seek to emulate them in his life. His success as Elmo is partially due to this drive. Cal Newport in his book “How to Be a High School Superstar” sets out an excellent method to deconstruct a hero’s success. Cal helped me understand how to look at a successful person and see their journey. He has a series of blog posts about defeating procrastination by fixing on a role model. Finally, Ramit Sethi wrote about meeting people whom you look up to and learning from them. As per his usual style, Ramit implored me to make a list of ten people I want to meet. This caught me as off guard as that assignment back in seventh grade.

Well, I think it is about damn time I make a list of great people and use them to mark my journey to success. By having a role model, I can effectively give myself a target to aim for in my own life. Furthermore, searching for role models means that I will contact people and maybe find myself a mentor, as Kevin found in Kermit Love, who introduced Kevin to Jim Henson.

To that end, I’m going to lay into my libraries’ biography section. I’m planning on picking up books on scientists, engineers, and any other people who inspire me. I’ll report back to y’all on the progress. Speaking of reports, I finished “The Visual Display of Quantitative Information” by Edward Tufte and am working on a post about it.