Designer Graphics
I have the misfortune of owning all of Ed Tufte’s books. Reading and enjoying these books has given me a sensitivity to data display that now makes me cringe whenever I see a terrible chart. Even in academic papers, benchmark charts and other data pieces are done with common tools (M$ ex-hell, bleh!) that produce shit for graphics. What’s particularly troublesome is the amount of time it takes to produce a good-looking, designer-quality, Tufte-approved display of data. This is a story of why an education in data display is actually a handicap to a happy life of blissful ignorance.
Last week, I wanted to chart some benchmark results. After about an hour of thinking and browsing the web for other people’s charts, I knew what I wanted to make, and exactly which design principles to use as my guide. Since the benchmark suite is made up of around 20 tests, each with a name, the names deserve to be written out along the y-axis. Why, oh why, do so many other people place these names along the x-axis, where they have to be tilted? Do they not know the difficulty of reading such a mess?
Each test has some numbers associated with it. I’m not particularly a fan of placing a table of numbers in my documents, as I find that a pictorial display helps me see patterns more easily. So, I compromised: the raw numbers would get two columns, immediately following the name of the test, while the ratio between those numbers would be displayed in a horizontal bar chart (following Tufte’s design with soft coloring, absent frame, and ghost splicing for each tick on the axis). Together, this combination actually looks pretty good, and helps the data tell its story clearly.
Now that I knew what I wanted, I had to find a way of creating it. I use Linux, so this somewhat hurts my readily available tool-base, but I’m a CS nerd, so I can tolerate using some of the tools with steeper learning curves. After browsing around, I figured I would have the most success using Python and matplotlib. I’ve used this combo in the past, and was able to get quick and adequate plots. (Around that time I’d also tried gnuplot, but found it was even more trouble.)
So, armed with my chosen plotting software, I browsed through the matplotlib example gallery and tried to piece things together. I actually spent all day on this task, and couldn’t get the layout that I wanted. I couldn’t even get the names and bars to line up properly while also including a nicely formatted table. It was a terrible, horrible, no good, very bad day! I became distraught and ill-tempered. I couldn’t make what was in my head a reality on paper! I started to weep and yearn for a better way: why couldn’t there be a graphics package out there that makes it easy to do what I want? Then the curse of being a software developer kicked in: I began to think about how I would write a library that could solve my problem…
I stopped myself from writing such a package, but I can’t help but post about the design principles it would involve. First, each element that I wanted in my plot could be fitted into a nice box. So, the basic element in a chart is a rectangle, and inside of it sits a data widget that displays a table or graphic. Being able to tile boxes together is already standard practice in GUI frameworks, so this observation isn’t novel. To support the type of plot that I wanted, though, the ability to stack layers would be nice. With this feature, assembling the Tufte bar chart would consist of first placing the bars and then, on top of them, a white grid. Though I consider this a bit of a visual hack, I do think it can be a design principle within the plotting framework.
So, what can you do with widget boxes and layers? You can build a supporting library of elements: bar charts, axes, tables, scatter plots, etc. Making a complete chart would consist of assembling the elements you want into tiles, and then layering tiles on top of each other where needed. To make my plot, you would first grab a table widget and hand it the names and numbers. This would be placed to the left of a bar chart widget, which is handed the list of ratios. Since the table and the bar chart hold the same number of data rows, the bars and table rows would automatically line up. Then the horizontal axis would be placed under the bar chart element. Finally, to get Tufte’s ghost grid, a grid widget of vertical white stripes would be layered on top of the bar chart. The design mirrors the way any dialog box is assembled in a GUI framework.
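To make the idea a bit more concrete, here is a rough sketch of the boxes-and-layers design in plain Python. It renders to text rather than real graphics, and everything in it is made up for illustration: the names Widget, Table, BarChart, Grid, Layers, HBox and build_chart are hypothetical and do not come from matplotlib or any existing package. The axis widget is left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Sequence

TRANSPARENT = "\0"  # marks cells where a layer lets the layer below show through


class Widget:
    """Hypothetical base class: a box that renders one text line per data row."""

    def render(self) -> List[str]:
        raise NotImplementedError


@dataclass
class Table(Widget):
    rows: Sequence[Sequence[object]]

    def render(self) -> List[str]:
        cells = [[str(c) for c in row] for row in self.rows]
        # Pad every column to the width of its longest entry.
        widths = [max(len(s) for s in column) for column in zip(*cells)]
        return ["  ".join(c.ljust(w) for c, w in zip(row, widths)) for row in cells]


@dataclass
class BarChart(Widget):
    values: Sequence[float]
    scale: int = 10  # characters per unit of value

    def render(self) -> List[str]:
        width = round(max(self.values) * self.scale)
        return [("#" * round(v * self.scale)).ljust(width) for v in self.values]


@dataclass
class Grid(Widget):
    rows: int
    width: int
    step: int  # distance between stripes, in characters

    def render(self) -> List[str]:
        # A blank (the "white" stripe) at every tick, transparent everywhere else.
        line = "".join(
            " " if i > 0 and i % self.step == 0 else TRANSPARENT
            for i in range(self.width)
        )
        return [line] * self.rows


@dataclass
class Layers(Widget):
    stack: Sequence[Widget]  # bottom layer first

    def render(self) -> List[str]:
        rendered = [w.render() for w in self.stack]
        result = rendered[0]
        for layer in rendered[1:]:
            # The upper layer wins wherever it is not transparent.
            result = [
                "".join(b if t == TRANSPARENT else t for b, t in zip(below, top))
                for below, top in zip(result, layer)
            ]
        return result


@dataclass
class HBox(Widget):
    children: Sequence[Widget]
    gap: str = "  "

    def render(self) -> List[str]:
        columns = [c.render() for c in self.children]
        if len({len(col) for col in columns}) != 1:
            raise ValueError("all children must have the same number of rows")
        # Same row count in every child, so rows line up with no extra work.
        return [self.gap.join(parts) for parts in zip(*columns)]


def build_chart(names, first, second, scale=10, step=5) -> Widget:
    """Table of names and raw numbers on the left, ratio bars with a grid on the right."""
    ratios = [a / b for a, b in zip(first, second)]
    bars = BarChart(ratios, scale=scale)
    bar_width = len(bars.render()[0])
    grid = Grid(rows=len(ratios), width=bar_width, step=step)
    table = Table(list(zip(names, first, second)))
    return HBox([table, Layers([bars, grid])])
```

Calling build_chart with a list of test names and two lists of numbers, then printing "\n".join(chart.render()), gives one line per test with the bar sitting right next to its name and numbers. The part I care about is that the alignment comes for free from the tiling, and that the ghost grid is just one more widget stacked on top of the bars.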
Ultimately, I wasn’t able to make the charts the way I wanted, but my labmate was! He did spend a whole day making 3 charts, and the first one alone took 4 hours. This was still infinitely more productive than my frustrated attempt. Yet, I bemoan the fact that there does not exist a nice framework for charting. It should not take more than 20 minutes to set up a pretty plot. (Just for reference, SunSpider benchmarks can finish in less than 5 minutes, and it should never take longer to create a display than it does to gather the data.) In my opinion, separating each type of data representation, and then layering the types, is a much better approach than Office’s approach of selecting from a fixed number of pre-determined styles.
Now, if only I could find a decent alternative to the multi-bar chart.