I guess this is fine, but I’m not convinced. This mostly just seems like you pushing your personal aesthetic preferences, but it feels like I could easily come up with arguments for following exactly the opposite advice.
This post reminds me of lots of writing advice: seems fine, so long as you have the same aesthetic sensibilities as the person giving the advice.
I was wondering about this too. I thought of Eugene Wei writing about Edward Tufte’s classic book The Visual Display of Quantitative Information, which he considers “[one of] the most important books I’ve read”. He illustrates with an example, just like dynomight did above, starting with this chart auto-created in Excel:
and systematically applies Tufte’s principles to eventually end up with this:
Wei adds further commentary:
No issues for color blind users, but we’re stretching the limits of line styles past where I’m comfortable. To me, it’s somewhat easier with the colored lines above to trace different countries across time versus each other, though this monochrome version isn’t terrible. Still, this chart reminds me, in many ways, of the monochromatic look of my old Amazon Analytics Package, though it is missing data labels (wouldn’t fit here) and has horizontal gridlines (mine never did).
We’re running into some of these tradeoffs because of the sheer number of data series in play. Eight is not just enough, it is probably too many. Past some number of data series, it’s often easier and cleaner to display these as a series of small multiples. It all depends on the goal and what you’re trying to communicate.
At some point, no set of principles is one size fits all, and as the communicator you have to make some subjective judgments. For example, at Amazon, I knew that Joy wanted to see the data values marked on the graph, whenever they could be displayed. She was that detail-oriented. Once I included data values, gridlines were repetitive, and y-axis labels could be reduced in number as well.
Tufte advocates reducing non-data-ink, within reason, and gridlines are often just that. In some cases, if data values aren’t possible to fit onto a line graph, I sometimes include gridlines to allow for easy calculation of the relative ratio of one value to another (simply count gridlines between the values), but that’s an edge case.
For sharp changes, like an anomalous reversal in the slope of a line graph, I often inserted a note directly on the graph, to anticipate and head off any viewer questions. For example, in the graph above, if fewer data series were included, but Greece remained, one might wish to explain the decline in health expenditures starting in 2008 by adding a note in the plot area near that data point, noting the beginning of the Greek financial crisis (I don’t know if that’s the actual cause, but whatever the reason or theory, I’d place it there).
If we had company targets for a specific metric, I’d note those on the chart(s) in question as a labeled asymptote. You can never remind people of goals often enough.
And I thought, okay, sounds persuasive and all, but also this feels like Wei/Tufte is pushing their personal aesthetic on me, and I can’t really tell the difference (or whether it matters).
One way you could measure which one is “best” would be to measure how long it takes people to answer certain questions. E.g. “For what fraction of the 1997-2010 period did Japan spend more on healthcare per-capita than the UK?” or “what’s the average ratio of healthcare spending in Sweden vs. Greece between 2000 and 2010?” (I think there is an academic literature on these kinds of experiments, though I don’t have any references on hand.)
In this case, I think Tufte goes overboard in saying you shouldn’t use color. But if the second plot had color, I’d venture it would win most such contests, if only because the y-axis is bigger and it’s easier to match the lines with the labels. But even if I don’t agree with everything Tufte says, I still find him useful because he suggests different options and different ways to think about things.
Yeah, agreed that getting people to answer questions using the chart, and measuring their speed and accuracy, is the key objective metric of design quality. Also, I like it when both color and line styles are used together: it keeps the chart clear for colorblind people, and makes it extra clear for colorsighted people. Colors should be chosen carefully to contrast with the background color, and can be chosen so that they remain visible even to people with the most common types of color blindness.
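As a rough illustration of pairing color with line style, here is a minimal sketch. The palette is the Okabe-Ito set, which is commonly recommended as colorblind-safe. The line-style strings follow matplotlib's notation. The function name `assign_styles` and the choice of four countries are my own assumptions for illustration, not anything from Wei's chart.

```python
from itertools import cycle

# Okabe-Ito palette, a commonly recommended colorblind-safe qualitative palette.
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]

# Line-style codes in matplotlib's notation: solid, dashed, dash-dot, dotted.
LINE_STYLES = ["-", "--", "-.", ":"]


def assign_styles(series_names):
    """Give each series both a color and a line style (hypothetical helper)."""
    colors = cycle(OKABE_ITO)
    dashes = cycle(LINE_STYLES)
    return {
        name: {"color": next(colors), "linestyle": next(dashes)}
        for name in series_names
    }


styles = assign_styles(["Japan", "UK", "Sweden", "Greece"])
for name, style in styles.items():
    print(name, style)
```

With only four line styles, a fifth series would reuse the solid line and differ from the first by color alone, which is one more argument for small multiples once the series count grows. The yellow entry also has low contrast on a white background, so it may be worth skipping there.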
Yes! But not just time, you should also compare them on accuracy.
Hey, you might be right! I’ll take this as useful feedback that the argument wasn’t fully convincing. Don’t mean to pull a motte-and-bailey, but I suppose if I had to, I’d retreat to an argument like, “if making a plot, consider using these rules as one option for how to pick axes.” In any case, if you have any examples where you think following this advice leads to bad choices, I’d be interested to hear them.
When I looked at your proposed GDP-Time chart, I felt I was more inclined to treat the year as quantitative and the amounts as categorical. Though I don’t know how that would actually play out if I were trying to make use of it in anger.
Agreed (except about the “this is fine” part). The arguments are unconvincing and the recommendations seem bad. (In particular, the suggestion that the “vary between $50T and $53T” graph shouldn’t be drawn with a zero-based y-axis is egregious.)
If I measure gravitational force against altitude, and end up with points like the following:
0 ft above sea level, force is 9.8000 m/s²
1000 ft above sea level, force is 9.7992 m/s²
2000 ft above sea level, force is 9.7986 m/s²
3000 ft above sea level, force is 9.7980 m/s²
would it be egregious for me to plot this graph without a zero-based y-axis? Do I need to plot it with a y-axis going down to zero?
Certainly there are cases where it’s misleading to not extend a graph like this down to zero. But there are also cases where it’s entirely reasonable to not extend it down to zero.
Would you graph that with a line chart? No. And it absolutely would be egregious to use a line chart and then not use a zero-based y-axis.
I’m surprised to hear you say that. I would consider it perfectly reasonable to use a line graph without a zero-based y-axis to plot gravity against altitude: the underlying reality is in fact a line (well, a curve I guess)! Gravitational force goes down with altitude in a known way! But the effects of altitude on gravity are very small for altitudes we can easily measure, and extending the graph all the way down to zero will make it impossible to see them.
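To put a number on "impossible to see", here is a small sketch using the four measurements quoted above. It assumes the top of the axis sits at the largest data value. The 400-pixel chart height is an arbitrary figure picked for illustration, and `fraction_of_axis_used` is a hypothetical helper name.

```python
def fraction_of_axis_used(values, axis_min):
    """Fraction of the y-axis height covered by the data's min-to-max spread.

    Assumes the axis runs from axis_min up to the largest value.
    """
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi - axis_min)


# The four measurements from the comment above, in m/s^2.
gravity = [9.8000, 9.7992, 9.7986, 9.7980]

zero_based = fraction_of_axis_used(gravity, axis_min=0.0)
chart_height_px = 400  # assumed chart height, for illustration only

print(f"share of axis height: {zero_based:.4%}")
print(f"height on screen: {zero_based * chart_height_px:.2f} px")
```

On a zero-based axis the whole 0.002 m/s² spread takes up about 0.02% of the axis, which is well under one pixel on a chart of that size.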
Here, I’d plot difference from gravitation at sea level.
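A quick sketch of that transformation, using the four data points quoted earlier in the thread. The names `SEA_LEVEL_G` and `difference_from_baseline` are made up for this example.

```python
# Measurements from the thread: (altitude in ft, value in m/s^2).
measurements = [(0, 9.8000), (1000, 9.7992), (2000, 9.7986), (3000, 9.7980)]

SEA_LEVEL_G = 9.8000  # baseline: the value measured at 0 ft


def difference_from_baseline(points, baseline):
    """Return (altitude, value - baseline) pairs, rounded to the data's 4 decimals."""
    return [(altitude, round(value - baseline, 4)) for altitude, value in points]


deltas = difference_from_baseline(measurements, SEA_LEVEL_G)
for altitude_ft, delta in deltas:
    print(f"{altitude_ft:>5} ft  {delta:+.4f} m/s^2")
```

Plotted this way, zero is a meaningful baseline (no change from sea level), so a zero-based y-axis and a visible trend are no longer in tension.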
Eh, I’ve encountered plenty of times when I really needed to understand the variance of data such that I had to “zoom in” and put the start of the axis at something above 0 because otherwise I couldn’t find out what I needed to know to make a decision. But I do often like to see it both ways, so I can understand it both in relative and absolute terms.
Just so; the correct way is indeed to show the full (zero-based y-axis) chart, then a “zoomed-in” version, with the y-axis mapping clearly indicated. Of course, this takes more effort than just including the one chart; but this is not surprising—doing things correctly often takes more effort than doing things incorrectly!
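For what it's worth, the two-chart approach is cheap to set up. Below is a minimal sketch that computes the y-axis range for each panel from the same data. It is plotting-library agnostic. `axis_ranges` and the 10% padding are assumptions of mine, and the sketch assumes non-negative values.

```python
def axis_ranges(values, pad_fraction=0.1):
    """Return y-axis limits for a zero-based panel and a zoomed-in panel.

    pad_fraction adds a margin, as a fraction of the data spread, so the
    extreme points do not sit exactly on the frame.
    """
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    return {
        "full": (0.0, hi + pad),        # absolute view, anchored at zero
        "zoomed": (lo - pad, hi + pad)  # relative view, hugging the data
    }


# The gravity measurements from earlier in the thread, in m/s^2.
gravity = [9.8000, 9.7992, 9.7986, 9.7980]

ranges = axis_ranges(gravity)
for label, (y_min, y_max) in ranges.items():
    print(f"{label:>6}: y from {y_min:.4f} to {y_max:.4f}")
```

Each pair of limits can then be handed to whatever plotting tool is in use, with the zoomed panel labeled so readers can see its axis does not start at zero.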