Level up your spreadsheeting

Link post

Epistemic status: Passion project /​ domain I’m pretty opinionated about, just for fun.

In this post, I walk through some principles I think good spreadsheets abide by, and then in the companion piece, I walk through a whole bunch of tricks I’ve found valuable.

Image of a spreadsheet by GPT-4o
Illustrated by GPT-4o

Who am I?

I’ve spent a big chunk of my (short) professional career so far getting good at Excel and Google Sheets.[1] As such, I’ve accumulated a bunch of opinions on this topic.

Who should read this?

This is not a guide to learning how to start using spreadsheets at all. I think you will get more out of this post if you use spreadsheets at least somewhat frequently, e.g.

  • Have made 20+ spreadsheets

  • Know how to use basic formulas like sum, if, countif, round

  • Know some fancier formulas like left/​mid/​right, concatenate, hyperlink

  • Have used some things like filters, conditional formatting, data validation

Principles of good spreadsheets

Broadly speaking, I think good spreadsheets follow some core principles (non-exhaustive list).

I think the below is a combination of good data visualization (or just communication) advice, systems design, and programming design (spreadsheets combine the code and the output).

It should be easy for you to extract insights from your data

  1. A core goal you might have with spreadsheets is quickly calculating something based on your data. A bunch of tools below are aimed at improving functionality, allowing you to more quickly grab the data you want.

Your spreadsheet should be beautiful and easy to read

  1. Sometimes, spreadsheets look like the following example.

  2. I claim that this is not beautiful or easy for your users to follow what is going on. I think there are cheap techniques you can use to improve the readability of your data.

There should be one source of truth for your data

  1. One common pitfall when designing spreadsheet-based trackers is hard copy and pasting data from one sheet to another, such that when your source data changes, the sheets you use for analyses no longer reflect “fresh” data. This is a big way in which your spreadsheet systems can break down.

  2. A bunch of tools below are designed to improve data portability — i.e. remove the need for copy and pasting.

Your spreadsheet should be easy to audit

  1. One major downside of spreadsheets as compared to most coding languages, is that it’s often easy for relatively simple spreadsheets to contain silent bugs in them.[2]

  2. Some features of spreadsheets that contribute to this problem:

    1. Spreadsheets hide the code and show you only the output by default.

      1. When you use formulas, once you hit enter, the user doesn’t by default get to read what’s going on. So if the output looks plausible, you might not notice your formula has a bug in it.

    2. It’s harder to break up your work into chunks.

      1. When you’re coding, most people will break up a complicated formula into several lines of code, using intermediate variables and comments to make things more readable. E.g.:

      2. By default, some Sheets formulas get really unwieldy, and you need to work a bit harder to recover readability.

    3. Spreadsheets contain more individual calculations.

      1. When you’re coding and you want to perform the same calculation on 100 rows of data, you’d probably use a single line of code to iterate over your data (e.g. a for loop).

      2. In Google Sheets, you’re more likely to drag your formula down across all of your rows. But this means that if you accidentally change the formula for one cell and not the others, or if your data has now changed and it turns out you need to drag your formulas down more, things can break in annoying ways.

  3. Because of this, I consider auditability one of the key qualities of a well designed spreadsheet. Some of the tools below will recover coding best practices.

  4. I also consider principles (2)-(3) above pretty related to principle (4).

Your spreadsheet should be hard to break

  1. Not all spreadsheets are meant as living documents; sometimes you’ll create a spreadsheet to conduct a specific analysis and then discard it.

  2. But sometimes, you’ll use a spreadsheet as a management tool to keep track of a bunch of moving pieces. In this case, you might care that your system isn’t going to break after a few weeks of use.[3]


Much more in the companion piece!

  1. ^

    I’m using the term ‘Google Sheets’ in this doc, but almost all of the tricks mentioned here work for Excel as well.

  2. ^

    My favorite Excel bug story: I used to work in litigation consulting, where I’d sometimes audit spreadsheets sent to us from the opposing side of a legal case.

    In one case, an expert witness for the opposing side sent over a spreadsheet with columns similar to the following: year, online sales, in-person sales, total sales. The expert was saying that total sales had almost doubled from ~3,000 → ~5,000 for this particular product in 2019.

    We eventually discovered that for the 2019 row, the expert had entered the formula =sum(A4:C4) instead of sum(B4:C4), and so had accidentally added the value ‘2019’ to the total sum. Here’s a recreation. (I’ve obfuscated the details a bit here but the core mistake was the same.)

  3. ^

    As an aside, spreadsheets have a lot of use cases, which makes giving generalizable advice a bit trickier. For instance, some common use cases for spreadsheets:

    - A database which you query whenever needed;

    - A data visualization tool meant to present some interesting findings from existing data;

    - A management tracker that you use to schedule emails and keep tabs on your tasks;

    - To model some interesting phenomenon and keep track of your assumptions

    Depending on what you’re using a spreadsheet for, you might prioritize some of these principles more or less highly. For instance, making something easy to read is probably more valuable when you’re creating a data visualization versus a database.

    Of course, lots of spreadsheets combine lots of different use cases — e.g. you might have one tab with your source of truth data, and another for random analytics.