There’s an analogy with the notion of “toil”, which is popular in the Site Reliability Engineering subfield of software engineering. Toil in this context is work which is necessary to keep the lights on, but which doesn’t actually improve anything. In some sense, the job of an SRE is to reduce toil; they must certainly be psychologically able to deal with it, because it’s the stuff with which they work! I’ll just talk a bit about it here in a fairly undirected way, in case any of it gives you ideas. The SRE Handbook is well worth reading if you’re a software engineer, by the way.
The SRE’s (imperfectly-aligned-to-your-problem) answer to the problem “we’re being buried in toil” is to track the proportion of time spent on toil versus “productive” work. If toil exceeds some threshold proportion, the response is to divert resources from feature work towards reducing the toil (e.g. by automating it, or addressing the root causes of the issues that you’re spending time fighting). An extremely simple example of such automation is setting up direct debits to pay bills, or repeat online orders for groceries. An SRE performing any particular piece of toil would at least spend a moment thinking about whether it could be automated instead.
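If it helps to see the bookkeeping spelled out, here is a minimal sketch of the "track the proportion, act when it crosses a threshold" rule. The names (`WorkLog`, `should_divert_to_toil_reduction`) and the default threshold of 0.5 are my own assumptions for illustration; pick whatever proportion suits you.

```python
from dataclasses import dataclass


@dataclass
class WorkLog:
    """Hours logged over some period, split into toil and productive work."""
    toil_hours: float
    productive_hours: float

    def toil_fraction(self) -> float:
        """Fraction of all logged time that went on toil (0.0 if nothing logged)."""
        total = self.toil_hours + self.productive_hours
        return self.toil_hours / total if total > 0 else 0.0


def should_divert_to_toil_reduction(log: WorkLog, threshold: float = 0.5) -> bool:
    """True when toil has grown past the chosen threshold proportion.

    The 0.5 default is an assumed value, not a rule from the comment above.
    """
    return log.toil_fraction() > threshold


if __name__ == "__main__":
    this_week = WorkLog(toil_hours=12, productive_hours=8)
    if should_divert_to_toil_reduction(this_week):
        print("Toil is over budget: spend time automating or fixing root causes.")
```

The point is only that the decision becomes mechanical once you log the two numbers; the hard part is being honest about which hours count as toil.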
Runbooks (lists of triggers and responses to guide you through operations) are a standard SRE-style tool for making the toil less error-prone and stressful. To know when you should be performing some piece of toil, it’s standard to identify and set up alerts, so that you have a specific trigger. (“I just got a Slack alert saying that the database has reached 70% capacity; the alert pointed me to this wiki page telling me step-by-step how to bring the database offline safely and perform a vacuum to release space”, or “my washing basket is 3⁄4 full; that means this evening I will be putting on a load of laundry”.)
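As a rough sketch of the trigger-plus-runbook pairing, assuming a hypothetical `AlertRule` type and made-up runbook locations (none of these names come from a real alerting tool):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """A trigger paired with the runbook that says what to do about it."""
    name: str
    threshold: float  # fraction of capacity at which the alert fires
    runbook: str      # hypothetical pointer to the step-by-step instructions


def check(rule: AlertRule, current_fraction: float) -> Optional[str]:
    """Return an alert message if the trigger has fired, otherwise None."""
    if current_fraction >= rule.threshold:
        return (
            f"{rule.name} is at {current_fraction:.0%} "
            f"(trigger: {rule.threshold:.0%}); follow runbook: {rule.runbook}"
        )
    return None


if __name__ == "__main__":
    rules_and_readings = [
        (AlertRule("database disk", 0.70, "wiki/database-vacuum"), 0.72),
        (AlertRule("washing basket", 0.75, "put on a load this evening"), 0.50),
    ]
    for rule, reading in rules_and_readings:
        message = check(rule, reading)
        if message is not None:
            print(message)
```

What matters is that the alert carries its own instructions, so that when it fires you don’t have to decide anything, only follow the steps.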
It’s also standard to batch up the toil. A team of people will usually have a rota, so that any given person’s time is mostly spent doing productive work, and the toil is the responsibility of the people on duty. That way, you only get a small amount of relative hell before you rotate onto better work. The toil necessary to maintain a human life is generally not that urgent and is hence very amenable to batching, except for the most basic biological things like using the toilet or putting food into your mouth (note: preparing food is not an urgent biological need unless your planning procedures have failed!). You can batch up a lot of it: e.g. you spend half of Saturday preparing meals for the week, or otherwise arrange things so that the daily time spent preparing and putting food into your mouth is as low as possible, and you can declare that one Sunday every two months is for paperwork.
I love hearing about the overlap between personal life stuff and concepts from technical fields. So thanks a lot for this comment! Another instance of this topic is the advice from the book Algorithms to Live By.
Also, I’d love to hear more about this stuff. Furthermore, I think the general theme of “general-purpose advice based on what I learned in my work” is very fruitful for blog posts, including LW posts.