I think there’s an unexplained leap here from the activity of trying to measure and compare things, to the assumption that you need a single metric.
It seems pretty reasonable, when assessing these options, to look at things like how many people (or other things you care about) would be directly affected by the project, what kind of effect it would have on them, and how big your piece of the project is. But given the very different nature of some of these projects, those measurements will end up in different units. You can then try to figure out which thing seems most appealing to you. But I would expect that you’ll more often get a result you’d reflectively endorse, if you directly compared the different concrete outcomes and asked which seems better to you (e.g. “would I rather prevent twenty hogs from leading lives of pointless suffering, or help one person immigrate from a stable but somewhat poor place to a modern city?”), than if you converted them to some sort of abstract utility metric like QALYs and then compared the numbers. The concrete comparisons, by not compressing out the moral complexity of the tradeoffs, allow you extra opportunity to notice features of the decision you might otherwise miss, and to get curious about details that might even help you generate improved options.
Another thing you miss if you try to collapse everything into a single metric is the opportunity to simplify the decision in more reliable ways through causally modeling the relationship between different intermediate outcomes. For instance, some amount of gardening might affect your own well-being, such that spending a small amount of your time on a small garden might actually improve your ability to do other work—in that case, that choice is overdetermined. On the other hand, working on a charter city might affect how many other people get to make a nice garden, and depending on how much you care about the experience happening to you rather than to other people, you might end up deciding that past some point of diminishing returns on productivity, your production of gardening-related experiences is more efficiently pursued by empowering others than by making a garden.
This kind of thinking can also help generate surprising opportunities you hadn’t contemplated—the kind of work that goes into a garden-friendly charter city might help with other agendas like robustness to food supply disruption, or carbon capture. This is a bit of a toy example, but I hope you see the general point. Curiosity about the concrete details of different plans can be extremely valuable, comparing plans is a great opportunity to generate such curiosity, and totalizing metrics compress out all those opportunities.
I guess I’d first like to disagree with the implication that using a single metric means collapsing everything into that metric without getting curious about details and causal chains. The latter seems bad, for the reasons you’ve mentioned, but I think there are reasons to like the former. Those reasons:
Many comparisons have a large number of different features. Choosing a single metric that’s a function of only some features can make the comparison simpler by stopping you from considering features that you consider irrelevant, and inducing you to focus on features that are important for your decision (e.g. “gardening looks strictly better than charter cities because it makes me more productive, and that’s the important thing in my metric—can I check if that’s actually true, or quantify that?”).
Many comparisons have a large number of varying features. If you think that by default you have biases or, more generally, unendorsed subroutines that cause you to focus on features you shouldn’t, it can be useful to think about them when constructing a metric, and then using the metric in a way that ‘crowds out’ relevant biases (e.g. you might tie yourself to using QALYs if you’re worried that by default you’ll tend to favour interventions that help people of your own ethnicity more than you would consciously endorse). See Hanson’s recent discussion of simple rules vs the use of discretion.
By having your metric be a function of a comparatively small number of features, you give yourself the ability to search the space of things you could possibly do by how those things stack up against those features, focussing the options you consider on things that you’re more likely to endorse (e.g. “hmm, if I wanted to maximise QALYs, what jobs would I want to take that I’m not currently considering?” or “hmm, if I wanted to maximise QALYs, what systems in the world would I be interested in affecting, and what instrumental goals would I want to pursue?”). I don’t see how to do this without, if not a single metric, then a small number of metrics.
Metrics can crystallise tradeoffs. If I’m regularly thinking about different interventions that affect the lives of different farmed animals, then after making several decisions, it’s probably computationally easier for me to come up with a rule for how I tend to trade off cow effects vs sheep effects, and/or freedom effects vs pain reduction effects, than to make that tradeoff every time independently.
Metrics help with legibility. This is less important in the case of an individual choosing career options to take, but suppose that I want to be GiveWell, and recommend charities I think are high-value, or I want to let other people who I don’t know very well invest in my career. In that case, it’s useful to have a legible metric that explains what decisions I’m making, so that other people can predict my future actions better, and so that they can clearly see reasons for why they should support me.
I don’t mean to make a fully general argument against ever using metrics here. I don’t think that using a metric is always a bad thing, especially in some of the more scope-limited examples you give. I do think that some of the examples you give make some unreasonable assumptions, though—let’s go case by case.
Choosing a single metric that’s a function of only some features can make the comparison simpler by stopping you from considering features that you consider irrelevant, and inducing you to focus on features that are important for your decision (e.g. “gardening looks strictly better than charter cities because it makes me more productive, and that’s the important thing in my metric—can I check if that’s actually true, or quantify that?”).
I think most of the work here is figuring out which attributes you care about, but I agree that in many cases the best way to make comparisons on those attributes will be via explicitly quantified metrics.
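To make that concrete, here is a minimal sketch of the kind of explicitly quantified comparison I have in mind; every feature name, weight, and number below is invented for illustration, not taken from any real analysis:

```python
# Toy illustration: a metric defined as an explicit function of a few chosen features.
# All feature names, weights, and option values are made up for this example.

OPTIONS = {
    "gardening":    {"own_productivity_boost": 0.8, "people_helped": 1,   "time_cost_hours": 100},
    "charter_city": {"own_productivity_boost": 0.0, "people_helped": 500, "time_cost_hours": 2000},
}

# The metric only looks at the features I have decided matter; everything else
# is deliberately ignored, which is both the point and the danger.
WEIGHTS = {"own_productivity_boost": 10.0, "people_helped": 0.01, "time_cost_hours": -0.01}

def score(features):
    """Weighted sum over the handful of features the metric cares about."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)

for option, features in OPTIONS.items():
    print(f"{option}: {score(features):+.1f}")
```

Writing the weights down like this also makes it obvious that the productivity term is doing nearly all the work in the comparison, which is exactly the “can I check if that’s actually true, or quantify that?” move from your example.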
If you think that by default you have biases or, more generally, unendorsed subroutines that cause you to focus on features you shouldn’t, it can be useful to think about them when constructing a metric, and then using the metric in a way that ‘crowds out’ relevant biases (e.g. you might tie yourself to using QALYs if you’re worried that by default you’ll tend to favour interventions that help people of your own ethnicity more than you would consciously endorse).
I think that a lot of worries about this kind of “bias” are only coherent in fiduciary situations; Hanson’s examples are all about such situations, where we’re trying to constrain the behavior of people to whom we’ve delegated trust. If you’re deciding on your own account, then it can often be much better to resolve the conflict, or at least clarify what it’s about, rather than trying to behave as though you’re more accountable to others than you are.
By having your metric be a function of a comparatively small number of features, you give yourself the ability to search the space of things you could possibly do by how those things stack up against those features, focussing the options you consider on things that you’re more likely to endorse (e.g. “hmm, if I wanted to maximise QALYs, what jobs would I want to take that I’m not currently considering?” or “hmm, if I wanted to maximise QALYs, what systems in the world would I be interested in affecting, and what instrumental goals would I want to pursue?”). I don’t see how to do this without, if not a single metric, then a small number of metrics.
This seems completely unobjectionable—simple metrics (I’d use something much simpler than QALYs) can be fantastic for hypothesis generation, and it’s a bad sign if someone claiming to care about scope doesn’t seem to have done this.
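As a sketch of what that search might look like in practice, with a deliberately crude metric and an entirely made-up pool of candidates:

```python
# Toy hypothesis generation: rank a broad pool of candidate actions by a crude
# metric, then look hard at the surprising entries near the top.
# The candidates and all the numbers are invented for illustration.

candidates = [
    ("take a direct-work job",  {"people_reached": 1_000,   "effect_per_person": 0.05}),
    ("donate a year of salary", {"people_reached": 5_000,   "effect_per_person": 0.02}),
    ("push on a policy lever",  {"people_reached": 100_000, "effect_per_person": 0.001}),
    ("plant a garden",          {"people_reached": 5,       "effect_per_person": 0.5}),
]

def crude_impact(features):
    """Deliberately simple: people reached times effect per person, nothing else."""
    return features["people_reached"] * features["effect_per_person"]

# The ranking is not the decision; it is a prompt for which options deserve a
# closer, more concrete look.
for name, features in sorted(candidates, key=lambda c: crude_impact(c[1]), reverse=True):
    print(f"{crude_impact(features):8.1f}  {name}")
```

The ranking is not itself the answer; it is a way of noticing options worth examining concretely, which fits the kind of curiosity you were describing.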
Metrics can crystallise tradeoffs. If I’m regularly thinking about different interventions that affect the lives of different farmed animals, then after making several decisions, it’s probably computationally easier for me to come up with a rule for how I tend to trade off cow effects vs sheep effects, and/or freedom effects vs pain reduction effects, than to make that tradeoff every time independently.
Metrics help with legibility. This is less important in the case of an individual choosing career options to take, but suppose that I want to be GiveWell, and recommend charities I think are high-value, or I want to let other people who I don’t know very well invest in my career. In that case, it’s useful to have a legible metric that explains what decisions I’m making, so that other people can predict my future actions better, and so that they can clearly see reasons for why they should support me.
This sounds great in principle, but my sense is that in practice almost no one actually does this based on a utilitarian metric because it would be ruinously computationally expensive. The only legible unified impact metrics GiveWell tracks are money moved and website traffic, for instance. Sometimes people pretend to use such metrics as targets to determine their actions, but the benefits of pretending to do a thing are quite different from the benefits of doing the thing.