I guess I’d first like to push back on the implication that using a single metric means collapsing everything into that metric without getting curious about details and causal chains. The latter seems bad, for the reasons you’ve mentioned, but I think there are reasons to like the former. Those reasons:
Many comparisons involve a large number of different features. Choosing a single metric that’s a function of only some features can make the comparison simpler, by screening off features you consider irrelevant and inducing you to focus on the features that matter for your decision (e.g. “gardening looks strictly better than charter cities because it makes me more productive, and that’s the important thing in my metric—can I check whether that’s actually true, or quantify it?”).
Metrics can counteract biases. If you think that by default you have biases or, more generally, unendorsed subroutines that cause you to focus on features you shouldn’t, it can be useful to think about them when constructing a metric, and then to use the metric in a way that ‘crowds out’ those biases (e.g. you might tie yourself to using QALYs if you’re worried that by default you’ll tend to favour interventions that help people of your own ethnicity more than you would consciously endorse). See Hanson’s recent discussion of simple rules vs the use of discretion.
By having your metric be a function of a comparatively small number of features, you give yourself the ability to search the space of things you could possibly do by how they stack up against those features, focussing the options you consider on things you’re more likely to endorse (e.g. “hmm, if I wanted to maximise QALYs, what jobs would I want to take that I’m not currently considering?” or “hmm, if I wanted to maximise QALYs, what systems in the world would I be interested in affecting, and what instrumental goals would I want to pursue?”). I don’t see how to do this without a single metric, or at least a small number of metrics.
Metrics can crystallise tradeoffs. If I’m regularly thinking about different interventions that affect the lives of different farmed animals, then after making several decisions it’s probably computationally easier for me to come up with a rule for how I tend to trade off cow effects vs sheep effects, and/or freedom effects vs pain reduction effects, than to make that tradeoff independently every time (a toy sketch of what such a rule might look like follows this list).
Metrics help with legibility. This is less important for an individual choosing between career options, but suppose that I want to be GiveWell and recommend charities I think are high-value, or that I want to let other people who I don’t know very well invest in my career. In that case, it’s useful to have a legible metric that explains the decisions I’m making, so that other people can predict my future actions better, and so that they can clearly see reasons why they should support me.
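To make the ‘crystallise tradeoffs’ point concrete, here’s a minimal sketch in Python of what such a pre-committed rule might look like. The species, effect dimensions, and weights are all made up for illustration; nothing here is a claim about how these tradeoffs should actually be made.

```python
# Toy sketch of a crystallised tradeoff rule. Instead of re-deriving
# "how much do I care about cow effects vs sheep effects?" for every
# decision, fix the exchange rates once and reuse them.
# All weights below are made-up placeholders, not recommendations.

TRADEOFF_WEIGHTS = {
    ("cow", "pain reduction"): 1.0,   # arbitrary unit: one cow-pain point
    ("cow", "freedom"): 0.5,
    ("sheep", "pain reduction"): 0.3,
    ("sheep", "freedom"): 0.15,
}

def score(effects):
    """Collapse an intervention's per-species, per-dimension effects
    into a single number using the fixed weights."""
    return sum(TRADEOFF_WEIGHTS[key] * size for key, size in effects.items())

# Two hypothetical interventions, with invented effect sizes:
corral_reform = {("cow", "freedom"): 100, ("cow", "pain reduction"): 20}
shearing_standards = {("sheep", "pain reduction"): 400}

print(score(corral_reform))       # 70.0
print(score(shearing_standards))  # 120.0
```

The particular numbers don’t matter; the point is that the exchange rates get decided once, deliberately, and then reused, rather than being renegotiated under each decision’s local pressures.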
I don’t mean to make a fully general argument against ever using metrics here. I don’t think that using a metric is always a bad thing, especially in some of the more scope-limited examples you give. I do think that some of those examples rest on unreasonable assumptions, though—let’s go case by case.
> Choosing a single metric that’s a function of only some features can make the comparison simpler, by screening off features you consider irrelevant and inducing you to focus on the features that matter for your decision (e.g. “gardening looks strictly better than charter cities because it makes me more productive, and that’s the important thing in my metric—can I check whether that’s actually true, or quantify it?”).
I think most of the work here is figuring out which attributes you care about, but I agree that in many cases the best way to make comparisons on those attributes will be via explicitly quantified metrics.
> If you think that by default you have biases or, more generally, unendorsed subroutines that cause you to focus on features you shouldn’t, it can be useful to think about them when constructing a metric, and then to use the metric in a way that ‘crowds out’ those biases (e.g. you might tie yourself to using QALYs if you’re worried that by default you’ll tend to favour interventions that help people of your own ethnicity more than you would consciously endorse).
I think that a lot of worries about this kind of “bias” are only coherent in fiduciary situations; Hanson’s examples are all about such situations, where we’re trying to constrain the behavior of people to whom we’ve delegated trust. If you’re deciding on your own account, then it can often be much better to resolve the conflict, or at least clarify what it’s about, rather than trying to behave as though you’re more accountable to others than you are.
> By having your metric be a function of a comparatively small number of features, you give yourself the ability to search the space of things you could possibly do by how they stack up against those features, focussing the options you consider on things you’re more likely to endorse (e.g. “hmm, if I wanted to maximise QALYs, what jobs would I want to take that I’m not currently considering?” or “hmm, if I wanted to maximise QALYs, what systems in the world would I be interested in affecting, and what instrumental goals would I want to pursue?”). I don’t see how to do this without a single metric, or at least a small number of metrics.
This seems completely unobjectionable—simple metrics (I’d use something much simpler than QALYs) can be fantastic for hypothesis generation, and it’s a bad sign if someone claiming to care about scope doesn’t seem to have done this.
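To gesture at what that hypothesis-generation loop can look like, here’s a minimal sketch with a deliberately crude proxy metric (much simpler than QALYs, as above); every option, feature, and number is invented for illustration:

```python
# Toy sketch of metric-driven option search: score a broad pool of
# options against a deliberately simple proxy metric, then inspect
# the top of the ranking for candidates you weren't considering.
# Every option and number here is invented for illustration.

candidate_options = {
    # option: (people affected, effect per person, in made-up units)
    "current job": (1_000, 0.001),
    "gardening": (1, 2.0),
    "charter city policy work": (500_000, 0.0001),
    "vaccine logistics": (2_000_000, 0.0005),
}

def proxy_metric(people, effect_per_person):
    # Much simpler than QALYs: just scale times per-person effect.
    return people * effect_per_person

ranked = sorted(candidate_options.items(),
                key=lambda item: proxy_metric(*item[1]),
                reverse=True)

for name, (people, effect) in ranked:
    print(f"{name}: {proxy_metric(people, effect):.1f}")
# Surprising entries at the top ("why does vaccine logistics score so
# high? is that real?") are prompts for curiosity, not verdicts.
```

The output is a starting point for investigation rather than a conclusion: the value is in the options it surfaces, not in the scores themselves.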
> Metrics can crystallise tradeoffs. If I’m regularly thinking about different interventions that affect the lives of different farmed animals, then after making several decisions it’s probably computationally easier for me to come up with a rule for how I tend to trade off cow effects vs sheep effects, and/or freedom effects vs pain reduction effects, than to make that tradeoff independently every time.
> Metrics help with legibility. This is less important for an individual choosing between career options, but suppose that I want to be GiveWell and recommend charities I think are high-value, or that I want to let other people who I don’t know very well invest in my career. In that case, it’s useful to have a legible metric that explains the decisions I’m making, so that other people can predict my future actions better, and so that they can clearly see reasons why they should support me.
This sounds great in principle, but my sense is that in practice almost no one actually does this based on a utilitarian metric because it would be ruinously computationally expensive. The only legible unified impact metrics GiveWell tracks are money moved and website traffic, for instance. Sometimes people pretend to use such metrics as targets to determine their actions, but the benefits of pretending to do a thing are quite different from the benefits of doing the thing.