In order to compute what actions will have the best consequences, you need to have accurate beliefs—otherwise, how do you know what the best consequences are?
There’s a sense in which the theory of “Use our methods of epistemic rationality to build predictively accurate models, then use the models to decide what actions will have the best consequences” is going to be meaningfully simpler than the theory of “Just do whatever has the best consequences, including the consequences of the thinking that you do in order to compute this.”
The original timeless decision theory manuscript distinguishes a class of “decision-determined problems”, where the payoff depends on the agent’s decision, but not the algorithm that the agent uses to arrive at that decision: Omega isn’t allowed to punish you for not making decisions according to the algorithm “Choose the option that comes first alphabetically.” This seems like a useful class of problems to be able to focus on? Having to take into account the side effects of using a particular categorization seems like a form of being punished for using a particular algorithm.
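To make the class concrete, here is a toy sketch (my own construction, not from the manuscript): in a decision-determined problem, the payoff function is only allowed to see the decision itself, so any two algorithms that output the same decision must receive the same payoff. The payoff values and option names below are arbitrary illustrations.

```python
# Hypothetical "decision-determined" problem: the payoff function sees
# only the decision, never the algorithm that produced it.

def payoff(decision: str) -> int:
    # Toy payoff table; the numbers are arbitrary for illustration.
    return {"one-box": 1_000_000, "two-box": 1_000}[decision]

def alphabetical_chooser(options):
    # A deliberately silly algorithm: pick whatever sorts first.
    return sorted(options)[0]

def expected_value_chooser(options):
    # A more sensible algorithm: pick the option with the higher payoff.
    return max(options, key=payoff)

options = ["one-box", "two-box"]
# Both algorithms happen to output "one-box" here, so in a
# decision-determined problem they must receive the same payoff:
assert payoff(alphabetical_chooser(options)) == payoff(expected_value_chooser(options))
```

The point of the restriction is visible in the signature of `payoff`: since it takes only the decision as input, there is no way for the environment to reward or punish the choice of chooser.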
I concede that, ultimately, the simple “Cartesian” theory that disregards the consequences of thinking can’t be the true, complete theory of intelligence, because ultimately, the map is part of the territory. I think the embedded agency people are working on this?—I’m afraid I’m not up-to-date on the details. But when I object to people making appeals to consequences, the thing I’m objecting to is never people trying to do a sophisticated embedded-agency thing; I’m objecting to people trying to get away with choosing to be biased.
You think that most people are too irrational to correctly weigh these kinds of considerations against each other on a case-by-case basis, and there’s no way to train them to be more rational about this. Is that true,
Actually, yes.
and if so, why do you think that?
Long story. How about some game theory instead?
Consider some agents cooperating in a shared epistemic project—drawing a map, or defining a language, or programming an AI—some system that will perform better if it does a better job of corresponding with (some relevant aspects of) reality. Every agent has the opportunity to make the shared map less accurate in exchange for some selfish consequence. But if all of the agents do that, then the shared map will be full of lies. Appeals to consequences tend to diverge (because everyone has her own idiosyncratic favored consequence); “just make the map be accurate” is a natural focal point (because the truth is generally useful to everyone).
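The shared-map situation can be sketched as a small public-goods game (my own toy model, not part of the original argument; the agent count and payoff constants are made-up parameters): each agent may distort the map for a private bonus, and map accuracy, which benefits everyone, drops with each distortion.

```python
# Toy public-goods model of the shared-map game. All constants are
# illustrative assumptions, chosen so that distorting is individually
# tempting but collectively ruinous.

N = 5                  # number of agents
PRIVATE_BONUS = 1.0    # selfish gain from inserting one's favored distortion
ACCURACY_VALUE = 3.0   # how much each agent values a fully accurate map

def agent_payoff(i: int, distorters: set) -> float:
    accuracy = 1.0 - len(distorters) / N  # fraction of the map left intact
    bonus = PRIVATE_BONUS if i in distorters else 0.0
    return ACCURACY_VALUE * accuracy + bonus

# If nobody distorts, everyone gets the full value of an accurate map:
honest = agent_payoff(0, set())                      # 3.0

# Distorting alone is tempting: the bonus (1.0) exceeds one agent's
# share of the accuracy loss (3.0 / N = 0.6)...
lone_defector = agent_payoff(0, {0})                 # 3.4

# ...but if every agent defects, the map is "full of lies" and
# everyone ends up worse off than under universal honesty:
all_defect = agent_payoff(0, set(range(N)))          # 1.0
```

This is why “just make the map be accurate” works as a focal point: it is the one policy that every agent can verify and that doesn’t privilege anyone’s idiosyncratic favored consequence.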
I think the embedded agency people are working on this?—I’m afraid I’m not up-to-date on the details. But when I object to people making appeals to consequences, the thing I’m objecting to is never people trying to do a sophisticated embedded-agency thing; I’m objecting to people trying to get away with choosing to be biased.
In that case, maybe you can clarify (in this or future posts) that you’re not against doing sophisticated embedded-agency things? Also, can you give some examples of what you’re objecting to, so I can judge for myself whether they’re actually doing sophisticated embedded-agency things?
Appeals to consequences tend to diverge (because everyone has her own idiosyncratic favored consequence); “just make the map be accurate” is a natural focal point (because the truth is generally useful to everyone).
This just means that in most cases, appeals to consequences won’t move others much, even if they take such consequences into consideration. It doesn’t seem to be a reason for people to refuse to consider such appeals at all. If appeals to consequences only tend to diverge, it seems a good idea to actually consider such appeals, so that in the rare cases where people’s interests converge, they can be moved by such appeals.
So, I have to say that I still don’t understand why you’re taking the position that you are. If you have a longer version of the “story” that you can tell, please consider doing that.
(Thanks for the questioning!—and your patience.)
I will endeavor to make my intuitions more rigorous and write up the results in a future post.