It doesn’t imply that agents will kill themselves when you tell them they were going to; it implies that they can, if your telling them is the last scrap of Bayesian evidence needed to move the agent to act that way. EY’s point is that agents have to figure out what maximizes utility, not predict what they will do, because the self-reference causes problems.
E.g., we don’t want a calculator that outputs “whatever I output for 2+2”; we want a calculator that outputs the answer to 2+2. The former is true no matter what the calculator outputs; the latter has a single answer. Similarly, there is only one action that maximizes utility (or at least a subset of all possible actions). But if an agent takes whatever action it predicts it will take, its predictions are true by definition, so any action suffices.
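To make the calculator analogy concrete, here is a minimal sketch in Python (the action set, payoffs, and function names are invented for illustration): a utility maximizer has a single well-defined output, while an agent specified only as “do whatever you predict you will do” is consistent with every possible output.

    # Minimal sketch contrasting the two specifications.
    # The actions and utilities are hypothetical, chosen only to illustrate the point.

    ACTIONS = ["cooperate", "defect", "do_nothing"]
    UTILITY = {"cooperate": 10.0, "defect": 3.0, "do_nothing": 0.0}  # hypothetical payoffs

    def utility_maximizer(actions, utility):
        """Return the action with the highest utility: one well-defined answer."""
        return max(actions, key=lambda a: utility[a])

    def self_fulfilling_predictor(actions, predicted_action):
        """'Do whatever I predict I will do': the prediction comes out true
        no matter what it is, so it fails to single out any particular action."""
        assert predicted_action in actions
        return predicted_action

    print(utility_maximizer(ACTIONS, UTILITY))               # -> 'cooperate', the unique maximizer
    print(self_fulfilling_predictor(ACTIONS, "do_nothing"))  # -> 'do_nothing', but any input would 'work'

The first function pins down a unique (or at worst a tied) answer; the second is vacuously correct for any choice, which is the sense in which self-prediction alone underdetermines action.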
I think real agents act as though they believe they have free will.
That means they treat their own decisions as determining their actions, and treat other people’s claims about how they are going to act as attempts to manipulate them. Another agent encouraging you to behave in a particular way isn’t usually evidence you should update on; it’s a manipulation attempt, and agents are smart enough to know the difference.
Are there circumstances under which you should update on such evidence? Yes, if the agent is judged to be both knowledgeable and trustworthy—but that is equally true if you employ practically any sensible decision process.
Re: if an agent takes the action that it predicts it will take, its predictions are true by definition, so any action suffices.
Agents do take the actions they predict they will take; that seems like a matter of fact to me. However, that’s not the criterion they use as the basis for making their predictions in the first place. I never claimed it was; such a claim would be very silly.
Indeed. You originally wrote:
The agent doesn’t know what action it is going to take. If it did, it would just take the action—not spend time calculating the consequences of its various possible actions.
Your language is somewhat vague here, which is why EY clarified.