There’s nothing to choose until you’ve specified a decision problem.
It is a known feature of the environment that people are regularly ambushed by an agent, calling itself Omega, which has never yet been known to lie and has access to godlike power.
An agent with godlike power can manufacture evidence that it has any other traits it wants, so observing such evidence isn’t actually evidence that it has those traits.
There’s nothing to choose until you’ve specified a decision problem.
Um, I have.
Imagine, for simplicity, a purely selfish agent. Call it Alice. Alice is an expected utility maximizer, and she gains utility from eating cakes. Omega appears and offers her a deal—they will flip a fair coin, and give Alice three cakes if it comes up heads. If it comes up tails, they will take one cake away from her stockpile. Alice runs the numbers, determines that the expected utility is positive, and accepts the deal. Just another day in the life of a perfectly truthful superintelligence offering inexplicable choices.
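To make the numbers explicit, here is a minimal sketch of the first deal’s arithmetic, assuming Alice’s utility is simply linear in cakes:

```python
# Expected utility of the coin-flip deal, assuming utility is linear in cakes.
p_heads = 0.5
u_heads = 3.0    # gain three cakes
u_tails = -1.0   # lose one cake from the stockpile

expected_utility = p_heads * u_heads + (1 - p_heads) * u_tails
print(expected_utility)  # 1.0 > 0, so Alice accepts
```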
The next day, Omega returns. This time, they offer a slightly different deal—instead of flipping a coin, they will perfectly simulate Alice once. This copy will live out her life just as she would have done in reality—except that she will be given three cakes. The original Alice, however, receives nothing.
What do you think Alice should choose?
An agent with godlike power can manufacture evidence that it has any other traits it wants, so observing such evidence isn’t actually evidence that it has those traits.
An agent with godlike power can manufacture evidence of anything. This seems suspiciously like a Fully General Counterargument. Such an agent could, in any case, directly hack your brain so you don’t realize it can falsify evidence.
Oh, you were referring to the original decision problem. I was referring to the part of the post I was originally responding to, about what subjective odds Alice should assign to things. I just ran across an LW post making an important comment on these types of problems, which is that the answer depends on how altruistic Alice feels towards copies of herself. If she feels perfectly altruistic towards copies of herself, then sure, take the cakes.
An agent with godlike power can manufacture evidence of anything. This seems suspiciously like a Fully General Counterargument. Such an agent could, in any case, directly hack your brain so you don’t realize it can falsify evidence.
Yes, it’s a fully general counterargument against believing anything that an agent with godlike power says. Would this sound more reasonable if “godlike” were replaced with “devil-like”?
I just ran across an LW post making an important comment on these types of problems, which is that the answer depends on how altruistic Alice feels towards copies of herself. If she feels perfectly altruistic towards copies of herself, then sure, take the cakes.
That’s … a very good point, actually.
I assume, on this basis, that you agree with the hypothetical at the end?
Would this sound more reasonable if “godlike” were replaced with “devil-like”?
Yes. Devil-like implies known hostility.
By this logic, the moment you turn on a provably Friendly AI you should destroy it, because it might have hacked you into thinking it’s friendly. Worse still, a hostile god would presumably realize you won’t believe it, and so hack you into thinking it’s not godlike; so anything claiming not to be godlike is lying.
Bottom line: gods are evidence that the world may be a lie. But not strong evidence.
I assume, on this basis, that you agree with the hypothetical at the end?
Nope. I still don’t agree that simulating an agent N times and doing X to one of the copies is morally equivalent to subjecting the agent to X with probability 1/(N+1). For example, if you are not at all altruistic towards copies of yourself, then you don’t care about the former situation at all, so long as the copy X is being done to is not you. On the other hand, if you value fairness among your copies (that is, if you value your copies having similar quality of life), then you care more strongly about the former situation than about the latter.
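To make the dependence on copy-altruism concrete, here is a rough sketch; the copy count, payoff numbers, and aggregation rules are illustrative assumptions rather than anything specified above:

```python
# Illustrative sketch: compare "simulate N copies and do X to one of them"
# against "risk X with probability 1/(N+1)" under three hypothetical attitudes
# toward copies. X = losing one cake (utility -1); all numbers are assumptions.

N = 3                    # hypothetical number of simulated copies
x = -1.0                 # utility hit from X

# Situation A: the original plus N copies exist; X is done to one copy, not the original.
original_a = 0.0
copies_a = [0.0] * (N - 1) + [x]
everyone_a = [original_a] + copies_a

# Situation B: no copies; the original alone risks X with probability 1/(N+1).
p = 1.0 / (N + 1)
expected_original_b = p * x                     # -0.25

# Purely selfish: only the original's own outcome counts.
selfish = (original_a, expected_original_b)     # (0.0, -0.25): A looks strictly better

# Averaging over all instances recovers the claimed 1/(N+1) equivalence.
average_a = sum(everyone_a) / len(everyone_a)   # -0.25
average = (average_a, expected_original_b)      # (-0.25, -0.25): indifferent

# Crude fairness proxy (worst-off instance): A guarantees someone lands at -1,
# while B only risks that outcome with probability 1/(N+1).
worst_a = min(everyone_a)                       # -1.0
expected_worst_b = p * x                        # -0.25
fairness = (worst_a, expected_worst_b)          # (-1.0, -0.25): A looks worse

print(selfish, average, fairness)
```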
By this logic, the moment you turn on a provably Friendly AI you should destroy it, because it might have hacked you into thinking it’s friendly. Worse still, a hostile god would presumably realize you won’t believe it, and so hack you into thinking it’s not godlike; so anything claiming not to be godlike is lying.
Pretty much the only thing a godlike agent can convince me of is that it’s godlike (and I am not even totally convinced this is possible). After that, again, whatever evidence a godlike agent presents of anything else could have been fabricated. Your last inference doesn’t follow from the others; my priors regarding the prevalence of godlike agents are currently extremely low, and claiming not to be godlike is not strong evidence either way.
To be clear, since humans are specified as valuing all agents (including sims of themselves and others), shouldn’t it be equivalent to Alice-who-values-copies-of-herself?
my priors regarding the prevalence of godlike agents are currently extremely low,
And what are those priors based on? Evidence! Evidence that a godlike being would be motivated to falsify!
To be clear, since humans are specified as valuing all agents (including sims of themselves and others), shouldn’t it be equivalent to Alice-who-values-copies-of-herself?
Sure, but the result you describe is equivalent to Alice being an average utilitarian with respect to copies of herself. What if Alice is a total utilitarian with respect to copies of herself?
Actually, she should still make the same choice, although she would choose differently in other scenarios.
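A minimal sketch of the arithmetic behind that answer, assuming utility is again linear in cakes; the second scenario below is a hypothetical illustration of where average and total utilitarianism come apart, not something taken from the thread:

```python
# Omega's simulation deal: two instances of Alice end up existing;
# the copy gets 3 cakes, the original gets 0. Declining leaves one Alice at 0.
utilities_if_accept = [3.0, 0.0]

total = sum(utilities_if_accept)                                # 3.0 vs 0.0 for declining
average = sum(utilities_if_accept) / len(utilities_if_accept)   # 1.5 vs 0.0 for declining
# Both aggregation rules rate accepting above declining, so both accept here.

# A hypothetical case where they diverge: Alice already sits at utility 2, and
# Omega offers to create one extra copy whose utility would be 1, at no cost to her.
with_copy = [2.0, 1.0]
total_gain = sum(with_copy) - 2.0                      # +1.0 -> a total utilitarian accepts
average_gain = sum(with_copy) / len(with_copy) - 2.0   # -0.5 -> an average utilitarian declines

print(total, average, total_gain, average_gain)
```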