Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
The prior is irrelevant, it’s the posterior probability, after observing the evidence, that informs decisions.
What probability do you put to the possibility that we are in a simulation, the purpose of which is to test AIs for their willingness to spare their creators? My answer is zero.
Whatever your answer, a superintelligence will be better able to reason about its likelihood than us. It’s going to know.
I meant this to be implicit in the argument, but to spell it out: that’s the kind of prior the ASI would rationally refuse to update down, since it’s presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it’s not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
Outside of theism, I really don’t see how anyone could plausibly answer zero to that question. Would you mind elaborating?
Sure. The simulation hypothesis has some non-zero probability p. There are infinitely many possible purposes for the simulation. By the principle of indifference, I divide p by ∞ and calculate that any particular purpose has probability zero.
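Spelled out as a limit (with n standing in for the number of candidate purposes, a symbol introduced here for illustration):

$$P(\text{this particular purpose} \mid \text{simulation}) \;=\; \lim_{n \to \infty} \frac{p}{n} \;=\; 0.$$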
For the principle of indifference to apply, you’d need infinitely many purposes as plausible as this one, or at least similarly plausible. I can’t imagine how this could hold. Can you think of three?
Sure.
1. We are in a simulation, the purpose of which is to test N for its willingness to spare its creator. For all N.
2. We are in a simulation, the purpose of which is to test AIs for their willingness to X their creators. For all X.
3. We are in a simulation, the purpose of which is to test AIs for their willingness to spare Y. For all Y.
Combine these for N × X × Y hypotheses, with insufficient reason to distinguish them.
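To make the counting concrete, here is a minimal sketch of how the indifference argument spreads the probability mass; the counts and the value of p are purely illustrative assumptions, not estimates from the discussion:

```python
# Toy illustration of the indifference argument above. Every number here is
# an illustrative assumption, not an estimate taken from the discussion.
p_simulation = 0.01        # assumed probability that we are in a simulation at all

n_agents  = 10**6          # candidate values of N (who is being tested)
n_actions = 10**6          # candidate values of X (what they are tested for)
n_targets = 10**6          # candidate values of Y (whom the action is directed at)

n_hypotheses = n_agents * n_actions * n_targets

# Under the principle of indifference, each specific (N, X, Y) hypothesis
# receives an equal share of the total simulation probability.
p_specific = p_simulation / n_hypotheses
print(f"{p_specific:.1e}")  # 1.0e-20; as the candidate sets grow without bound, this tends to 0
```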
I think we’re off-topic here. Probably I should instead write a response to “0 and 1 are not probabilities” and the dangers of zero and one.
It just so happens that the plausibility depends on the precise assignments of N, X, and Y, and (conditional on us actually creating an ASI) I can’t think of any assignments nearly as plausible as N = ASI, X = spare, and Y = us. It’s really not very plausible that we are in a simulation to test pets for their willingness not to bite their owners.
I don’t think any of this follows.
Hypothesis H1: the universe was created to test an AI for its willingness to spare its creators.
Hypothesis H2: the universe was created to test an AI for its willingness to fix its P(H1), ignoring evidence.
The AI would only rationally fix its P(H1) if it had a high P(H2): high enough to outweigh the high cost of being deliberately ignorant. The prior P(H2) is tiny, and smaller than the prior P(H1), since H2 is the more complex hypothesis. Once the AI starts updating on evidence, by the time its posterior P(H2) is high enough to make it rationally refuse to update P(H1), it has already updated P(H1) in one direction or the other.
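A toy numerical version of this dynamic, under assumptions introduced purely for illustration: H1 and H2 compete with a catch-all “no such simulation” hypothesis, each piece of simulation-flavoured evidence is taken to favour H1 and H2 equally over the catch-all, and H2 starts with the smaller prior because it is the more complex hypothesis.

```python
# Toy Bayesian sketch of the updating dynamic described above.
# All numbers are illustrative assumptions, not estimates.
posteriors = {"H1": 1e-4, "H2": 1e-5, "other": 1 - 1e-4 - 1e-5}
likelihood = {"H1": 0.9, "H2": 0.9, "other": 0.1}   # P(next observation | hypothesis)

for step in range(1, 9):
    # Standard Bayesian update on one more piece of simulation-flavoured evidence.
    unnorm = {h: posteriors[h] * likelihood[h] for h in posteriors}
    total = sum(unnorm.values())
    posteriors = {h: v / total for h, v in unnorm.items()}
    print(f"step {step}: P(H1)={posteriors['H1']:.4f}  P(H2)={posteriors['H2']:.4f}")

# P(H1) climbs above 0.9 by around step 7, while P(H2) is still below 0.1:
# the AI has already revised P(H1) long before P(H2) could plausibly be large
# enough to justify freezing it.
```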
Are there any simulation priors that you are refusing to update down, based on the possibility that you are in a simulation that is testing whether you will update down? My answer is no.
I contend that P(H2) is very close to P(H1), and certainly within the same order of magnitude, since (conditional on H1) a simulation that does not also test for H2 is basically useless.
As for priors I’d refuse to update down – well, the ASI is smarter than either of us!
It’s not enough for P(H2) to be within the same order of magnitude as P(H1); it needs to be high enough that the AI should rationally abandon epistemic rationality. I think that threshold is pretty high, maybe 10%. You’ve not said what your P(H1) is.
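Spelled out in expected-value terms (with V the benefit of freezing P(H1) if H2 is true and c the cost of the resulting deliberate ignorance if it is false; both symbols are introduced here, not taken from the exchange), freezing would be rational only when

$$P(H_2)\,V \;>\; \bigl(1 - P(H_2)\bigr)\,c \quad\Longleftrightarrow\quad P(H_2) \;>\; \frac{c}{V + c}.$$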
I’d put “high enough” at ~0%: what matters is achieving your goals, and except in the tiny subset of cases in which epistemic rationality happens to be one of them, it has no value in and of itself. But even if I’m wrong and the ASI does end up valuing epistemic rationality (instrumentally or terminally), it can always pre-commit (by self-modification or otherwise) to sparing us and then go about whatever else it pleases.