I’ve read this paper on low-impact AIs. There’s something about it that I’m confused and skeptical about.
One of the main methods it proposes works as follows. Find a probability distribution over many possible variables in the world. Let X represent the statement “The AI was turned on”. For each of the variables v it considers, the probability distribution over v after conditioning on X should look about the same as the probability distribution over v after conditioning on not-X. That’s low impact.
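Here’s a minimal sketch of the criterion as I understand it. The toy numbers, the single binary variable v, and the use of a simple absolute difference as the “looks about the same” test are all my own illustrative assumptions, not something taken from the paper.

```python
# Toy sketch of the low-impact criterion as I read it.
# Joint distribution over (X = "the AI was turned on", v = some world variable).
joint = {
    (True, True): 0.01,   # AI on, v holds
    (True, False): 0.04,  # AI on, v fails
    (False, True): 0.50,  # AI off, v holds
    (False, False): 0.45, # AI off, v fails
}

def p_v_given_x(joint, x_value):
    """P(v = True | X = x_value), computed from the toy joint distribution."""
    p_x = sum(p for (x, _), p in joint.items() if x == x_value)
    return joint[(x_value, True)] / p_x

p_on = p_v_given_x(joint, True)    # P(v | X)
p_off = p_v_given_x(joint, False)  # P(v | not-X)

# "Low impact" would require these two to be close; the gap is the impact.
print(p_on, p_off, abs(p_on - p_off))
```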
But the paper doesn’t mention conditioning on any evidence other than X. A priori, though, the probability of this specific AI even existing in the first place may be quite low. So simply conditioning on X has the potential to change your probability distribution over variables of the world, just because it tells you that the AI exists.
You could try to get around this by also updating on the other evidence E the AI has when calculating the probability distribution over a variable v. But if you do this, I don’t think there would be much difference between P(v | E, X) and P(v | E, not-X). If the AI can update on the rest of its evidence, it can just infer the current state of the world from that evidence. For example, if the AI clearly sees that the world has been converted to paperclips, I think it would still expect the world to be mostly paperclips even after conditioning on “I was never turned on”. Maybe it would just conclude that some other AI did it.
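A toy numerical version of that worry, under assumptions I’m making up for illustration (the priors, the world variable W = “the world is mostly paperclips”, and the near-perfect observation model E are not from the paper):

```python
# Toy model X -> W -> E: whether this AI ran influences whether the world is
# paperclips, and E is the AI's (very reliable) observation of the world.
P_X = 0.5              # prior that this AI was turned on (arbitrary)
P_W_GIVEN_X = 0.9      # if the AI ran, the world likely got converted
P_W_GIVEN_NOT_X = 0.1  # some other process (e.g. another AI) could also do it
P_E_GIVEN_W = 0.999    # E = "the AI clearly sees paperclips"; highly accurate
P_E_GIVEN_NOT_W = 0.001

def joint(x, w, e):
    """P(X=x, W=w, E=e) under the toy model."""
    p = P_X if x else 1 - P_X
    p_w = P_W_GIVEN_X if x else P_W_GIVEN_NOT_X
    p *= p_w if w else 1 - p_w
    p_e = P_E_GIVEN_W if w else P_E_GIVEN_NOT_W
    return p * (p_e if e else 1 - p_e)

def p_w_given(e, x):
    """P(W = True | E = e, X = x)."""
    num = joint(x, True, e)
    den = sum(joint(x, w, e) for w in (True, False))
    return num / den

# After seeing paperclips (E = True), conditioning on X vs not-X barely matters:
print(p_w_given(True, True))   # P(W | E, X)      ~ 0.9999
print(p_w_given(True, False))  # P(W | E, not-X)  ~ 0.991
```

In this toy setup the evidence E nearly screens off X, so the two conditional distributions end up almost identical, which is the issue I’m pointing at.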
I’m interested in seeing what others think about this.