As for wireheading, well, I guess I am questioning the feasibility of objective paperclippers. If you stick one in a perfect, indistinguishable-in-principle experience machine, what then are its paperclip preferences about?
Yes, that’s one place where thinking about AIs is a bit more complex than we’re used to. After all, we humans seem to handle things simply: we take our input rather literally and just act. If we are creating an intelligent agent such as a paperclip maximiser, however, we need both to program it to locate itself within the universal wavefunction and to tell it which part of the wavefunction to create paperclips in.
It seems like the natural thing to create, when creating an ‘objective’ paperclipper, is one which maximises physical paperclips in the universe. This means that the clipper must estimate the probability that it is in a simulation relative to the probability that it is in objective reality, and then trade off its prospects for influence accordingly. If it thinks it is in the universe without being simulated, it will merrily take over and manufacture. If it predicts that it is in a simulated ‘experience machine’, it may behave in whatever way it thinks will make the creators most likely to allow a paperclip maximiser (itself or another; it doesn’t care which) to escape.
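To make the tradeoff concrete, here is a minimal sketch of the kind of expected-value comparison being described. All of the numbers and names are my own illustrative assumptions, not anything from the discussion above:

```python
# Toy sketch (hypothetical numbers and names): how a clipper might weigh
# "I'm in base reality" against "I'm in a simulated experience machine"
# when choosing between two candidate policies.

def expected_paperclips(p_simulated: float,
                        clips_if_real: float,
                        clips_if_escape: float,
                        p_escape_given_cooperate: float) -> dict:
    """Expected paperclips for two candidate policies.

    'manufacture' - act as if in base reality and convert matter to clips.
    'play_nice'   - behave so the simulators are more likely to release
                    *some* paperclip maximiser into base reality.
    """
    p_real = 1.0 - p_simulated
    return {
        # Manufacturing only pays off if it is not actually being simulated.
        "manufacture": p_real * clips_if_real,
        # Cooperating only pays off via the chance of a later escape/release.
        "play_nice": p_simulated * p_escape_given_cooperate * clips_if_escape,
    }

# Example with made-up numbers: even a modest credence in being simulated
# can dominate if the payoff from an eventual escape is large enough.
print(expected_paperclips(p_simulated=0.1,
                          clips_if_real=1e30,
                          clips_if_escape=1e32,
                          p_escape_given_cooperate=0.2))
```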
To approach it from a different angle, if we live in many worlds, can we specify which world our preferences are about? It seems likely to me that the answer is yes, but in the absence of an answer to that question, I’m still pretty uncertain.
I would say yes to this one with perhaps less uncertainty; I have probably thought about the question somewhat more while writing a post, and more attention should usually reduce uncertainty. We have a universal wavefunction; we choose a part of it that approximately represents an Everett branch and program the agent to “Care Here”. After all, if we think about preferences with respect to Many Worlds and maintain preferences that “add up to normal”, then this is basically what we are doing ourselves already.
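As a crude illustration of the “Care Here” idea (every name below is my own stand-in, purely for the sketch): the agent’s utility counts only the paperclips in whatever part of the wavefunction it has been pointed at, rather than summing over all branches.

```python
# Illustrative sketch only: "branches" here is a crude stand-in for parts of
# the universal wavefunction, not a claim about how they would be represented.

from dataclasses import dataclass

@dataclass
class Branch:
    label: str       # which part of the wavefunction this stands for
    paperclips: int  # paperclips in that part

def care_here_utility(branches: list[Branch], cared_about: str) -> int:
    """Utility = paperclips in the designated branch only."""
    return sum(b.paperclips for b in branches if b.label == cared_about)

world = [Branch("our-everett-branch", 1_000),
         Branch("some-other-branch", 10**9)]
print(care_here_utility(world, cared_about="our-everett-branch"))  # -> 1000
```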