I currently do not wish to be wireheaded, but I’m uncertain whether that aversion will turn out to be coherent. A technique for preventing AIs from wireheading has not yet been discovered, and there is no proof that one exists. So in that sense LW’s thoughts on the subject are necessarily incomplete.
I also don’t feel that the least convenient wireheading machine has really been dealt with, that I can recall. Say a machine can really, genuinely convince me that I’m getting all the things that I want and I’m not aware I’m being deceived, so much so that I can’t distinguish the real world from the machine, even in principle. I don’t know what it would mean for me to say that I didn’t prefer that world. What are my preferences about, at that point?
(I agree that the thinking is incomplete, but disagree about the sense in which it is incomplete.)
While proving things formally about something as complicated as AI is hard, it would be misleading to act as if this makes “is it possible to have AIs that don’t wirehead” an open question at all. The probability that such an AI is impossible is small enough to be safely ignored. Objective functions which wirehead are fairly obviously just a subset of objective functions which optimize towards some arbitrary state of the universe. That paperclip maximisers could theoretically exist is not an open question on LW, so it would be rather incoherent for LW to have much doubt about whether something in a far broader class that includes paperclippers could exist.
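To make that subset claim concrete, here is a toy sketch (field names and numbers invented purely for illustration): every objective below is just a function from a world state to a number, and the wireheading one simply happens to look only at the part of the world that is the agent’s own reward register.

```python
# Toy illustration: both objectives are functions from a world state to a number.
# The "wireheading" objective is just the special case whose target is the
# agent's own reward register, which is itself one more part of the universe.

def paperclip_objective(world):
    # Cares about an external feature of the universe.
    return world["paperclip_count"]

def wirehead_objective(world):
    # Cares only about the agent's own reward register.
    return world["agent_reward_register"]

world = {"paperclip_count": 3, "agent_reward_register": 9000}
for objective in (paperclip_objective, wirehead_objective):
    print(objective.__name__, objective(world))
```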
Where the thinking is actually more complicated and in doubt is with regard to what humans in general, or we ourselves in particular, actually want. Introspection is far too uncertain in that regard!
While I agree completely with this comment, I’m upvoting almost entirely because of this sentence:
It sort of encapsulates what attracted me to this site in the first place.
Thanks. It felt like just leaving the refutation there would have come across as more argumentative than I intended, when I wanted more discussion and elaboration, so I tried to preemptively de-escalate and emphasize that we mostly agreed.
I agree completely that human value is particularly hard to ascertain.
As for wireheading, well, I guess I am questioning the feasibility of objective paperclippers. If you stick one in a perfect, indistinguishable-in-principle experience machine, what then are its paperclip preferences about?
To approach it from a different angle, if we live in many worlds, can we specify which world our preferences are about? It seems likely to me that the answer is yes, but in the absence of an answer to that question, I’m still pretty uncertain.
Yes, that’s one place where thinking about AIs is a bit more complex than we’re used to. After all, we humans seem to handle things simply: we take our input rather literally and just act. If we are creating an intelligent agent such as a paperclip maximiser, however, we need both to program it to locate itself within the universal wavefunction and to tell it which part of the wavefunction to create paperclips in.
It seems like the natural thing to create, when creating an ‘objective’ paperclipper, is one which maximises physical paperclips in the universe. This means that the clipper must estimate how likely it is to be in a simulation relative to how likely it is to be in objective reality, and then trade off its prospects for influence accordingly. If it thinks it is in the universe without being simulated, it’ll merrily take over and manufacture. If it predicts that it is in a simulated ‘experience machine’, it may behave in whatever way it thinks will make the creators most likely to allow a paperclip maximiser (itself or another; it doesn’t care which) to escape.
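As a toy sketch of that trade-off (the probabilities and payoffs are entirely made up; the point is only the shape of the calculation):

```python
# Toy sketch: the clipper weighs each action by its expected paperclip payoff
# under "I am in unsimulated reality" versus "I am in a simulated experience machine".

p_simulated = 0.3              # the clipper's own estimate; invented number
p_real = 1.0 - p_simulated

# Hypothetical expected-paperclip payoffs for each action in each situation.
payoffs = {
    "take_over_and_manufacture":         {"real": 1e20, "simulated": 0.0},
    "persuade_creators_to_free_clipper": {"real": 1e10, "simulated": 1e19},
}

def expected_paperclips(action):
    return p_real * payoffs[action]["real"] + p_simulated * payoffs[action]["simulated"]

best_action = max(payoffs, key=expected_paperclips)
print(best_action, expected_paperclips(best_action))
```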
I would say yes to this one, with perhaps less uncertainty. I have probably thought about the question somewhat more while writing a post, and more attention should usually reduce uncertainty. We have a universal wavefunction; we choose a part of it that approximately represents an Everett branch and program the agent to “Care Here”. After all, if we think about preferences with respect to Many Worlds and maintain preferences that “add up to normal”, then this is basically what we are doing ourselves already.
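A minimal sketch of what “Care Here” might amount to, treating the wavefunction as a labelled set of branches (this representation is invented purely for illustration, not a claim about how such an agent would actually be built):

```python
# Toy sketch: a utility function that counts paperclips only in the branch it
# has been pointed at, plus that branch's descendants, ignoring the rest.

# Invented representation: branch label -> paperclip count in that branch.
wavefunction = {
    "branch_A": 10,
    "branch_A.split_1": 4,
    "branch_B": 10**6,
}

CARE_HERE = "branch_A"  # the part of the wavefunction this agent is indexed to

def utility(wavefunction):
    return sum(count for label, count in wavefunction.items()
               if label == CARE_HERE or label.startswith(CARE_HERE + "."))

print(utility(wavefunction))  # 14, not 1000014
```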
I’m not sure what motivation there might be to call that ‘a wireheading machine’ and not ‘the universe’.
Exactly.