And you didn’t even pick the hardest part. One can argue that if we knew more, thought faster, and were more the people we wished we were, we would be free(er) of cognitive biases like risk aversion (writing that into your utility function seems simply incorrect); and anyone who isn’t in favour of wireheading is likely to want to preserve some form of love.
Rather, think of justice. I’m inclined to argue that justice-for-the-sake-of-justice is a lost purpose, but I’d expect vigorous disagreement even after “extrapolating”/“improving” people a fair bit.
Why is risk aversion a bias, but love is not? We know that risk aversion is strictly dominated for rational agents, but I think it likely that love is strictly dominated by some clever game-theoretic approach to mating. Why oppose wireheading, for that matter? Like eliminating risk aversion, it’s a more efficient way for us to get what we want.
I am still confused.
Have you seen the new conclusion to the OP? Risk aversion has value to us, but it is a ‘bias’ because it sabotages the achievement of other values. Love has much value to us, but it does not systematically sabotage our other values, so it is a ‘value’.
The labels ‘bias’ and ‘value’ are fuzzy and quantitative.
Love does sabotage my other values. I’ve made career sacrifices for it (which I don’t regret).
Given complexity of value, most valuable things require tradeoffs. The difference between a bias and a value may be quantitative, but unless I know how to calculate it that doesn’t help very much.
Let’s try to avoid sticking a “SOLVED” label on this problem before we’ve properly understood it.
Right. You do have to sacrifice some resources (time, mental energy, etc.) that could be used for other things. All values do that, but some of them sabotage you beyond just a reasonable resource cost.
Sure it does. It helps us not be stupid about belief to know that belief is quantitative, and likewise with many other things. It gives us the right mental model for thinking about it, even if we can’t do actual calculations. That’s why everyone is always talking about utility functions, even though no one actually has access to one. Knowing that bias vs. value is a spectrum helps us not get too concerned about placing them in hard categories.
This is a good point. “Solved” is a very serious label.
I’m having to guess about the meaning of the second sentence, but if I guessed right then I agree that the mode of decision-making many people use when ‘love’ comes into it differs drastically from anything resembling utility maximising, and often not in a healthy way!
Sorry, I had a wrong word in that sentence.
Love often comes packed with some shitty thinking, but it doesn’t seem to lose its value if we think rationally about it.
I wasn’t referring to love as a value that has more than a straightforward resource cost; I was referring to stuff like risk aversion, hindsight bias, anger, and such that damage your ability to allocate resources, as opposed to just costing resources.
That’s how I took it and I agree.
I take it you’re aware of Eliezer’s ideas about the complexity of (human) value? I’d summarize his ideas as “neither I nor an extrapolated Eliezer want to be wireheaded” (which is an observation, not an argument).
I’d also point you at Yvain’s Are wireheads happy? (also consider his wanting/liking/approving classification). These posts are a very strong argument against classical “make them want it” wireheading (which wireheads want, but do not actually enjoy all that much or approve of at all). Of course, in the least convenient possible world we’d have more sophisticated wireheading, such that the wireheads do like and approve of wireheading. But this does not mean that non-wireheads desire or approve of wireheading, which brings us back to Eliezer’s point.
Do you feel LW’s thoughts on this subject are incomplete?
I currently do not wish to be wireheaded, but I’m uncertain whether that aversion will turn out to be coherent. A technique for preventing AIs from wireheading has not yet been discovered, and there is no proof that one exists. So in that sense LW’s thoughts on the subject are necessarily incomplete.
I also don’t feel that the least convenient wireheading machine has really been dealt with, that I can recall. Say a machine can really, genuinely convince me that I’m getting all the things that I want, and I’m not aware I’m being deceived, so much so that I can’t distinguish the real world from the machine, even in principle. I don’t know what it would mean for me to say that I didn’t prefer that world. What are my preferences about, at that point?
(I agree that the thinking is incomplete, but I disagree about the sense in which it is incomplete.)
While proving things formally about something as complicated as AI is hard, it would be misleading to act as if this makes “is it possible to have AIs that don’t wirehead?” an open question at all. The probability that such an AI is impossible is small enough to be safely ignored. Objective functions which wirehead are fairly obviously just a subset of objective functions which optimize towards some arbitrary state of the universe. That paperclip maximisers could theoretically exist is not an open question on LW, so it would be rather incoherent for LW to have much doubt about whether something in a far broader class that includes paperclippers could exist.
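A minimal toy sketch of the distinction this argument rests on, with the dictionary world model and field names invented purely for illustration: a ‘wireheading’ objective scores the agent’s own reward signal, while a paperclipper-style objective scores an external feature of the world.

```python
# Toy illustration only; not any real agent design. The world model and
# field names are invented for the example.

def wirehead_objective(agent_state, world_state):
    # Scores the agent's own internal reward register: maximised by
    # tampering with that register, regardless of what the world looks like.
    return agent_state["reward_register"]

def paperclip_objective(agent_state, world_state):
    # Scores a feature of the external world: tampering with the agent's
    # own internals does nothing to raise this number.
    return world_state["paperclip_count"]
```

On this picture, the second kind of objective is just one member of the broad class of world-state optimizers that includes paperclippers.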
Where the thoughts are actually more complicated and in doubt is with regard to what humans in general, or ourselves in particular, actually want. Introspection is far too uncertain in that regard!
While I agree completely with this comment, I’m upvoting almost entirely because of this sentence:
It sort of encapsulates what attracted me to this site in the first place.
Thanks, it felt like just leaving the refutation there would read as more argumentative than the discussion/elaboration I intended, so I tried to preemptively de-escalate and emphasize that we mostly agreed.
I agree completely that human value is particularly hard to ascertain.
As for wireheading, well, I guess I am questioning the feasibility of objective paperclippers. If you stick one in a perfect, indistinguishable-in-principle experience machine, what then are its paperclip preferences about?
To approach it from a different angle, if we live in many worlds, can we specify which world our preferences are about? It seems likely to me that the answer is yes, but in the absence of an answer to that question, I’m still pretty uncertain.
Yes, that’s one place where thinking about AIs is a bit more complex than we’re used to. After all, we humans seem to handle things simply: we take our input rather literally and just act. If we are creating an intelligent agent such as a paperclip maximiser, however, we need both to program it to locate itself within the universal wavefunction and to tell it which part of the wavefunction to create paperclips in.
It seems like the natural thing to create, when creating an ‘objective’ paperclipper, is one which maximises physical paperclips in the universe. This means that the clipper must estimate how likely it is to be in a simulation relative to how likely it is to be in objective reality, and then trade off its prospects for influence. If it thinks it is in the universe without being simulated, it’ll merrily take over and manufacture. If it predicts that it is in a simulated ‘experience machine’, it may behave in whatever way it thinks will make the creators most likely to allow a paperclip maximiser (itself or another; it doesn’t care) to escape.
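A rough sketch of that tradeoff, with the credences and payoffs entirely made up for illustration: the clipper simply picks whichever strategy maximises expected paperclips across the two hypotheses.

```python
# Toy expected-paperclip calculation for the clipper's dilemma sketched above.
# All numbers and strategy names are invented for illustration.

p_simulated = 0.3              # credence that it is inside an 'experience machine'
p_real = 1.0 - p_simulated     # credence that it is in unsimulated reality

# Expected paperclips from each strategy under each hypothesis.
payoffs = {
    "take_over_and_manufacture":     {"real": 1e20, "simulated": 0.0},
    "act_to_get_a_clipper_released": {"real": 1e12, "simulated": 1e18},
}

def expected_paperclips(strategy):
    return (p_real * payoffs[strategy]["real"]
            + p_simulated * payoffs[strategy]["simulated"])

best_strategy = max(payoffs, key=expected_paperclips)
print(best_strategy, expected_paperclips(best_strategy))
```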
I would say yes to this one, with perhaps less uncertainty; I have probably thought about the question somewhat more while writing a post, and more attention should usually reduce uncertainty. We have a universal wavefunction, we choose a part of it that approximately represents an Everett branch, and we program the agent to “Care Here”. After all, if we think about preferences with respect to Many Worlds and maintain preferences that “add up to normal”, then this is basically what we are doing ourselves already.
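A deliberately crude sketch of what “Care Here” might amount to, with the branch labels and toy world model entirely invented: the designer points the utility function at one labelled part of the model, and only paperclips there count. Actually locating that part within a physical wavefunction is, of course, the hard step being waved away here.

```python
# Very crude toy: a 'wavefunction' represented as two labelled branches.
# Everything here is invented for illustration.

toy_wavefunction = {
    "branch_A": {"paperclip_count": 7},
    "branch_B": {"paperclip_count": 9001},
}

CARE_HERE = "branch_A"  # the part of the model the designer tells the agent to care about

def utility(wavefunction):
    # Only paperclips in the designated branch contribute to utility.
    return wavefunction[CARE_HERE]["paperclip_count"]

print(utility(toy_wavefunction))  # 7, not 9008
```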
I’m not sure what motivation there might be to call that ‘a wireheading machine’ and not ‘the universe’.
Exactly.
Sure? :-)