The diamond case: Even if I did want a diamond, I simulate that I would feel nervous, alarmed even, if I indicated that I wanted it to bring me one box and I was brought a different box instead.
My brief recapitulation of Yudkowsky’s diamond example (which you can read in full in his CEV document) probably misled you a little bit. I expect that you would find Yudkowsky’s more thorough exposition of “extrapolating volition” somewhat more persuasive. He also warns about the obvious moral hazard involved in mere humans claiming to have extrapolated someone else’s volition out to significant distances – it would be quite proper for you to be alarmed about that!
If creating something that would act according to what one would want if one /were/ more intelligent or more moral or more altruistic, then A) that would only be desirable if one were such a person currently instead of being the current self, or B) that would be a good upgraded-replacement-self to let loose on the universe while oneself ceasing to exist without seeking to have one’s own will be done (other than on that matter of self-replacement).
Taken to the extreme this belief would imply that every time you gain some knowledge, improve your logical abilities or are exposed to new memes, you are changed into a different person. I’m sure you don’t believe that – this is where the concept of “distance” comes into play: extrapolating to short distance (as in the diamond example) allows you to feel that the extrapolated version of yourself is still you, but medium or long distance extrapolation might cause you to see the extrapolated self as alien.
It seems to me that whether a given extrapolation of you is still “you” is just a matter of definition. As such it is orthogonal to the question of the choice of CEV as an AI Friendliness proposal. If we accept that an FAI must take as input multiple human value sets in order for it to be safe – I think that Yudkowsky is very persuasive on this point in the sequences – then there has to be a way of getting useful output from those value sets. Since our existing value computations are inconsistent in themselves, let alone with each other, the AI has to perform some kind of transformation to cohere a useful signal from this input – this screens off any question of whether we’d be happy to run with our existing values (although I’d certainly choose the extrapolated volition in any case). “Knowing more”, “thinking faster”, “growing up closer together” and so on seem like the optimal transformations for it to perform. Short-distance extrapolations are unlikely to get the job done; therefore medium or long-distance extrapolations are simply necessary, whatever your opinion on the selfhood question.
Eliezer says: “If our extrapolated volitions say we don’t want our extrapolated volitions manifested, the system replaces itself with something else we want, or vanishes in a puff of smoke.” A possible cause of such an output might be the selfhood concern that you have raised.
Diamond: Ahh. I note that looking at the equivalent diamond section, ‘advise Fred to ask for box B instead’ (hopefully including the explanation of one’s knowledge of the presence of the desired diamond) is a notably potentially-helpful action, compared to the other listed options which can be variably undesirable.
Varying priorities: That I change over time is an accepted aspect of existence. There is uncertainty, granted; on the one hand I don’t want to make decisions that a later self would be unable to reverse and might disapprove of, but on the other hand I am willing to sacrifice the happiness of a hypothetical future self for the happiness of my current self (and different hypothetical future selves)… hm, I should read more before I write more, as otherwise redundancy is likely. (Given that my priorities could shift in various ways, one might argue that I would prefer something to act on what I currently definitely want, rather than on what I might or might not want in the future (yet definitely do not want (/want not to be done) /now/). An issue of possible oppression of the existing for the sake of the non-existent… hm.)
To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity? Put another way, for a chatterbot to accurately create the exact interactions/responses that a sapient entity would, is it theoretically necessary for a sapient entity to effectively exist, simulated by the non-sapient entity, or could such an entity mimic a sapient entity without sapience entering into the matter? (Would then a mimicked-sapient entity exist in a meaningful sense, but only if there were sapient entities hearing its words and benefiting from its willed actions, compared to if there were only multiple mimicked-entities talking to each other? Hrm.)
If a smarter-me was necessarily simulated in a certain sense in order to carry out its will, I might be willing to accede to it in the same spirit as to extremely-intelligent aliens/robots wanting to wipe out humanity for their own reasons, but I would be unwilling to accept things which are against my interests being carried out for the interests of an entity which does not in fact in any sense exist.
Manifestation: It occurs to me that a sandbox version could be interesting to observe, one’s non-extrapolated volition wanting our extrapolated volitions to be modelled in simulated world-section level 2, and as a result of such a contradiction instead the extrapolated volitions of those in level 2 /not/ being modelled in level 3, yet still being modelled in level 2… again, though, while such a tool might be extremely useful for second-guessing one’s decisions and discussing with one very, very good reasons to rethink them (and thus in fact oneself changing hopefully-beneficially as a person (?) where applicable), something which directly defies one’s will(/one’s curiosity) lacks appeal as a goal (/stepping stone) to work towards.
To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Both. I meant, in order for the AI not to (very probably) paperclip us.
Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
Our (or someone else’s) volitions are extrapolated in the initial dynamic. The output of this CEV may recommend that we ourselves are actually transformed in this or that way. However, extrapolating volition does not imply that the output is not for our own benefit!
Speaking in a very loose sense for the sake of clarity: “If you were smarter, looking at the real world from the outside what actions would you want taking in the real world?” is the essential question – and the real world is one in which the humans that exist are not themselves coherently-extrapolated beings. The question is not “If a smarter you existed in the real world, what actions would it want taking in the real world?”
See the difference?
Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity?
Hopefully the AI’s simulations of people are not sentient! It may be necessary for the AI to reduce the accuracy of its computations, in order to ensure that this is not the case.
Again, Eliezer discusses this in the document on CEV which I would encourage you to read if you are interested in the subject.
CEV document: I have at this point somewhat looked at it, but indeed I should ideally find time to read through it and think through it more thoroughly. I am aware that the sorts of questions I think of have very likely already been thought of by those who have spent many more hours thinking about the subject than I have, and am grateful that the time has been taken to answer the specific thoughts that come to mind as initial reactions.
Reaction to the difference-showing example (simplified by the assumption that a sapient smarter-me does not exist in any form), in two cases:
Case 1: I hypothetically want enough money to live in luxury (and achieve various other goals) without effort (and hypothetically lack the mental ability to bring this about easily). Extrapolated, a smarter me looking at this real world from the outside would be a separate entity from me, have nothing in particular to gain from making my life easier in such a way, and so not take actions in my interests.
Case 2: A smarter-me watching the world from outside may hold a significantly different aesthetic sense than the normal me in the world, and may act to rearrange the world in such a way as to be most pleasing to that me watching from outside. This being done, in theory resulting in great satisfaction and pleasure of the watcher, the problem remains that the watcher does not in fact exist to appreciate what has been done, and the only sapient entities involved are the humans which have been meddled with for reasons which they presumably do not understand, are not happy about, and plausibly are not benefited by.
I note that a lot in fact hinges on the hypothetical benevolence of the smarter-me, and the assumption/hope/trust that it would after all not act in particularly negative ways toward the existent humans, but given a certain degree of selfishness one can probably assume a range of hopefully-at-worst-neutral significant actions which I personally would probably want to carry out, but which I certainly wouldn’t want to be carried out without anyone pulling the strings in fact benefiting from what was being done.
...hmm, those can be summed up as ‘The smarter-me wouldn’t aid my selfishness!’ and ‘The smarter-me would act selfishly in ways which don’t benefit anyone since it isn’t sapient!’. There might admittedly be a lot of non-selfishness carried out, but that seems like a quite large variation from the ideal behaviour desired by the client-equivalent.
I can understand the throwing-out of the individual selfishness for something based on a group and created for the sake of humanity in general, but the taking of selfish actions for a (possibly conglomerate) watcher who does not in fact exist (in terms of what is seen) seems as though it remains to be addressed.
...I also find myself wondering whether a smarter-me would want to have arrays built to make itself even smarter, and backup computers for redundancy created in various places each able to simulate its full sapience if necessary, resulting in the creation of hardware running a sapient smarter-me even though the decision-making smarter-me who decided to do so wasn’t in fact sapient/{in existence}… though, arguably, that also wouldn’t be too bad in terms of absolute results… hmm.