Diamond: Ahh. I note that looking at the equivalent diamond section, ‘advise Fred to ask for box B instead’ (hopefully including the explanation of one’s knowledge of the presence of the desired diamond) is a notably potentially-helpful action, compared to the other listed options which can be variably undesirable.
Varying priorities: That I change over time is an accepted aspect of existence. There is uncertainty, granted; on the one hand I don’t want to make decisions that a later self would be unable to reverse and might disapprove of, but on the other hand I am willing to sacrifice the happiness of a hypothetical future self for the happiness of my current self (and different hypothetical future selves)… hm, I should read more before I write more, as otherwise redundancy is likely. (Given that my priorities could shift in various ways, one might argue that I would prefer something to act on what I currently definitely want, rather than on what I might or might not want in the future (yet definitely do not want (/want not to be done) /now/). An issue of possible oppression of the existing for the sake of the non-existent… hm.)
To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity? Put another way, for a chatterbot to accurately create the exact interactions/responses that a sapient entity would, is it theoretically necessary for a sapient entity to effectively exist, simulated by the non-sapient entity, or could such an entity mimic a sapient entity without sapience entering into the matter? (Would then a mimicked-sapient entity exist in a meaningful sense, but only if there were sapient entities hearing its words and benefiting from its willed actions, compared to if there were only multiple mimicked-entities talking to each other? Hrm.)
If a smarter-me was necessarily simulated in a certain sense in order to carry out its will, I might be willing to accede to it in the same spirit as to extremely-intelligent aliens/robots wanting to wipe out humanity for their own reasons, but I would be unwilling to accept things which are against my interests being carried out for the interests of an entity which does not in fact in any sense exist.
Manifestation: It occurs to me that a sandbox version could be interesting to observe, one’s non-extrapolated volition wanting our extrapolated volitions to be modelled in simulated world-section level 2, and as a result of such a contradiction instead the extrapolated volitions of those in level 2 /not/ being modelled in level 3, yet still being modelled in level 2… again, though, while such a tool might be extremely useful for second-guessing one’s decisions and discussing with one very, very good reasons to rethink them (and thus in fact oneself changing hopefully-beneficially as a person (?) where applicable), something which directly defies one’s will(/one’s curiosity) lacks appeal as a goal (/stepping stone) to work towards.
To check, does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, compared to ‘safe from the perspective of the value-set source/s’? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Both. I meant, in order for the AI not to (very probably) paperclip us.
Another example that comes to mind regarding a conflict of priorities: ‘If your brain was this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can’t appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.’
Our (or someone else’s) volitions are extrapolated in the initial dynamic. The output of this CEV may recommend that we ourselves are actually transformed in this or that way. However, extrapolating volition does not imply that the output is not for our own benefit!
Speaking in a very loose sense for the sake of clarity: “If you were smarter, looking at the real world from the outside what actions would you want taking in the real world?” is the essential question – and the real world is one in which the humans that exist are not themselves coherently-extrapolated beings. The question is not “If a smarter you existed in the real world, what actions would it want taking in the real world?”
See the difference?
Digression: If such an entity acts according to a smarter-me’s will, then theoretically existing does the smarter-me necessarily ‘exist’ as simulated/interpreted by the entity?
Hopefully the AI’s simulations of people are not sentient! It may be necessary for the AI to reduce the accuracy of its computations, in order to ensure that this is not the case.
Again, Eliezer discusses this in the document on CEV which I would encourage you to read if you are interested in the subject.
CEV document: I have at this point somewhat looked at it, but indeed I should ideally find time to read through it and think through it more thoroughly. I am aware that the sorts of questions I think of have very likely already been thought of by those who have spent many more hours thinking about the subject than I have, and am grateful that the time has been taken to answer the specific thoughts that come to mind as initial reactions.
Reaction to the difference-showing example (simplified by the assumption that a sapient smarter-me is assumed to not exist in any form), in two examples:
Case 1: I hypothetically want enough money to live in luxury (and achieve various other goals) without effort (and hypothetically lack the mental ability to bring this about easily). Extrapolated, a smarter me looking at this real world from the outside would be a separate entity from me, have nothing in particular to gain from making my life easier in such a way, and so not take actions in my interests.
Case 2: A smarter-me watching the world from outside may hold a significantly different aesthetic sense than the normal me in the world, and may act to rearrange the world in such a way as to be most pleasing to that me watching from outside. This being done, in theory resulting in great satisfaction and pleasure of the watcher, the problem remains that the watcher does not in fact exist to appreciate what has been done, and the only sapient entities involved are the humans which have been meddled with for reasons which they presumably do not understand, are not happy about, and plausibly are not benefited by.
I note that a lot in fact hinges on the hypothetical benevolence of the smarter-me, and the assumption/hope/trust that it would after all not act in particularly negative ways toward the existent humans, but given a certain degree of selfishness one can probably assume a range of hopefully-at-worst-neutral significant actions which I personally would probably want to carry out, but which I certainly wouldn’t want to be carried out without anyone pulling the strings in fact benefiting from what was being done.
...hmm, those can be summed up as ‘The smarter-me wouldn’t aid my selfishness!’ and ‘The smarter-me would act selfishly in ways which don’t benefit anyone since it isn’t sapient!’. There might admittedly be a lot of non-selfishness carried out, but that seems like a quite large variation from the ideal behaviour desired by the client-equivalent.
I can understand the throwing-out of the individual selfishness for something based on a group and created for the sake of humanity in general, but the taking of selfish actions for a (possibly conglomerate) watcher who does not in fact exist (in terms of what is seen) seems as though it remains to be addressed.
...I also find myself wondering whether a smarter-me would want to have arrays built to make itself even smarter, and backup computers for redundancy created in various places each able to simulate its full sapience if necessary, resulting in the creation of hardware running a sapient smarter-me even though the decision-making smarter-me who decided to do so wasn’t in fact sapient/{in existence}… though, arguably, that also wouldn’t be too bad in terms of absolute results… hmm.