To check: does ‘in order for it to be safe’ refer to ‘safe from the perspectives of multiple humans’, as opposed to ‘safe from the perspective of the value-set source(s)’? If so, the claim is possibly tautologous. If not, then I should probably investigate the point in question shortly.
Both. I meant, in order for the AI not to (very probably) paperclip us.
Another example that comes to mind regarding a conflict of priorities: ‘If your brain were this much more advanced, you would find this particular type of art the most sublime thing you’d ever witnessed, and would want to fill your hard drive with its genre. I have thus done so, even though to you, who own the hard drive and can’t appreciate it, it consists of uninteresting squiggles, and it has overwritten all the books and video files that you were lovingly storing.’
Our (or someone else’s) volitions are extrapolated in the initial dynamic. The output of this CEV may recommend that we ourselves are actually transformed in this or that way. However, extrapolating volition does not imply that the output is not for our own benefit!
Speaking in a very loose sense for the sake of clarity: “If you were smarter, looking at the real world from the outside, what actions would you want taken in the real world?” is the essential question – and the real world is one in which the humans that exist are not themselves coherently-extrapolated beings. The question is not “If a smarter you existed in the real world, what actions would it want taken in the real world?”
See the difference?
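To make the contrast a little more concrete, here is a minimal toy sketch (everything in it – Person, extrapolate, preferred_actions – is invented purely for illustration; it is not an actual CEV algorithm, just a picture of whose preferences are consulted versus which world they are asked about):

```python
# Toy sketch only: the classes and functions here are invented for illustration
# and are not part of any actual CEV proposal.  The point is to contrast *whose*
# preferences are consulted with *which* world those preferences are about.

class Person:
    def __init__(self, name, values):
        self.name = name
        self.values = values  # crude stand-in for a value system

    def preferred_actions(self, world_description):
        # "What would this person want done in the world described here?"
        return [f"{self.name}: promote {v} in {world_description}"
                for v in self.values]


def extrapolate(person):
    # Stand-in for "if you knew more, thought faster, were more the person
    # you wished you were" -- here it just refines the value list.
    return Person(person.name + " (extrapolated)",
                  person.values + ["refined values"])


me = Person("me", ["comfort", "art"])

# The CEV-style question: ask the *extrapolated* me about the *actual* world,
# a world whose inhabitants remain ordinary, un-extrapolated humans.
cev_question = extrapolate(me).preferred_actions(
    "the real world, still inhabited by the ordinary me")

# The question CEV is *not* asking: what a smarter-me, actually instantiated
# as an agent inside the world, would want for itself within that world.
other_question = extrapolate(me).preferred_actions(
    "a world that contains the smarter-me as a real agent")

print(cev_question)
print(other_question)
```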
Digression: if such an entity acts according to a smarter-me’s will, does the smarter-me then necessarily ‘exist’, at least as simulated/interpreted by the entity?
Hopefully the AI’s simulations of people are not sentient! It may be necessary for the AI to reduce the accuracy of its computations, in order to ensure that this is not the case.
Again, Eliezer discusses this in the document on CEV which I would encourage you to read if you are interested in the subject.
CEV document: I have skimmed it at this point, but indeed I should ideally find time to read it through and think it through more thoroughly. I am aware that the sorts of questions I think of have very likely already been thought of by those who have spent many more hours thinking about the subject than I have, and am grateful that the time has been taken to answer the specific thoughts that come to mind as initial reactions.
Reaction to the difference-illustrating example (simplified by assuming that a sapient smarter-me does not exist in any form), in two cases:
Case 1: I hypothetically want enough money to live in luxury (and achieve various other goals) without effort (and hypothetically lack the mental ability to bring this about easily). Extrapolated, a smarter-me looking at this real world from the outside would be a separate entity from me, would have nothing in particular to gain from making my life easier in that way, and so would not take actions in my interest.
Case 2: A smarter-me watching the world from outside may hold a significantly different aesthetic sense from the normal me in the world, and may act to rearrange the world so as to be most pleasing to that watching me. Even once this is done, and would in theory give the watcher great satisfaction and pleasure, the problem remains that the watcher does not in fact exist to appreciate what has been done; the only sapient entities involved are the humans who have been meddled with, for reasons they presumably do not understand, are not happy about, and are plausibly not benefited by.
I note that a lot in fact hinges on the hypothetical benevolence of the smarter-me, and on the assumption/hope/trust that it would not act in particularly negative ways toward the existent humans. Given a certain degree of selfishness, though, one can probably assume a range of hopefully-at-worst-neutral significant actions which I personally would probably want to carry out, but which I certainly wouldn’t want carried out without anyone pulling the strings actually benefiting from what was being done.
...hmm, those can be summed up as ‘The smarter-me wouldn’t aid my selfishness!’ and ‘The smarter-me would act selfishly in ways which don’t benefit anyone, since it isn’t sapient!’. There might admittedly be a lot of non-selfish actions carried out as well, but that seems like quite a large deviation from the ideal behaviour desired by the client-equivalent.
I can understand throwing out individual selfishness in favour of something based on a group and created for the sake of humanity in general, but taking selfish actions on behalf of a (possibly conglomerate) watcher who does not in fact exist (in terms of what is seen) seems to remain unaddressed.
...I also find myself wondering whether a smarter-me would want arrays built to make itself even smarter, and backup computers for redundancy created in various places, each able to simulate its full sapience if necessary. This would result in hardware running a sapient smarter-me even though the decision-making smarter-me who decided to do so wasn’t in fact sapient/in existence… though, arguably, that also wouldn’t be too bad in terms of absolute results… hmm.