Hmmm… Upon thinking it over in my spare brain-cycles for a few hours, I’d say the most likely failure mode of an attempted FAI is to extrapolate from the wrong valuation machinery in humans. For instance, you could end up with a world full of things people want and like, but don’t approve of. You would thus end up having a lot of fun while simultaneously knowing that everything about it is all wrong and it’s never, ever going to stop.
Of course, that’s just one cell in a 2^3-cell grid, and that’s assuming Yvain’s model of human motivations is accurate enough that FAI designers actually tried to use it, and then hit a very wrong cell out of the 8 possible.
Within that model, “approving” is what we’re calling the motivational system that imposes moral limits on our behavior, so if you manage to combine wanting and/or liking with a definite +approving, you’ve got a solid shot at something people would consider moral. Ideally, Friendliness should shoot for +liking/+approving while letting wanting vary. That is, an AI should do things people both like and approve of, without regard to whether those people would actually feel motivated enough to do them.
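If it helps to see the grid laid out, here is a toy enumeration of the eight cells (just a sketch; the ± notation and the “target”/“failure mode” tags are my own gloss on the comment above, not anything from Yvain’s post):

```python
from itertools import product

# Toy enumeration of the 2^3 grid of (wanting, liking, approving) signs.
# Tags reflect my reading of the comment above:
#   "target"       = the proposed Friendliness aim (+liking and +approving, wanting free)
#   "failure mode" = wanted and liked, but not approved of
for wanting, liking, approving in product([True, False], repeat=3):
    cell = "/".join(
        ("+" if flag else "-") + name
        for flag, name in [(wanting, "wanting"), (liking, "liking"), (approving, "approving")]
    )
    if liking and approving:
        note = "proposed Friendliness target"
    elif wanting and liking and not approving:
        note = "failure mode described above"
    else:
        note = ""
    print(f"{cell:32} {note}")
```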
You would thus end up having a lot of fun while simultaneously knowing that everything about it is all wrong and it’s never, ever going to stop.
Are we totally sure this is not what utopia initially feels like from the inside? Because I have to say, that sentence sounded kinda attractive for a second.
What kinds of weirdtopias are you imagining that would fulfill those criteria?
Because the ones that first sprang to mind for me (this might make an interesting exercise for people, actually) were all emphatically, well, wrong. Bad. Unethical. Evil… could you give some examples?
I of course don’t speak for EY, but what I would mean if I made a similar comment would hinge on expecting my experience of “I know that everything about this is all wrong” to correlate with anything that’s radically different from what I was expecting and am accustomed to, whether or not it is bad, unethical, or evil, and even if I would endorse it (on sufficient reflection) more than any alternatives.
Given that I expect my ideal utopia to be radically different from what I was expecting and am accustomed to (because, really, how likely is the opposite?), I should therefore expect to react that way to it initially.
Although I don’t usually include a description of the various models of the other speaker I’m juggling during conversation, that’s my current best guess. However, principle of charity and so forth.
(Plus Eliezer is very good at coming up with weirdtopias, probably better than I am.)
It’s what an ill-designed “utopia” might feel like. Note the link to Yvain’s posting: I’m referring to a “utopia” that basically consists of enforced heroin usage, or its equivalent. Surely you can come up with better things to do than that in five minutes’ thinking.
Perhaps it had implications that only became clear to a superintelligence?