This was helpful to me, thanks. I agree this is almost certainly the end state if AI systems are optimizing hard for simple, measurable objectives.
I’m still confused about what happens if AI systems are optimizing moderately for more complicated, measurable objectives (which better capture what humans actually want). Do you think the argument you made implies that we still eventually end up with a universe tiled with molecular smiley faces in this scenario?
I think that this depends on how hard the AIs are optimising, and how complicated the objectives are. I think that sufficiently moderate optimization for goals sufficiently close to human values will probably turn out well.
I also think that optimisation is likely to be pushed to the physical limits, unless we know how to program an AI that doesn't want to improve itself, and everyone makes AIs like that.
A sufficiently moderate AI is just dumb, which is safe. An AI smart enough to stop people from producing more AIs, yet dumb enough to be safe, seems harder to get.
There is also a question of what "better capturing what humans want" means. A utility function that, when restricted to the space of worlds roughly similar to this one, produces utilities close to the true human utility function seems easy enough. Suppose we have defined something close to human well-being, with the definition given in terms of the levels of various neurotransmitters near human DNA. Let's suppose this definition would have been highly accurate over all of history, and would make the right call on nearly all current political issues. It could still fail completely in a future containing uploaded minds and neurochemical vats.
Either your approximate utility function needs to be pretty close on all possible futures (even adversarially chosen ones), or you need to know that the AI won't guide the future towards places where the utility functions differ.
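To make that concrete, here's a toy numerical sketch (my own illustration; both utility functions and all the numbers are made up) of a proxy that matches the "true" utility on familiar worlds but keeps rewarding a measured correlate in exotic ones, so hard optimization over everything reachable lands exactly where the two come apart:

```python
import numpy as np

def true_utility(x):
    # Stand-in for "true human utility": peaks at x = 1 and falls off
    # for extreme states (think neurochemical vats at large x).
    return np.exp(-(x - 1.0) ** 2)

def proxy_utility(x):
    # Stand-in proxy: tracks the true utility closely on familiar worlds,
    # plus a small term for a measured correlate (like "neurotransmitter
    # levels near human DNA") that keeps growing in exotic states.
    return np.exp(-(x - 1.0) ** 2) + 0.1 * x

familiar = np.linspace(0.0, 2.0, 201)      # worlds roughly like this one
reachable = np.linspace(0.0, 20.0, 2001)   # includes exotic futures

# A moderate optimizer that stays in familiar territory does fine:
mild = familiar[np.argmax(proxy_utility(familiar))]
# A hard optimizer over everything reachable maximizes the proxy by
# going to the most extreme state, where the true utility is ~0:
hard = reachable[np.argmax(proxy_utility(reachable))]

print(f"mild optimization: x={mild:.2f}, true utility={true_utility(mild):.3f}")
print(f"hard optimization: x={hard:.2f}, true utility={true_utility(hard):.3g}")
```

The point is just that the divergence is created by the optimization itself: the proxy was never noticeably wrong anywhere the world had actually been.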