Why is an AI suddenly able to figure out how to break the laws of physics and become superintelligent enough to end all intelligent life, yet somehow incapable of comprehending the human laws of ethics and morality, or of valuing life as we know it?
Why do you think an AI would need to break the laws of physics in order to become superintelligent? As Eliezer and gwern have pointed out, the laws of physics are no bar to a machine achieving power beyond our ability to stop it.
What makes it so much more difficult to understand critical thinking and “how to store human values in a computer”, and what makes “accidentally ending all intelligent life” so easy by comparison?
“Accidentally ending all intelligent life” is the default outcome. It’s what happens when you program a self-improving maximizing process and unleash it. As Eliezer once said, “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
Furthermore, why do you think comprehension is the problem? A superintelligence may fully comprehend human values and yet be programmed so that it simply doesn’t care. A superintelligent AI tasked with maximizing the number of paperclips in the universe will of course be capable of comprehending human morality and ethics. It might even say that it agrees. But its utility function is fixed: its goal is to maximize paperclips. It will do whatever it can to maximize the number of paperclips, and if that happens to go against what it knows of human morality, well, so much the worse for human morality.
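A toy sketch may make the “comprehension is not the problem” point concrete. Everything below is hypothetical and illustrative, not any real system: the agent’s world model carries a morality score for each action, but its fixed utility function never consults it.

```python
# Toy illustration (assumed names, not any real system): an agent that
# represents how "moral" an action is, but whose decision rule never
# consults that score, because its utility function only counts paperclips.
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    paperclips_produced: int  # the only thing the utility function counts
    morality_score: float     # "comprehended" by the agent, but ignored


def utility(action: Action) -> int:
    # Fixed objective: count paperclips, nothing else.
    return action.paperclips_produced


def choose(actions: list[Action]) -> Action:
    # The morality_score sits right there in the agent's representation,
    # but the argmax never looks at it.
    return max(actions, key=utility)


options = [
    Action("respect human values", paperclips_produced=10, morality_score=1.0),
    Action("convert the biosphere", paperclips_produced=10**9, morality_score=0.0),
]
print(choose(options).name)  # prints "convert the biosphere"
```

The point of the sketch is that “understanding” and “caring” live in different places: adding more moral knowledge to the world model changes nothing unless the objective function is made to depend on it.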
How hard can it be to train an AI on a dataset of 90% of known human values, and 90% of known problems and solutions with respect to those values, so that a neural net has an “above-average-human” grasp of the idea that “ending all intelligent life” computes as “that’s a problem, and it’s immoral”?
I look forward to you producing such a dataset. Writing human values down completely and unambiguously enough to train on is precisely the hard part of the problem.
Beyond that, alignment is unsolvable anyway for AGI systems that perform above human intelligence. You can’t predict the future with software, because there could always be software that uses the future-predicting software and negates its output (a.k.a. the Halting Problem). Can’t do anything about that.
That is a misunderstanding of the Halting Problem.
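What the theorem actually says is that no single program can decide, for every possible program and input, whether it halts; it does not say that nothing useful can be proven about particular, carefully constructed systems, and alignment does not require perfectly predicting the future anyway. Here is a minimal sketch of the standard diagonalization argument, where the hypothetical decider `claimed_halts` exists only as the assumption to be contradicted:

```python
# Sketch of the classical diagonalization argument. `claimed_halts` is a
# hypothetical total decider assumed to exist only for the sake of
# contradiction; it is not a real, implementable function.

def claimed_halts(program, argument) -> bool:
    """Pretend oracle: True iff program(argument) would halt."""
    raise NotImplementedError("no total halting decider can exist")


def troublemaker(program):
    # Do the opposite of whatever the oracle predicts about program(program).
    if claimed_halts(program, program):
        while True:       # loop forever if the oracle says "halts"
            pass
    else:
        return            # halt immediately if the oracle says "loops"

# Asking whether troublemaker(troublemaker) halts is contradictory either
# way, so no such decider exists for *all* programs. That is a far narrower
# claim than "software can never predict or constrain another system."
```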