I guess the crux here is that I don’t think HLMI needs to be so narrowly focused on mathematically specified goals. I think there is a kind of intelligence that doesn’t need goals to operate, and also a kind of intelligence that can understand that there is a “goal behind every goal” (i.e., theory of mind). A non-goal-directed AI might be driven by creativity or a desire to learn. It might be like GPT-3, which is just trying to predict some complex input, but not in a goal-y way. You can obviously frame these things as goals, as in DeepMind’s paper “Reward is Enough”, but I do think there is a sense in which intelligence doesn’t absolutely need to be goal-y. Yes, this gets into the oracle vs. task vs. agent AI debate.
Alignment arguments seem to imagine an “asocial AI” which can’t understand humans but can perform marvelously at narrowly focused, mathematically specified goals. For example, it could “design a nanotech pathogen to kill every human on the planet” but not “convince every human on the planet to believe in my favored political ideology.” Such an AI would certainly be dangerous. But the existence proof that intelligence is possible, the human brain, also gives us strong reason to believe that an AI could be built which is neither asocial nor sociopathic. The big American tech companies are most interested in this kind of highly social AI, since they operate through soft power and need charismatic AI which people will ultimately like rather than reject.
In my opinion, the asocial AI is not “real intelligence,” and that’s what average people and tech company CEOs will say about it too. A “real AI” can have a conversation with a person, and if the person says, “your goal is bad, here’s why, please do this instead,” the AI would switch goals. Perhaps that can be framed as a meta-goal such as “do what humans say (or mean),” but I expect (with some probability) that by the time we get to “real AI” we’ll have done some out-of-the-box thinking and have a much better idea of how to make corrigible AI.
In this framework, if we can build highly social AI before we get extremely powerful asocial AI, then we are safe. But if extremely powerful AI comes before we teach AI to be social, then there is serious danger of an alignment problem.