The reason we can easily make AIs that solve chess without destroying the world is that we can build specialized AIs that only operate in the abstract environment of chess-board states, and in that environment we can tell them exactly what their goal should be.
If we tell an AGI to generate plans for winning at chess, and it knows about the outside world, then because that state space is astronomically larger, it is astronomically harder to tell it what its goal should be. So any goal we do give it either satisfies corrigibility, in which case we can just tell it “do what I want”, or it incompletely captures what we mean by ‘win this chess game’.
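To make the contrast concrete, here is a minimal Python sketch (all types and names are hypothetical, not from the discussion above): in the chess environment the goal is a total function over a formally defined state space, whereas no analogous total function exists once the agent’s environment is the real world.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    WHITE_WINS = 1
    DRAW = 0
    BLACK_WINS = -1

@dataclass(frozen=True)
class TerminalChessState:
    """Stand-in for a fully specified terminal chess position."""
    outcome: Outcome

def chess_goal(state: TerminalChessState) -> float:
    # Complete specification: every state the specialized AI can model gets
    # an unambiguous value. States of the outside world (the opponent's mood,
    # the tournament stakes, the power grid) simply don't exist in this
    # environment, so the goal can't misfire on them.
    return float(state.outcome.value)

# By contrast, a "win at chess" goal for an agent that models the real world
# has no such total function: we can't enumerate world-states, so any proxy
# we write down (e.g. "the arbiter records a win for White") leaves most of
# the state space unconstrained, which is exactly the gap described above.
```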
For cancer, there may well be a way to solve the problem using a specialized AI that works in an environment space simple enough for us to completely specify our goal. I assume, though, that in all the hypothetical versions of the problem we are using a general AI, which has the property that it works in an environment space too large for us to specify what we want it to do. And even if it doesn’t know a priori that its plans can affect the state of a far larger environment space, one which in turn can affect the environment space it cares about, it may deduce this and figure out a way to exploit that fact.
I might be conflating Richard, Paul, and my own guesses here. But I think part of the argument here is about what can happen before AGI that gives us lines of hope to pursue.
Like, my-model-of-Paul wants various tools for amplifying his own thought to (among other things) help think about solving the long-term alignment problem. And the question is whether there are ways of doing that which actually help when trying to solve the sorts of problems Paul wants to solve. We’ve successfully augmented human arithmetic and chess. Are there tools we actually wish we had that narrow AI meaningfully helps with?
I’m not sure if Richard has a particular strategy in mind, but I assume he’s exploring the broader question of “what useful things can we build that will help navigate x-risk?”
The original dialogues were exploring the concept of pivotal acts that could change humanity’s strategic position. Are there AIs that can execute pivotal acts that are more like calculators and Deep Blue than like autonomous moon-base-builders? (I don’t know if Richard actually shares the pivotal act / acute risk period frame, or was just accepting it for the sake of argument.)
The problem is not whether we call the AI an AGI; it’s whether we can either 1) fully specify our goals in the environment space it’s able to model (or otherwise not care too deeply about that environment space), or 2) verify that the actions it proposes have no disastrous consequences.
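Here is a toy sketch of option 2 above, plan verification rather than goal specification. All names are hypothetical and only illustrate the pattern, under the assumption that a human can actually evaluate the plan’s consequences:

```python
from typing import Callable, Optional, Sequence

Plan = Sequence[str]  # a plan rendered as human-readable steps

def run_with_verification(
    propose_plan: Callable[[str], Plan],
    human_approves: Callable[[Plan], bool],
    task: str,
) -> Optional[Plan]:
    """Only return a plan a human has inspected and signed off on."""
    plan = propose_plan(task)
    # The load-bearing assumption is that the verifier can trace every
    # effect of every step. That assumption breaks down once plans act
    # through an environment larger than the one the verifier can check,
    # which is why condition 2 is doing real work.
    return plan if human_approves(plan) else None
```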
To determine whether a tool AI can be used to solve problems Paul wants to solve, or to execute pivotal acts, we need to both 1) determine that the environment is small enough for us to accurately express our goal, and 2) ensure the AI is unable to infer the existence of a broader environment.
(meta note: I’m making a lot of very confident statements, and very few are of the form “<statement>, unless <other statement>, then <statement> may not be true”. This means I am almost certainly overconfident, and my model is incomplete, but I’m making the claims anyway so that they can be developed)
This is what I came here to say! I think you point out a crisp reason why some task settings make alignment harder than others, and why we get catastrophically optimized against by some kinds of smart agents but not others (like Deep Blue).