There are two problems here:
Problem #1: Align limited task AGI to do some minimal act that ensures no one else can destroy the world with AGI.
Problem #2: Solve the full problem of using AGI to help us achieve an awesome future.
Problem #1 is the one I was talking about in the OP, and I think of it as the problem we need to solve on a deadline. Problem #2 is also indispensable (and a lot more philosophically fraught), but it’s something humanity can solve at its leisure once we’ve solved #1 and therefore aren’t at immediate risk of destroying ourselves.
MIRI gave a strategic explanation of this in their 2017 fundraiser post, which I found very insightful; they called the time before Problem #1 is solved the "acute risk period".
One way to resolve the “Alice vs Bob values” problem is to delegate it to the existing societal structures.
For example, Alice is the country's president. The AI is aligned specifically to the current president's values (with some reasonable limitations, like requiring congressional approval for each AI action).
If Bob's values are different, that's Bob's problem, not an AI alignment problem.
The solution is far from perfect, but it does solve the “Alice vs Bob values” problem, and is much better than the rogue-AGI-killing-all-humans scenario.
By this or some similar mechanism, the scope of the alignment problem can be narrowed to a single human, which is easier to solve: the problem shrinks from "solve society" to "solve an unusually hard math problem".
And once you have a superhuman AGI aligned to one human, that human could ask it to generalize the solution to all of humanity.
Aligned to which human?
Depends on what we are trying to maximize.
If we seek societal acceptance of the solution, then the Secretary-General of the UN is probably the best choice.
If we seek the best possible outcome for humanity, then I would vote for Eliezer. It is unlikely that there is a more suitable person to speak with a Bayesian superintelligence on behalf of humanity.
If we want to maximize realism, then the human in question is more likely to be some dude from Google or OpenAI.