Minimum Viable Alignment
What is the largest possible target we could have for aligned AGI?
That is, instead of aiming for a great and prosperous future, could we find an easier path to aligning an AGI by aiming for the entire set of 'this-is-fine' futures?
For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.
I don't believe there is a solution here yet either, but could relaxing the problem from 'what we actually want' to 'anything we could live with' make it easier? Has there been much work in this direction? Please let me know what to search for if so. Thank you.