Minimum Viable Alignment
What is the largest possible target we could have for aligned AGI?
That is, instead of aiming for a great and prosperous future, could we find an easier path to aligning an AGI by aiming for the entire set of ‘this-is-fine’ futures?
For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.
I don’t believe there is a solution here yet either, but could relaxing the problem from ‘what we actually want’ to ‘anything we could live with’ help? Has there been much work in this direction? Please let me know what to search for if so. Thank you.
Yes, some people are interested in it and other people think it’s not worth it. See e.g. the Eliezer Yudkowsky + Richard Ngo chat log posts.
Will check them out, thank you.
I wrote a post that is related—“Is some kind of minimally-invasive mass surveillance required for catastrophic risk prevention?”
Thanks Chris, but I think you linked to the wrong thing there. I can’t see your post in the last 3 years of your history either!
Sorry, fixed.
Well, it depends on your priors for how an AGI would act, but as I understand it, all AGIs will be power-seeking. If an AGI is power-seeking and has access to some amount of compute, then it will probably bootstrap itself to superintelligence and then start imposing its utility function everywhere. Different utility functions cause different results, but even relatively mundane ones like “prevent another superintelligence from being created” could result in the AGI killing all humans and taking over the galaxy to make sure no other superintelligence gets made. I think it’s actually really, really hard to specify the what-we-actually-want future for an AGI, so much so that evolutionarily training an AGI in an Earth-like environment, so that it develops human-ish morals, will be necessary.
Aye, I agree it is not a solution to power-seeking; my point is only that there may be a slightly easier target to hit if we relax as many constraints on alignment as possible.